10/20/25 - AWS Outage

Jeff Olliver

Modified on: Mon, 20 Oct, 2025 at 11:37 AM

Incident Summary – Virginia Region Impact

This article will cover:

Timeline of Events
Impact on Services
Current Status and Recovery

Timeline of Events

The Virginia region incident began with intermittent tracking alarms on October 19 at 10:00 PM PDT. These brief alerts, each lasting under one minute, recurred at irregular intervals of 15 to 35 minutes and initially self-resolved. The situation escalated when AWS reported a major outage affecting hundreds of services in the Virginia region at 12:11 AM PDT on October 20.

Impact on Services

At 07:07 AM PDT on October 20, CAKE tracking systems in Virginia reported fully degraded health, triggering an immediate response from our auto-healing DNS subsystem. The system automatically rerouted all traffic to healthy regions and completed the failover within seconds. Tracking traffic resumed successfully, but reporting services (UI and APIs) were impaired by failures in primary-region data sources, specifically Redshift.
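
For readers curious how health-based rerouting works in principle, the following is a minimal sketch, not CAKE's actual DNS subsystem. The function name `select_targets`, the routing rule, and the region labels `secondary-a` and `secondary-b` are all assumptions made for illustration.

```python
from typing import Dict, List


def select_targets(primary: str, health: Dict[str, bool]) -> List[str]:
    """Return the regions that should receive traffic.

    Hypothetical rule (an assumption, not CAKE's documented behavior):
    keep traffic on the primary region while it is healthy; otherwise
    spread it across every other region that is currently healthy.
    """
    if health.get(primary, False):
        return [primary]
    # Primary is unhealthy: fall back to the remaining healthy regions.
    fallbacks = sorted(r for r, ok in health.items() if ok and r != primary)
    if not fallbacks:
        raise RuntimeError("no healthy region available")
    return fallbacks


# Illustrative health snapshot with the primary region marked unhealthy.
snapshot = {"virginia": False, "secondary-a": True, "secondary-b": True}
print(select_targets("virginia", snapshot))
```

In this sketch the decision depends only on the latest health snapshot, so traffic would return to the primary region as soon as it reports healthy again. A production system would typically add safeguards, such as requiring several consecutive healthy checks before failing back.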

The CAKE Ops team responded at 08:46 AM PDT by shifting reporting workloads to secondary regions. This action restored functionality for affected clients, though initial performance was slower than expected because those regions had not yet cached the data.
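
The effect of an empty cache can be illustrated with a small read-through cache. This is a sketch under stated assumptions: the class `ReadThroughCache`, the function `slow_report_query`, and the report names are hypothetical and do not describe CAKE's reporting stack.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve repeat requests from memory; load from the source on a miss."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._store:
            self.hits += 1          # warm path: answered from the cache
            return self._store[key]
        self.misses += 1            # cold path: must query the data source
        value = self._load(key)
        self._store[key] = value
        return value


def slow_report_query(report_id: str) -> str:
    # Hypothetical stand-in for a slow query against a data warehouse.
    return f"rows for {report_id}"


cache = ReadThroughCache(slow_report_query)
for report_id in ["daily", "weekly", "daily", "daily", "weekly"]:
    cache.get(report_id)

print(cache.hits, cache.misses)  # prints "3 2"
```

The first request for each report takes the slow path, and every later request for the same report is served from memory. This is why a region that has just taken over a workload starts slow and speeds up with use.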

Current Status and Recovery

To monitor the ongoing situation, follow these steps:

Check AWS Service Health Dashboard for Virginia region updates
Monitor your CAKE platform performance metrics
Contact CAKE Support if you experience any service degradation

Note: Performance continues to improve as caches warm through normal usage. Some users may still see slower-than-usual speeds due to cross-region latency and increased load on secondary infrastructure. For clients whose primary services are hosted in Virginia, CAKE services continue to operate in secondary regions.

FAQ

When will services return to normal in the Virginia region?

We will announce the resumption of CAKE services in Virginia once both the AWS and CAKE teams have confirmed that service is fully restored.

Is my tracking affected?

All client tracking traffic has been successfully rerouted and is functioning normally through healthy regions.

Why are reports loading slowly?

Secondary regions are building their data caches, which causes temporary performance impacts that improve over time with continued usage.

