Incident Summary – Virginia Region Impact
Timeline of Events
The Virginia region incident began with intermittent tracking alarms on October 19 at 10:00 PM PDT. These brief alerts, each lasting under one minute, recurred sporadically at intervals of 15 to 35 minutes and initially resolved on their own. The situation escalated at 12:11 AM PDT on October 20, when AWS reported a major outage affecting hundreds of services in the Virginia region.
Impact on Services
At 07:07 AM PDT on October 20, CAKE tracking systems in Virginia reported fully degraded health, which immediately triggered our auto-healing DNS subsystem. The subsystem automatically rerouted all traffic to healthy regions and completed the failover within seconds. Tracking traffic resumed successfully, but reporting services (UI and APIs) were impacted by failures in primary-region data sources, specifically Redshift.
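For readers who want a concrete picture of health-based failover, the following is a minimal sketch of the general idea and not CAKE's actual implementation. The `Region` type, the `route_weights` function, and the region names other than Virginia are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool
    weight: int = 1  # relative share of traffic when healthy (assumed)


def route_weights(regions):
    """Return a mapping of region name -> fraction of traffic.

    Only healthy regions receive traffic. If no region is healthy,
    fall back to all regions rather than dropping traffic entirely.
    """
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        candidates = list(regions)
    total = sum(r.weight for r in candidates)
    return {r.name: r.weight / total for r in candidates}


# Hypothetical region set: only "virginia" comes from the incident itself.
regions = [
    Region("virginia", healthy=False),
    Region("oregon", healthy=True),
    Region("ireland", healthy=True),
]
weights = route_weights(regions)
```

In this sketch the unhealthy region simply drops out of the routing table and its share is redistributed across the remaining regions, which is why tracking traffic can recover within seconds while services that depend on region-local data sources do not.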
The CAKE Ops team responded at 08:46 AM PDT by shifting reporting workloads to secondary regions. This action restored functionality for affected clients, though initial performance was slower than expected due to a lack of cached data in those regions.
Current Status and Recovery
To monitor the ongoing situation, follow these steps:
- Check AWS Service Health Dashboard for Virginia region updates
- Monitor your CAKE platform performance metrics
- Contact CAKE Support if you experience any service degradation
FAQ
When will services return to normal in the Virginia region?
We will announce the resumption of CAKE's services in Virginia once both the AWS and CAKE teams have confirmed full restoration of service.
Is my tracking affected?
All client tracking traffic has been successfully rerouted and is functioning normally through healthy regions.
Why are reports loading slowly?
Secondary regions are building their data caches, which causes temporary performance impacts that improve over time with continued usage.
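The warm-up effect can be illustrated with a minimal read-through cache sketch. This is an assumption about the general caching pattern, not a description of CAKE's reporting internals; `ReadThroughCache`, `slow_report_query`, and the report name are hypothetical.

```python
class ReadThroughCache:
    """Serve cached values when present; otherwise load and remember them."""

    def __init__(self, load):
        self._load = load   # function called on a cache miss
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cold path: the first request for a key pays the full query cost.
        self.misses += 1
        value = self._load(key)
        self._store[key] = value
        return value


def slow_report_query(report_id):
    # Stand-in for an expensive data warehouse query (hypothetical).
    return f"rows for {report_id}"


cache = ReadThroughCache(slow_report_query)
first = cache.get("daily-summary")   # miss: runs the slow query
second = cache.get("daily-summary")  # hit: served from the cache
```

A freshly promoted region starts with an empty store, so early requests take the slow path; repeated requests for the same reports are then served from the cache.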