Postmortem -
Mar 13, 14:24 MDT
Resolved -
Update: Normal operations have been restored. We will post details from our root cause analysis before end of day today.
Mar 13, 10:16 MDT
Update -
Update: We have deployed backend changes and are seeing the expected improvements. We are monitoring as the system recovers and queued work continues to drain.
Estimated time for delayed processing to fully clear: ~35 minutes. We will provide further updates as needed.
Mar 13, 10:01 MDT
Monitoring -
Update: We see application performance at normal levels and are monitoring the final processing of backend queues.
Mar 13, 09:15 MDT
Update -
Update: We have addressed the root cause and see application performance returning to normal. We are mitigating the delays in backend queue processing and will provide another update in 15 minutes.
Mar 13, 08:56 MDT
Identified -
Update: We have identified the root cause. We are remediating and mitigating impact.
Mar 13, 08:25 MDT
Update -
Update: We continue to see abnormal back pressure and elevated database utilization. As morning traffic scales up, we are seeing additional disruptions as the system takes on more load.
While we investigate the root cause, we are also pursuing mitigations in parallel to restore normal operations. We will continue to provide updates.
Mar 13, 08:10 MDT
Update -
We are currently experiencing disruptions to logins and page loads. This is caused by high utilization on a core database, which is creating back pressure across our distributed systems.
Our team is actively investigating the root cause and evaluating mitigation strategies. We will provide updates as we learn more.
Mar 13, 07:49 MDT
Investigating -
We are observing increased latency and error rates across the web application. We are actively investigating.
Mar 13, 07:00 MDT