Current Status
All Systems Operational
Recent Incidents
Siteimprove Platform Login Errors (SSO Users)
major · May 6, 2026 · resolved May 6
**Executive Summary:** On May 6, 2026, an infrastructure failure in our Kubernetes environment caused SSO login failures for a subset of users for approximately 42 minutes. All other platform functionality was unaffected. The issue was caused by a loss of quorum in an internal cluster's control plane, compounded by a node that had been silently degraded. The incident was resolved by replacing the affected nodes, and the system has been operating normally since.

**Incident Overview:** The issue originated from an internal infrastructure cluster that acts as a connectivity bridge between our environments. Two of the cluster's three control plane nodes became non-functional: one had been in a degraded state, and a second failed during normal operations. With two of three nodes down, the cluster could no longer coordinate its workloads, which disrupted the network path required for SSO logins for some users. Some of our internal tooling was also affected. To resolve the issue, the team replaced the unresponsive nodes, which restored normal operations. Login functionality recovered shortly after.

**Impact:** Some users logging into the platform via SSO experienced failures from approximately **2026-05-06T14:34Z** to **2026-05-06T15:16Z** (~42 minutes).

**Detection:** The incident was detected at **2026-05-06T14:34Z** when our automated monitoring system identified connectivity failures in the login journey, which alerted the operations team.

**Response:** Our operations team began investigating immediately. The affected control plane nodes were identified and replaced, with full recovery confirmed via automated tests at **2026-05-06T15:16Z**. A status page update was posted during the incident.

**Root Cause:** Two of three control plane nodes in an internal cluster became unavailable at the same time: one had been silently degraded, and a second failed independently.
The monitoring pipeline that should have detected the degraded node ahead of time was itself not functioning correctly. Because it only logged on failure and not on success, its silence was indistinguishable from normal operation, and the team had no way to know it had stopped working. Follow-up actions include updating the monitoring pipeline to emit success signals so that their absence can be alerted on, improving overall control plane monitoring coverage, and introducing scheduled control plane node replacements as a preventive measure.
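
The "alert on missing success signals" follow-up described above is commonly implemented as a heartbeat (or dead man's switch) check. The following is a minimal Python sketch of that idea, not the vendor's actual implementation: the function name `heartbeat_is_stale`, the 15-minute threshold, and the timestamps are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_is_stale(
    last_success: Optional[datetime],
    now: datetime,
    max_silence: timedelta,
) -> bool:
    """Return True when no success signal has arrived within max_silence.

    A check that only logs on failure looks identical whether it is
    healthy or dead. Requiring a recent success signal makes silence
    itself an alertable condition.
    """
    if last_success is None:
        # Hypothetical policy: a check that has never reported success
        # is treated as stale rather than assumed healthy.
        return True
    return now - last_success > max_silence


# Illustrative values only; these are not taken from the incident.
max_silence = timedelta(minutes=15)
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)

recent = now - timedelta(minutes=5)    # success reported 5 minutes ago
silent = now - timedelta(hours=3)      # nothing reported for 3 hours

print(heartbeat_is_stale(recent, now, max_silence))
print(heartbeat_is_stale(silent, now, max_silence))
print(heartbeat_is_stale(None, now, max_silence))
```

The key design choice is that the alert fires on the absence of a signal, so a monitoring job that crashes, hangs, or is never scheduled is caught the same way as one that reports a failure.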
Siteimprove Platform Login Errors
major · May 1, 2026 · resolved May 1
**Executive Summary:** On May 1, 2026, users on our US platform experienced login failures and slowness for approximately 2.5 hours. The issue was caused by an internal service generating an excessive number of simultaneous requests to a shared backend component, which became overloaded and caused a cascading failure in the login process. The issue was resolved the same day, and the system has been operating normally since.

**Incident Overview:** The issue originated from one of our internal data-processing services, which attempted to process a large volume of data for a single account all at once, rather than in manageable batches. This created an unexpectedly high load on a shared backend service that other parts of the platform depend on, including the component responsible for verifying user access during login. As the shared service became overwhelmed, the login-related service was unable to complete its checks and began failing repeatedly. As a result, users trying to log in received errors or experienced significant delays. To resolve the issue, the team identified the service generating the excessive load and temporarily disabled it, since it wasn’t a real-time component. This immediately relieved the pressure on the shared backend, and the login process recovered shortly after. The service was subsequently restored with a fix applied to prevent the same pattern from recurring.

**Impact:** Users on the US platform experienced login failures or significant slowness from approximately **2026-05-01T14:46Z** to **2026-05-01T17:24Z** (~2 hours 38 minutes). The platform itself remained generally available for users who were already logged in. A status page update was posted during the incident.

**Detection:** The incident was detected at **2026-05-01T14:46Z** by our automated monitoring system, which alerted the operations team to login-related failures in the US environment.

**Response:** Our operations and engineering teams began investigating immediately.
Initial steps focused on scaling up the affected backend components and stabilizing the login service. Once the source of the excessive load was identified, the responsible service was temporarily disabled at **2026-05-01T17:06Z**, which resolved the issue. Login errors subsided within minutes, and the status page was updated to reflect recovery at **2026-05-01T17:24Z**.

**Root Cause:** An internal data-processing service attempted to handle all data for an unusually large dataset simultaneously, without any limit on the number of concurrent operations. This generated a surge of requests that overwhelmed a shared backend service, which in turn caused the login verification process to fail. The service has since been updated to limit concurrency and prevent this pattern from recurring.
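
The root cause above, unbounded concurrent operations against a shared backend, is typically addressed by capping the number of in-flight requests. The following is a minimal Python sketch of one common way to do that with `asyncio.Semaphore`. It is an illustration under stated assumptions, not the fix that was actually deployed: `process_with_limit`, `InFlightTracker`, `fake_backend_call`, and the limit of 5 are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def process_with_limit(
    items: Iterable[int],
    worker: Callable[[int], Awaitable[int]],
    max_concurrent: int,
) -> List[int]:
    """Run worker over all items, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: int) -> int:
        # Each task waits here until one of the max_concurrent slots frees up.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


class InFlightTracker:
    """Hypothetical stand-in for a shared backend; records peak concurrency."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    async def fake_backend_call(self, item: int) -> int:
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0.001)  # simulate I/O latency
        self.current -= 1
        return item * 2


def run_demo(total_items: int = 50, max_concurrent: int = 5):
    tracker = InFlightTracker()
    results = asyncio.run(
        process_with_limit(range(total_items), tracker.fake_backend_call, max_concurrent)
    )
    return results, tracker.peak


if __name__ == "__main__":
    results, peak = run_demo()
    print(f"processed {len(results)} items, peak in-flight requests: {peak}")
```

With the semaphore in place, the load seen by the backend is bounded by `max_concurrent` regardless of how large a single dataset is. Without it, all 50 calls in this demo would start at once.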
Degraded Performance - Crawler, Linkchecker, DCI Scores
minor · Apr 29, 2026 · resolved Apr 30
The issue causing delays with Siteimprove’s crawler and link checker processing has been resolved. Scan processing has returned to normal, and Digital Certainty Index (DCI) scores and related metrics are now updating as expected.
Degraded Performance and Login Issues
minor · Mar 16, 2026 · resolved Mar 16
The issue affecting logins and various modules throughout the Siteimprove platform has been resolved. We’re sorry for holding you up! If you have any additional questions or feedback, please submit a new support ticket through our Help Center ("?" button) at the top right-hand side of the Siteimprove platform.
Degraded Performance - Crawler
minor · Mar 3, 2026 · resolved Mar 10
A summary of this incident has been generated and posted here: [https://help.siteimprove.com/support/solutions/articles/80001199295-faq-incident-response-feb-28th-2026-crawler-degraded-performance](https://help.siteimprove.com/support/solutions/articles/80001199295-faq-incident-response-feb-28th-2026-crawler-degraded-performance)