Current Status
Partially Degraded Service
Components
Recent Incidents
Partially Degraded Performance [US region]
major · Jan 21, 2026 · resolved Jan 21
**Incident Date:** 2026-01-21

**Impact:** System degradation and intermittent downtime.

**Primary Cause:** Infrastructure **resource exhaustion** triggered by an unprecedented high-volume traffic surge.

## 1. Summary

On January 21, platform traffic surged to an unprecedented peak of **450,000 requests per minute (~56.25x baseline)**. The application servers autoscaled successfully, but the Core Database became the bottleneck. Despite two manual vertical scaling interventions, the system went through two periods of degradation before stabilizing once database capacity matched demand.

## 2. Root Cause

The root cause of the incident was **infrastructure resource exhaustion**: the database did not have enough headroom to absorb a sudden traffic spike.

* **Traffic Volume:** An unprecedented surge in external demand drove platform traffic far beyond predicted growth, from a baseline of **8,000 req/min** to a peak of **450,000 req/min (a ~56.25x increase)**.
* **Scaling Operation Time:** Each vertical scaling event on the Core Database took **10–30 minutes** to complete. During these windows the system remained degraded, because incoming demand outpaced both available capacity and recovery speed.

## 3. Optimizations & Corrective Actions

Based on the investigation, we will implement the following technical safeguards:

#### **A. Transition Impacted Queries to Secondary Nodes**

* **Action:** Reconfigure the remaining database queries to target Secondary (Read) Replicas instead of the Primary node.
* **Goal:** Offload significant pressure from the Primary database so that it retains enough resource headroom to scale and recover quickly. With less contention on the Primary, vertical scaling operations can complete much faster during a surge.

#### **B. Optimize Autoscaling Performance (Server & Database)**

* **Action:** Review and tune the autoscaling policies for both the App Tier and the Database Tier, specifically to reduce scaling operation time.
* **Goal:** Decrease the "Time-to-Ready" for new resources. Optimizing scaling triggers and resource warm-up procedures will provision capacity more rapidly and shorten overall recovery time during a sudden spike.
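To make the read/write split in Action A concrete, here is a minimal sketch of an application-level router that sends writes to the Primary and rotates read-only queries across replicas. It is illustrative only: the `Node` and `ReadWriteRouter` names, the keyword-based query classification, and the `needs_fresh_read` flag are hypothetical and do not describe the platform's actual data layer.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import cycle
from typing import Iterator


@dataclass(frozen=True)
class Node:
    """Hypothetical handle for one database node; a real service would wrap a driver connection pool."""
    name: str
    role: str  # "primary" or "replica"


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, read-only queries rotate across replicas."""

    def __init__(self, primary: Node, replicas: list[Node]) -> None:
        self.primary = primary
        # cycle() yields replicas round-robin forever; None means no replicas are configured.
        self._replicas: Iterator[Node] | None = cycle(replicas) if replicas else None

    @staticmethod
    def is_read_only(sql: str) -> bool:
        # Naive keyword check (an assumption, not a SQL parser): plain SELECTs are reads,
        # but SELECT ... FOR UPDATE takes row locks and must run on the primary.
        normalized = " ".join(sql.lower().split())
        return normalized.startswith("select") and " for update" not in normalized

    def route(self, sql: str, *, needs_fresh_read: bool = False) -> Node:
        # Replicas can lag behind the primary, so reads that must see the caller's
        # own recent writes stay on the primary.
        if self._replicas is not None and not needs_fresh_read and self.is_read_only(sql):
            return next(self._replicas)
        return self.primary
```

Anything the check does not recognize as a plain read, including writes and CTE queries, falls back to the Primary. That errs on the side of correctness at the cost of offloading less traffic.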
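The 10–30 minute scaling operation time is why trigger tuning in Action B matters: a purely reactive trigger fires only once load is already near capacity, leaving the system degraded for the whole operation. The sketch below shows one hypothetical alternative, a lead-time-aware trigger that projects the request rate forward by the expected provisioning time and starts scaling early. The linear growth model, the function names, and the 0.8 headroom default are assumptions for illustration, not the platform's actual autoscaling policy.

```python
def projected_load(current_rpm: float, growth_rpm_per_min: float, lead_time_min: float) -> float:
    """Linearly project the request rate `lead_time_min` minutes ahead (a simplifying assumption)."""
    return current_rpm + growth_rpm_per_min * lead_time_min


def should_scale_now(
    current_rpm: float,
    growth_rpm_per_min: float,
    capacity_rpm: float,
    lead_time_min: float,
    headroom: float = 0.8,
) -> bool:
    """Hypothetical trigger: start scaling now if the load projected for the moment new
    capacity becomes ready would exceed `headroom` times current capacity."""
    return projected_load(current_rpm, growth_rpm_per_min, lead_time_min) > headroom * capacity_rpm
```

For example, with a hypothetical capacity of 100,000 req/min, a current load of 40,000 req/min, and growth of 5,000 req/min per minute, `should_scale_now(40_000, 5_000, 100_000, lead_time_min=10)` returns `True` because the projected load (90,000 req/min) exceeds 80% of capacity. A reactive check on current load alone (`lead_time_min=0`) would not fire yet.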
Partially Degraded Performance [SG region]
major · Dec 30, 2024 · resolved Dec 30
This incident has been resolved.
Partially Degraded Performance [SG region]
major · Dec 12, 2024 · resolved Dec 12
This incident has been resolved.
Partially Degraded Performance [SG region]
major · Dec 11, 2024 · resolved Dec 11
This incident has been resolved.
Partially Degraded Performance [SG region]
major · Nov 1, 2024 · resolved Nov 1
This incident has been resolved.