Social Plus Status Page

Cloud Providers & Hosting · monitored by Alert24

Minor Incident

Current Status

Partially Degraded Service

Components

Core Services (SG): Operational
AWS: Account: Operational
AWS: Cloudwatch (ap-southeast-1): Operational
AWS: Cloudwatch (eu-central-1): Operational
AWS: Cloudwatch (us-east-1): Operational
Cloudflare: Pages: Degraded
GCP: Pub/Sub: Operational
SendGrid: API: Operational
Social+ Portal: Operational
Core Services (EU): Operational
AWS: Cloudfront: Operational
AWS: Dynamodb (ap-southeast-1): Operational
AWS: Dynamodb (eu-central-1): Operational
AWS: Dynamodb (us-east-1): Operational
Cloudflare: Workers: Operational
GCP: App Engine: Operational
SendGrid: API v2: Operational
Core Services (US): Operational
AWS: EC2 (ap-southeast-1): Operational
AWS: EC2 (eu-central-1): Operational

Recent Incidents

Partially Degraded Performance [US region]

major

Jan 21, 2026 · resolved Jan 21

**Incident Date:** 2026-01-21

**Impact:** System degradation and intermittent downtime.

**Primary Cause:** Infrastructure **resource exhaustion** triggered by an unprecedented high-volume traffic surge.

## 1. Summary

On January 21, the platform received an unprecedented surge in traffic, peaking at **450,000 requests per minute (~56.25x baseline)**. While the application servers autoscaled successfully, the Core Database became the bottleneck. Despite two manual vertical scaling interventions, the system went through two periods of degradation before stabilizing once database capacity finally matched demand.

## 2. Root Cause

The root cause of the incident was **infrastructure resource exhaustion**: the database lacked sufficient headroom to absorb a sudden traffic spike.

* **Traffic Volume:** An unprecedented surge in external demand drove platform traffic far beyond predicted growth, from a baseline of **8,000 req/min** to a peak of **450,000 req/min (a ~56.25x increase)**.
* **Scaling Operation Time:** Each vertical scaling of the Core Database took **10–30 minutes** to complete. During these intervals, the system remained degraded because incoming demand outpaced both the available capacity and the speed of recovery.

## 3. Optimizations & Corrective Actions

Based on the investigation, we will implement the following technical safeguards:

#### **A. Transition Impacted Queries to Secondary Nodes**

* **Action:** Reconfigure the remaining impacted database queries to target Secondary (Read) Replicas rather than the Primary node.
* **Goal:** Offload significant pressure from the Primary database so that it retains enough resource headroom during a surge. A Primary that is not choked by contention can complete vertical scaling operations, and therefore recover, much faster.

#### **B. Optimize Autoscaling Performance (Server & Database)**

* **Action:** Review and tune autoscaling policies for both the App Tier and the Database Tier, specifically to reduce scaling operation time.
* **Goal:** Decrease the "Time-to-Ready" for new resources. Optimizing scaling triggers and resource warm-up procedures will provision capacity more rapidly and improve the system's overall recovery time during a sudden spike.
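Action A amounts to a read/write split. The sketch below illustrates the general idea in Python, assuming a single Primary and a pool of read replicas identified by hypothetical hostnames; `QueryRouter` and its routing rule are illustrative only and are not Social Plus's actual implementation. Plain `SELECT` statements are spread round-robin across the replicas, while writes and locking reads (`SELECT ... FOR UPDATE`) stay on the Primary.

```python
import itertools


class QueryRouter:
    """Hypothetical read/write splitter: writes go to the primary,
    plain reads are spread round-robin across read replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        statement = sql.strip().upper()
        is_read = statement.startswith("SELECT")
        # Locking reads must see the latest committed data, so they stay on the primary.
        if is_read and "FOR UPDATE" not in statement:
            return next(self._replicas)
        # INSERT/UPDATE/DELETE/DDL and anything unrecognized default to the primary.
        return self.primary
```

Replicas typically apply changes asynchronously, so a read that must observe a just-committed write (read-your-writes) should still go to the Primary. Statements the simple prefix check does not recognize, such as a CTE beginning with `WITH`, fall through to the Primary, which is the safe default.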
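Action B concerns how quickly autoscaling reacts to load. As an illustration only, the proportional target-tracking rule below has the same shape as the Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current × metric / target), clamped to a configured minimum and maximum. The function name, node counts and per-node targets are hypothetical; only the 8,000 and 450,000 req/min figures come from this report. The rule models horizontally scaled tiers and does not capture the 10–30 minute vertical scaling of the Core Database.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Hypothetical proportional target-tracking rule: pick the unit count that
    brings the per-unit metric back to `target`, clamped to [minimum, maximum]."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # Scale the current unit count by how far the observed metric is from the target.
    desired = math.ceil(current * metric / target)
    # Never go below the floor or above the configured ceiling.
    return max(minimum, min(maximum, desired))
```

For example, with a hypothetical 4 nodes each targeted at 2,000 req/min, the 8,000 req/min baseline gives `desired_capacity(4, 2000, 2000, 2, 50) == 4`. At the 450,000 req/min peak (112,500 req/min per node), the raw rule asks for 225 nodes, so a maximum of 50 caps the result at 50: the configured ceiling, not the formula, then limits how much of the surge can be absorbed.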

Partially Degraded Performance [SG region]

major

Dec 30, 2024 · resolved Dec 30

This incident has been resolved.

Partially Degraded Performance [SG region]

major

Dec 12, 2024 · resolved Dec 12

This incident has been resolved.

Partially Degraded Performance [SG region]

major

Dec 11, 2024 · resolved Dec 11

This incident has been resolved.

Partially Degraded Performance [SG region]

major

Nov 1, 2024 · resolved Nov 1

This incident has been resolved.
