Mixpanel Status Page

Analytics & Data · monitored by Alert24

Current Status

All Systems Operational

Components

Application Availability (US): Operational
Ingestion API Availability (US): Operational
Application Availability (EU): Operational
Ingestion API Availability (EU): Operational
Application Availability (IN): Operational
Ingestion API Availability (IN): Operational
Data Export: Operational
Warehouse Connectors: Operational
JavaScript Library CDN: Operational

Recent Incidents

Board Access Issues

Impact: none

May 19, 2026 · resolved May 19

This incident has been resolved.

Query API degraded performance

Impact: minor

May 14, 2026 · resolved May 16

# Mixpanel RCA: Transient Data Access Issue, May 14, 2026

## Summary

On Thursday, May 14, 2026 at approximately 2:30 PM PT, a routine but infrequent cleanup operation in Mixpanel's storage system mistakenly removed a portion of production data files in addition to the unused files it was intended to remove. Some customers experienced query errors during the hours that followed. We detected the issue within minutes, deployed mitigations the same evening that returned query success rates and latency to normal, and restored the affected files from backup by 5:15 PM PT on Friday, May 15. Mixpanel's ingestion pipeline was not affected and no event data was lost in transit.

## What happened

This incident was triggered by a storage cleanup procedure that runs periodically to remove files no longer referenced by Mixpanel's metadata. The procedure was more involved than usual: it followed a recent enhancement to our file storage strategy that left a set of unused files behind in our storage backend, and addressing them required extending our standard cleanup approach to cover a new code path.

As part of executing this extended cleanup, an engineer generated the list of files to delete using a SQL query whose date filter was not strictly earlier than the reference snapshot it was being compared against. As a result, a small set of legitimate production files that had been written in the gap window between the snapshot and the filter date were incorrectly classified as unused and removed. The deletion ran for roughly half an hour before internal alerting caught the resulting query failures and the operation was stopped.

The trigger was operator error against an ambiguous runbook, not a defect in the live serving path or in our ingestion pipeline.

## Customer impact

Impact unfolded in two phases.

The first phase ran from Thursday at approximately 2:30 PM PT until 8:11 PM PT, roughly five and a half hours. During this window, customers across the platform may have seen slower or failed queries when their requests touched files that had been deleted. The breadth and severity varied by project depending on which data each query touched. By 8:11 PM PT, mitigations had fully rolled out: queries automatically retried against an alternate availability zone, and a fallback path was put in place to serve missing files from a backup datastore. After this point, query success rate and latency returned to normal.

The second phase lasted from 8:11 PM PT Thursday through approximately 5:15 PM PT Friday, May 15. During this window, fewer than 2% of customers were still affected: specifically, those whose deleted files had not yet been fully restored from backup. The vast majority of these files were recovered by Friday afternoon. A small number of projects (under 30) had files that could not be fully recovered from backup, and we are following up with those accounts directly.

## Timeline (Pacific Time)

* May 14, 2:30 PM: Cleanup operation begins
* May 14, 3:11 PM: Internal alerting flags query failures; the cleanup operation is stopped within minutes
* May 14, 4:07 PM: Status page banner posted
* May 14, 4:45 PM: Mitigation deployed: queries automatically retry against an alternate availability zone
* May 14, 7:12 PM: Mitigation deployed: queries fall back to a backup datastore for missing files
* May 14, 8:11 PM: Query success rate and latency fully restored to normal levels
* May 15, 5:15 PM: File restore from backup complete; status page banner resolved

## Why this happened

Several contributing factors lined up.

* The runbook for this cleanup procedure had ambiguous wording around the ordering and timing of its inputs. It had been recently authored to handle the new file-storage code path and had not gone through a formal review before being used.
* Our cleanup tooling did not programmatically enforce the safety invariant that the date filter must be strictly before the reference snapshot. That invariant lived only in operator-authored SQL.
* The extended cleanup was being executed in parallel across two storage layers by two different engineers, which increased the room for error.

## What we're doing to prevent recurrence

We have already made or have actively in flight the following changes.

* We are adding programmatic safeguards to our cleanup tooling so that an input set whose date filter is not safely before the reference snapshot is rejected before any deletion occurs, along with a reconciliation step that flags any production-referenced file before deletion proceeds.
* Destructive cleanup operations will now run in phased stages, starting with internal projects and pausing for a holding period before any broader execution.
* Destructive storage operations now require a second engineer to sign off on the exact deletion set and to be present during execution, matching the practice we already follow for database migrations.
* We have updated the cleanup runbook with explicit guidance on input timing, required safety buffers, and an enforced review process for any runbook covering a destructive operation.

Longer term, we are working to eliminate the manual portion of this cleanup procedure entirely and route it through our existing automated cleanup infrastructure, so the class of failure that produced this incident is no longer reachable through human input.

## Closing

Reliability and data integrity are foundational to the trust our customers place in Mixpanel, and we recognize the impact this incident had on the teams who rely on us. We are sorry for the disruption. If you have questions about how this incident may have affected a specific project, please reach out to your account team or Mixpanel Support.
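
The RCA above describes a safety invariant (the date filter must be strictly before the reference snapshot) and a reconciliation step that flags production-referenced files before deletion. The following is a minimal illustrative sketch of what such a pre-deletion guard might look like. It is not Mixpanel's tooling: the function name `validate_deletion_set`, the `UnsafeDeletionSet` exception, the 24-hour default buffer, and the shape of the inputs are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


class UnsafeDeletionSet(Exception):
    """Raised when a proposed deletion set fails a safety check (hypothetical)."""


def validate_deletion_set(candidates, filter_cutoff, snapshot_time,
                          referenced_files,
                          safety_buffer=timedelta(hours=24)):
    """Return the sorted candidate paths only if every check passes.

    candidates       -- dict mapping file path -> time the file was written
    filter_cutoff    -- only files written before this time may be deleted
    snapshot_time    -- time the reference metadata snapshot was taken
    referenced_files -- set of paths the snapshot says are still in use
    safety_buffer    -- assumed minimum gap between cutoff and snapshot
    """
    # Invariant from the RCA: the cutoff must be strictly before the
    # snapshot. The extra buffer is an assumption of this sketch.
    if (filter_cutoff >= snapshot_time
            or filter_cutoff > snapshot_time - safety_buffer):
        raise UnsafeDeletionSet(
            "date filter is not safely before the reference snapshot")

    # No candidate may have been written at or after the cutoff.
    too_new = sorted(p for p, written in candidates.items()
                     if written >= filter_cutoff)
    if too_new:
        raise UnsafeDeletionSet(f"files written after cutoff: {too_new}")

    # Reconciliation: refuse to proceed if any candidate is still referenced.
    still_referenced = sorted(set(candidates) & set(referenced_files))
    if still_referenced:
        raise UnsafeDeletionSet(
            f"production-referenced files in set: {still_referenced}")

    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the checks.
    snapshot = datetime(2026, 5, 14, tzinfo=timezone.utc)
    cutoff = snapshot - timedelta(days=2)
    files = {"unused/a.dat": snapshot - timedelta(days=3)}
    print(validate_deletion_set(files, cutoff, snapshot, {"live/b.dat"}))
```

The point of the design is that the guard rejects the whole input set rather than filtering out the offending files: a cutoff at or after the snapshot means the operator's inputs are wrong, so nothing derived from them should be trusted.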

Data Volume Monitoring degraded performance

Impact: none

May 13, 2026 · resolved May 14

This incident has been resolved.

Data Volume Monitoring degraded performance

Impact: none

May 12, 2026 · resolved May 12

This incident has been resolved.

Snowflake pipeline exports degraded

Impact: none

May 6, 2026 · resolved May 6

This incident has been resolved.
