Current Status
All Systems Operational
Components
Recent Incidents
Kameleoon Main Application Tools Temporarily Unavailable
major · May 27, 2026 · resolved May 27
This incident has been resolved.
Campaign and Audience Reporting Temporarily Unavailable
minor · May 20, 2026 · resolved May 20
This incident has been resolved.
Experiment Results Processing Delays
minor · May 8, 2026 · resolved May 8
This incident has been resolved.
Campaign and Audience Reporting Temporarily Unavailable
major · May 7, 2026 · resolved May 7
**Summary**

Between April 28th and May 10th, 2026, Kameleoon experienced instability affecting the analytics platform used to display experiment results and reporting data. During this period:

* Some customers experienced delays or unavailability when accessing experiment results.
* In several cases, parts of the platform became temporarily unavailable.
* The issue primarily impacted analytics-related services backed by our ClickHouse infrastructure.

The incident has been mitigated through an infrastructure capacity upgrade, and platform stability has been restored.

**Customer Impact**

Affected customers may have experienced:

* Missing or delayed experiment reporting data
* Timeouts while loading analytics dashboards
* Intermittent platform errors
* Temporary inability to access parts of the Kameleoon platform

No data loss was identified during the incident.

**Timeline**

Starting April 28th

* Increased instability observed on the analytics platform.
* Multiple ClickHouse nodes experienced excessive memory consumption.
* Some nodes became unavailable and failed to restart correctly.

During Incident Investigation

* Engineering teams identified unusually heavy analytical queries consuming abnormal amounts of RAM on the ClickHouse cluster.
* In several situations, memory exhaustion triggered the Linux OOM (Out-Of-Memory) killer, repeatedly terminating ClickHouse processes immediately after restart attempts.

Mitigation Actions

* Emergency remediation procedures were applied to stabilize the cluster.
* The analytics infrastructure capacity was increased.
* Total memory available to ClickHouse nodes was doubled across the cluster.

Post-Upgrade

* Platform stability returned to normal.
* No recurrence of the issue has been observed since the infrastructure upgrade.

**Root Cause Analysis**

Investigations are still ongoing.

At this stage, we believe the instability was triggered by a recent product change (new feature introduction or platform improvement) that generated significantly heavier analytical queries than anticipated. These queries caused abnormal memory pressure on the ClickHouse cluster, eventually leading to repeated node failures and service instability. While the exact triggering change has not yet been conclusively identified, current evidence strongly suggests a correlation with recently deployed analytics-related changes.

**Resolution**

The incident was resolved by increasing the capacity of the analytics infrastructure:

* Memory allocation for all ClickHouse nodes was doubled.
* Cluster stability was restored immediately following the upgrade.
* Since the upgrade, no additional instability has been detected.

**Preventive & Remediation Measures**

To reduce the likelihood and impact of similar incidents in the future, we have initiated the following actions:

Immediate Actions

* Increased monitoring on ClickHouse memory consumption and query behavior
* Enhanced alerting for abnormal resource usage patterns
* Continued investigation to identify the exact triggering query patterns and originating product changes

Planned Improvements

* Introduce stricter safeguards on high-memory analytical queries
* Improve query profiling and performance validation during feature rollouts
* Add additional automatic protections to prevent cluster-wide memory exhaustion
* Review deployment validation procedures for analytics-impacting changes
* Define emergency operational procedures for faster recovery in case of similar events

Ongoing Monitoring

We will continue to closely monitor the platform over the coming weeks to ensure long-term stability and validate the effectiveness of the remediation measures.

**Conclusion**

We understand the importance of reliable analytics and experiment reporting for our customers and sincerely apologize for the disruption caused during this incident.

Our teams remain actively engaged in the investigation and implementation of additional safeguards to ensure the continued stability and resilience of the platform.
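**Illustrative sketch: memory safeguards**

The report does not say which safeguards will be adopted, so the following is only a minimal sketch of one common approach to the "stricter safeguards on high-memory analytical queries" item: derive a server-wide memory ceiling and a per-query ceiling from node RAM, then flag queries whose peak usage exceeds the per-query ceiling. ClickHouse exposes settings in this spirit (`max_server_memory_usage_to_ram_ratio` at server level and `max_memory_usage` per query), but every name, percentage, and concurrency figure in the code below is a hypothetical chosen for illustration, not Kameleoon's configuration.

```python
from dataclasses import dataclass
from typing import Dict, List

GIB = 1024 ** 3  # bytes in one gibibyte


@dataclass(frozen=True)
class MemoryBudget:
    """Hypothetical memory ceilings for one analytics node."""
    server_limit_bytes: int     # ceiling for the whole database process
    per_query_limit_bytes: int  # ceiling for any single query


def plan_memory_budget(
    node_ram_bytes: int,
    server_percent: int = 90,               # assumed share of RAM given to the server
    max_concurrent_heavy_queries: int = 4,  # assumed worst-case concurrency
) -> MemoryBudget:
    """Derive ceilings so that the assumed number of heavy queries
    running at once still fits under the server-wide limit."""
    if node_ram_bytes <= 0:
        raise ValueError("node_ram_bytes must be positive")
    if not 0 < server_percent <= 100:
        raise ValueError("server_percent must be in (0, 100]")
    if max_concurrent_heavy_queries < 1:
        raise ValueError("max_concurrent_heavy_queries must be >= 1")

    # Integer arithmetic keeps the result exact for large byte counts.
    server_limit = node_ram_bytes * server_percent // 100
    per_query_limit = server_limit // max_concurrent_heavy_queries
    return MemoryBudget(server_limit, per_query_limit)


def queries_over_budget(
    query_peaks: Dict[str, int], budget: MemoryBudget
) -> List[str]:
    """Return the sorted ids of queries whose peak memory exceeds the per-query ceiling."""
    return sorted(
        query_id
        for query_id, peak_bytes in query_peaks.items()
        if peak_bytes > budget.per_query_limit_bytes
    )
```

With a per-query ceiling in place, a single oversized query would be expected to fail on its own rather than push the whole node into the OOM-killer restart loop described in the timeline. The trade-off is that legitimately large reports may need to be rewritten or allowed to spill to disk.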
Experiment Results Processing Delays (Germany Cluster)
minorMay 6, 2026 · resolved May 6
This incident has been resolved.
