Current Status
All Systems Operational
Recent Incidents
AI Essay Grading Service Maintenance
minor · May 21, 2026 · resolved May 25
**Executive Summary**

Brillium recently experienced a temporary service disruption affecting the AI Essay Grading capability within Assessment Builder. This disruption occurred during a required upgrade to our underlying AI model after the previous model was deprecated by our cloud provider.

The issue has been fully resolved, services have been restored, and there was no impact to customer data. We take full responsibility for the disruption and have implemented additional safeguards to strengthen our change management and service reliability processes.

**Impact Overview**

* **Service Affected:** AI Essay Grading (Assessment Builder)
* **Customer Impact:** Temporary unavailability of automated grading
* **Workaround Provided:** Temporary routing to manual grading
* **Data Integrity:** No loss, corruption, or unauthorized access to customer data

All candidate submissions and assessment results remained secure, complete, and unchanged throughout the incident.

**Root Cause**

The disruption was the result of an external dependency change. The Large Language Model (LLM) previously used by Brillium for AI Essay Grading was deprecated by AWS, requiring an accelerated transition to a supported next-generation model.

During this transition:

* Automated grading was temporarily paused to ensure a controlled and secure upgrade
* Additional validation was required to confirm grading accuracy, consistency, and fairness
* Safeguards were applied to protect processing integrity and prevent inaccurate scoring

While this action was necessary to maintain platform stability and vendor support compliance, it introduced a temporary service interruption while validation and stabilization were completed.

**Resolution**

Our engineering team has successfully:

* Transitioned the AI Essay Grading engine to a supported next-generation LLM
* Completed comprehensive regression testing and quality assurance
* Validated grading precision, consistency, and fairness across all scenarios
* Fully restored automated scoring for all assessments

Manual grading processes introduced during the incident have now been discontinued.

**Current Status**

* AI Essay Grading is fully operational
* All assessments are processing normally
* No further action is required

**What We Are Doing to Prevent Recurrence**

We are strengthening our platform and processes to reduce the likelihood and impact of similar events:

* **Enhanced Third-Party Dependency Monitoring:** Improved tracking of vendor lifecycle changes, including deprecations and end-of-support timelines
* **Proactive Upgrade Planning:** Earlier adoption cycles to reduce the need for accelerated transitions
* **Expanded Pre-Deployment Testing:** Increased simulation of production scenarios prior to release

**Customer Assurance**

We recognize how critical reliable scoring is to your operations. This upgrade ensures:

* Improved grading accuracy and consistency
* Greater long-term system stability and vendor alignment
* Continued protection of all customer data

Brillium remains committed to delivering secure, high-performance, and future-ready assessment solutions.

**Support**

If you have any questions or need assistance, our support team is here to help.
Performance Issues for some customers
major · Mar 26, 2026 · resolved Mar 26
Select customers experienced a brief service interruption due to abnormal external activity. The team addressed the issue promptly, and normal service was restored. We are continuing to strengthen our monitoring and protective controls to further improve platform resilience.
Brillium platform upgrade performance challenges
major · Mar 10, 2026 · resolved Mar 12
# **Brillium Platform Upgrade Incident – Post Mortem**

**Incident Summary:**

On March 10, 2026, Brillium experienced an unplanned service disruption during a scheduled platform upgrade. While Brillium upgrades are typically completed without customer impact and undergo extensive testing, this release resulted in unexpected performance issues that temporarily affected platform availability for our customers.

Our engineering team responded immediately, worked continuously to restore service, and progressively brought customers back online while stabilizing platform performance. The platform is now fully available and operating as expected for all customers.

**Customer Impact:**

During the incident, customers may have experienced:

* Temporary inability to access the Brillium platform
* Slower performance while services were being restored
* Intermittent access during stabilization efforts

We recognize the importance of platform availability to our customers’ assessment and testing operations and regret the disruption this incident caused.

**Root Cause (High Level):**

This release introduced new features that required updates to the platform’s underlying data structure, including initializing certain data values as part of the upgrade process. While the upgrade was thoroughly tested prior to deployment, the testing did not fully account for the impact of these changes on customers with very large data volumes. Under those conditions, the upgrade process placed significantly more load on the system than anticipated, which led to degraded performance and ultimately caused platform instability.

**Resolution:**

To resolve the issue, Brillium’s engineering team implemented targeted adjustments to support the upgrade process at scale, restored customer access in controlled phases, and increased monitoring to ensure platform stability. Once all customers were back online, the platform continued to be closely monitored to confirm normal operation and performance.

**Preventative Measures and Improvements:**

As a result of this incident, Brillium is implementing the following improvements:

* Enhancements to pre-upgrade testing to better simulate customers with large data volumes
* Additional safeguards during upgrades that involve data structure changes
* Improved monitoring and validation steps throughout the upgrade process

These actions are intended to further reduce the risk of future upgrade-related service disruptions.

**Closing Statement:**

We sincerely apologize for the disruption caused by this incident and appreciate the patience shown by our customers throughout the restoration process.

Over the past 24 months, Brillium has successfully delivered more than 28 platform upgrades without service interruption through our zero-downtime delivery process, which is designed to minimize customer impact. While we take extensive measures to ensure reliable and seamless upgrades, unplanned issues can still occur in complex technology environments.

We take this incident seriously and are applying the lessons learned to further strengthen our upgrade and testing processes. Brillium remains fully committed to providing a stable, secure, and continuously improving platform, and we thank you for your continued trust.
Access to Assessment Builder Endpoints
major · Dec 10, 2025 · resolved Dec 11
During the investigation, the system operations team discovered that Assessment Builder was unable to establish a connection to the database systems. As a result, assessment authoring and delivery were impacted for approximately 1 hour and 15 minutes. No impact was observed on the Administration, Customer Management, or Talent applications. The issue appears to have originated within Brillium's internal systems and has been resolved.
AWS Service Disruption Affecting Brillium Services
major · Oct 20, 2025 · resolved Oct 20
# **AWS Regional Outage (October 20, 2025)**

## **1. Executive Summary**

On October 20, 2025, Brillium services experienced a significant disruption lasting approximately 4.5 hours due to an external, widespread regional outage within Amazon Web Services (AWS). The incident began at 3:00 AM EST and primarily impacted service availability and performance for customers relying on the affected AWS region.

The core issue was external to Brillium’s platform. Our focus during the incident was on confirmation, communication, and swift restoration, which was completed by 7:30 AM EST after AWS reported their upstream resolution.

## **2. Key Details**

| Metric | Detail |
| --- | --- |
| **Incident Name** | AWS Regional Service Disruption |
| **Date** | October 20, 2025 |
| **Duration** | 4 hours, 30 minutes (3:00 AM EST to 7:30 AM EST) |
| **Impacted Services** | All core Brillium services (including API, Data Processing, and Web Frontend) |
| **Root Cause** | Widespread regional outage in AWS (External) |
| **Resolution Status** | Fully Resolved |

## **3. Impact Analysis**

During the incident window (3:00 AM - 6:00 AM EST), customers experienced:

* **Service Unavailability:** Difficulty accessing or connecting to various Brillium applications.
* **Performance Degradation:** Increased latency and intermittent timeouts when services were partially available.
* **Data Processing Delays:** Backend processing queues were backed up, leading to delays in scheduled tasks and data updates.

The primary customer impact was loss of service availability for the duration of the upstream AWS outage.

## **4. Root Cause**

The root cause was confirmed to be a major service disruption impacting a critical AWS region upon which a portion of Brillium’s infrastructure relies. This was an external failure of the cloud provider’s infrastructure.

* **Brillium Action:** The incident was immediately confirmed via AWS status pages and internal monitoring systems.
* **External Cause:** An initial AWS failure (e.g., networking or power event) cascaded across availability zones within the region.

## **5. Incident Timeline (All Times EST)**

| Time | Event |
| --- | --- |
| **3:00 AM** | Internal monitoring alerts triggered across multiple Brillium services. Incident declared. |
| **3:15 AM** | External AWS status page confirms a major regional incident affecting multiple services. |
| **3:30 AM** | Initial customer status update posted to [status.brilium.com](http://status.brilium.com) identifying the external AWS issue. |
| **6:00 AM** | AWS reports resolution of the underlying issue, and Brillium systems begin self-recovering. |
| **6:15 AM** | Service restoration update posted; Brillium enters extensive monitoring phase. |
| **6:30 AM** | Brillium monitoring confirms all services are stable, running within normal parameters, and fully functional. Final resolution update posted. |

## **6. Corrective Actions and Lessons Learned**

While the root cause was external, we identified opportunities to improve our monitoring and response to similar external events:

| Area | Action Item | Target Date |
| --- | --- | --- |
| **Alerting** | Enhance specific alerting thresholds to differentiate between high load/internal issues and sudden, widespread external availability failures. | End of Q2 2026 |
| **Communication** | Create pre-drafted status page templates for common external dependency failures (e.g., AWS, other third-party providers) to expedite initial communication. | Immediate |
| **Monitoring** | Implement synthetic transactions (probes) in a secondary, unaffected region to quickly confirm global service health during local regional outages. | Q4 2025 |

We appreciate the patience of our customers during this disruption and are committed to implementing these actions to enhance the resilience of the Brillium platform.





