Current Status
All Systems Operational
Recent Incidents
Users experiencing issues accessing multiple Atlassian products
Critical · May 14, 2026 · resolved May 14
### Summary
On May 14, 2026, between 04:30 and 05:26 UTC, Atlassian customers experienced widespread service disruption across multiple Atlassian Cloud products. The issue was caused by a race condition in our internal deployment orchestration platform during a routine rollback operation of a core identity service in the us-east region. This race condition left the identity service with insufficient capacity in the affected region, and it began returning errors to dependent products. The incident was detected within a minute by automated monitoring systems and mitigated in 56 minutes.

### **IMPACT**
During the incident, customers attempting to access Atlassian Cloud products in the us-east region experienced authentication and permission failures and were unable to access services. Customers also experienced errors when accessing the support portal until Atlassian fell back to an alternate support method. This was caused by a core identity service in the us-east region becoming unavailable. Affected products included Atlassian Administration, Atlassian Analytics, Bitbucket, Compass, Confluence, Jira, Jira Product Discovery, Jira Service Management, and Trello. Some users outside us-east may also have been affected in certain scenarios.

### **ROOT CAUSE**
The incident was caused by a race condition in our internal deployment orchestration platform during a routine rollback operation of a core identity service in the us-east region. This race condition left the identity service with insufficient capacity in the affected region, and it began returning errors to dependent products.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**
We know that outages impact your productivity. Atlassian is prioritizing the following actions to help prevent similar incidents in the future:

* **Refine deployment orchestration safeguards**
  * Harden our deployment platform to prevent similar race conditions, or the resulting capacity loss, during a rollback operation.
  * Streamline mitigation steps when a service becomes unavailable in a region.
* **Reduce cross-region impact**
  * Improve regional isolation and fallback handling so that an issue affecting a single region is less likely to impact customers or product functionality in other regions.

We recognise how critical reliable access to Atlassian products is for our customers' productivity, and we apologize to customers who were impacted by this incident.

Thanks,
Atlassian
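The report does not describe the race condition itself, so the following is only a generic sketch of one kind of safeguard against capacity loss during a rollback, not Atlassian's implementation. It assumes a hypothetical `CapacityFloor` object that refuses to remove instances if ready capacity would drop below a minimum. The point it illustrates is that the check and the decrement must happen atomically: if two rollback steps read the same ready count and both pass the check, together they can take capacity below the floor (a classic check-then-act race).

```python
import threading


class CapacityFloor:
    """Hypothetical guard: never let ready instances drop below min_ready.

    The check and the decrement run under one lock, so two concurrent
    operations cannot both pass the check against the same stale count.
    """

    def __init__(self, ready: int, min_ready: int) -> None:
        self._ready = ready
        self._min_ready = min_ready
        self._lock = threading.Lock()

    def try_remove(self, count: int) -> bool:
        with self._lock:
            if self._ready - count < self._min_ready:
                return False  # removal would breach the capacity floor
            self._ready -= count
            return True

    def add_ready(self, count: int) -> None:
        with self._lock:
            self._ready += count

    @property
    def ready(self) -> int:
        with self._lock:
            return self._ready


# Two concurrent rollback steps each try to remove 2 of 10 ready instances
# with a floor of 8: the lock guarantees only one of them succeeds.
floor = CapacityFloor(ready=10, min_ready=8)
results = []
workers = [threading.Thread(target=lambda: results.append(floor.try_remove(2)))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(results), floor.ready)  # [False, True] 8
```

Without the lock, both threads could observe `ready == 10`, both pass the check, and leave only 6 ready instances.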
Multiple Atlassian services are experiencing issues
Minor · May 8, 2026 · resolved May 8
All dates and times below are in UTC unless stated otherwise.

### Summary
On May 8, 2026, between 00:22 and 06:08, one of our hosting providers suffered a significant incident in a specific availability zone in prod-east, which led to Atlassian customers experiencing degraded performance and delays in background operations and automation execution. The incident started at 00:22 and was detected within 4 minutes by automated monitoring systems. Our teams restored core access by 06:08. Final cleanup of backlogged processes and minor issues then progressed in stages and was completed by 19:15.

### **IMPACT**
The primary infrastructure affected in this incident was the event processing pipeline in the prod-east region, which distributes events between Atlassian services and underpins background operations such as automation execution, search indexing, notifications, and permission synchronisation.

* Between 00:22 and 06:08, an infrastructure incident at our hosting provider triggered an ingestion failure in our event processing pipeline.
* At 02:50, event ingestion was failed over to an unaffected availability zone, progressively restoring live event flows.
* At 06:08, reliability for new ingestion in prod-east recovered to 100%. The remaining work was to drain the accumulated cross-region backlog of messages, which completed by 17:00.
* By 18:48, Automation had processed its backlog of events created during the impact window.

**Automation**

Between 00:22 and 02:50, customers with automation rules triggered by events originating from the prod-east region experienced a significant reduction in rule executions. During this window, event-triggered automation rules did not fire because the events that trigger them were not being delivered. Rule authoring and saving, as well as rules triggered manually, by schedules, or by webhooks, were not affected.
At 02:50, the event processing infrastructure failed over to an unaffected availability zone, restoring delivery of live events to Automation and allowing new event-triggered rules to execute normally. However, events generated during the impact window still needed to be replayed before the delayed automations could run. Beginning at 08:28, upstream services replayed their queued events in a coordinated sequence, and all replayed events were processed by 18:48. During the replay window, customers may have experienced automation rules executing later than expected, a small number of rules reaching daily processing limits because replayed events arrived in a compressed window, and time-sensitive rules not completing as expected where internal timeout thresholds were exceeded.

**Jira and Jira Service Management**

Between 00:22 and 02:50, customers with tenants hosted in the prod-east region experienced disruption to event-driven features in Jira and Jira Service Management, such as automation, along with a short period of elevated errors during the infrastructure failover. Core Jira experiences, including issue view, boards, and project navigation, remained available throughout the incident.

Jira event delivery was affected by the primary impact, which prevented downstream services from receiving issue lifecycle events. This affected automation rules triggered by Jira events, AI agent orchestration in Jira, notifications for issue updates and transitions, search indexing for newly created or modified issues, and event-driven integrations between Jira and other Atlassian products.

At 02:50, the event processing infrastructure failed over to an unaffected availability zone, restoring delivery of new events. All events generated during the impact window were retained in a recovery queue and had to be replayed; the replay began at 08:28 and completed at 12:00.
During the replay window, customers may have experienced automation rules executing later than expected, notifications arriving hours after the triggering action, temporary gaps in search results for content created or modified during the impact window, and AI agent workflows not completing as expected where internal timeout thresholds were exceeded.

**Confluence**

Between 00:22 and 02:50, customers with tenants hosted in the prod-east region experienced disruption to event-driven services in Confluence, resulting in delays to search indexing, notifications, automation rule execution, and permission synchronisation. Once the underlying event processing infrastructure failed over to an unaffected availability zone, live Confluence operations resumed normally. However, events generated during the impact window were queued for replay, and some background services remained delayed until that replay and the related validation work completed. Between 10:14 and 17:00, a bulk replay of all queued tenant replay tasks was run to restore data consistency. During and immediately after the replay window, customers may have seen search results that did not reflect content created or modified during the outage, delayed or missing notifications for page and comment activity, automation rules firing later than expected, and brief delays in permission synchronisation for tenants relying on incremental identity sync.

**Bitbucket and Pipelines**

Between 00:22 and 06:08, customers using Bitbucket and Pipelines experienced failures and degraded functionality across event-driven workflows. Core Git operations, including push, pull, and clone, were not affected and continued to operate normally throughout the incident. Automatic pipeline triggers initiated by push or pull request events were unavailable during the impact window.
Merge queues, custom merge checks, Forge-based triggers, workspace permission changes, and some workspace provisioning flows were also affected. Customers using merge queues were unable to merge pull requests, and some pipeline steps failed because the build-up of queued work pushed usage up against concurrency limits. At approximately 03:57, Pipelines was reconfigured to consume events through an alternative path, restoring automatic pipeline triggering. Merge queues, custom merge checks, Forge triggers, and other affected workflows were progressively restored as the underlying event processing infrastructure recovered. All Bitbucket and Pipelines services were confirmed fully operational by 06:08. After recovery, queued events were reviewed and replayed where safe to restore data consistency for billing, audit logging, and other background processes.

**Identity Services**

Between 00:22 and 02:50, customers with tenants hosted in the prod-east region experienced delays in the propagation of identity and group membership changes to downstream Atlassian products. Core identity operations, including authentication, login, and direct group management actions, were not affected and continued to function normally throughout the incident. The impact was limited to asynchronous, event-driven operations that depend on the event processing pipeline. This included delays in delivering group membership and user profile changes to products such as Jira and Confluence, which affected downstream permission synchronisation and Crowd sync flows. A small number of SCIM-based identity synchronisation and site provisioning workflows also experienced temporary delays. After the event processing infrastructure recovered, backed-up identity and group directory events were replayed where required, restoring downstream consistency for affected products. No identity data was lost.
Group membership changes, user profile updates, and provisioning-related events that occurred during the impact window were retained and processed after recovery.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**
We know outages impact your productivity. While our monitoring and recovery processes helped us respond quickly, this incident highlighted opportunities to further strengthen resilience for event-driven services. We are prioritizing improvements that will:

* **Enhance failover coverage** so that critical event processing can recover more smoothly during infrastructure disruptions.
* **Strengthen recovery handling** so that replayed events can be processed more quickly.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform's performance and availability.

Thanks,
Atlassian Customer Support
Users experiencing issues with login across Atlassian products
Minor · Apr 13, 2026 · resolved Apr 13
### Summary
On April 13, 2026, between 05:49 and 06:29 UTC, customers experienced failures when attempting to log in, sign up, reset passwords, and complete multi-factor authentication (MFA) flows across Atlassian cloud products. Approximately 90% of authentication requests failed during the peak impact window, affecting users in the US East and EU regions. The incident was mitigated within 40 minutes through manual intervention, and full service was restored by 06:29 UTC.

### **IMPACT**
* **Duration**: ~40 minutes (05:49–06:29 UTC, April 13, 2026)
* **Affected regions**: US East and EU (authentication infrastructure serves EU traffic from US East, and at this time of day traffic comes primarily from the EU).
* **Affected products**: All Atlassian cloud products requiring authentication, including Jira, Confluence, Jira Service Management, and Trello.
* **Customer experience**: Users attempting to log in, sign up, reset passwords, or complete MFA flows received errors. Users already logged in with active sessions were unaffected.

### **ROOT CAUSE**
This incident had several contributing factors that combined to produce a failure the system could not recover from without manual intervention.

**The primary cause** was a recently enabled change that caused our authentication infrastructure to retry requests to a downstream identity service when those requests were slow to respond. This retry behaviour was rolled out to 100% of traffic earlier the same day. Under normal conditions this would be benign, but it meant that any slowness in the downstream service was amplified. Because multiple upstream services were also independently retrying their own failed requests, the amplification compounded further into a retry storm.

**The trigger** was a burst of legitimate user traffic. A pattern of many parallel link preview requests for a single user caused a concentrated load spike on a downstream identity service, pushing its response times above the retry threshold. On its own, this kind of spike had occurred many times before and had always recovered. With the retry amplification now in effect, the spike instead created a runaway feedback loop: slow responses caused retries, retries increased load, and increased load caused slower responses, preventing recovery.

The incident was mitigated by manually scaling up the downstream identity service to provide enough capacity to absorb the amplified load. Once scaled, the service recovered immediately, bringing authentication error rates to zero within one minute.

### **REMEDIAL ACTIONS PLAN & NEXT STEPS**
We are taking the following actions designed to prevent recurrence and improve our resilience:

1. **Immediate**: The retry-on-timeout change has been disabled.
2. **Load shedding and self-healing**: We are adding load shedding capabilities to our authentication services so that they can automatically shed excess load and self-recover during traffic spikes, without requiring manual action before automatic scaling takes effect.
3. **Reducing request fan-out**: We are reviewing patterns where a single user action can generate many parallel downstream requests and, where possible, will introduce methods to reduce the amplification potential.

We apologize to customers whose services were interrupted by this incident, and we are taking immediate steps to improve the platform's reliability.

Thanks,
Atlassian Customer Support
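The retry storm described in the April 13 root cause compounds multiplicatively rather than additively. The back-of-the-envelope sketch below assumes a worst case in which every attempt at every layer times out and is retried; the fan-out and per-layer retry counts are hypothetical illustrations, not Atlassian's actual configuration.

```python
from math import prod


def worst_case_load(fan_out: int, retries_per_layer: list[int]) -> int:
    """Worst-case requests reaching the innermost service for one user action.

    fan_out: parallel downstream requests generated by a single user action.
    retries_per_layer: extra attempts each layer makes on a slow or failed
        request; a layer making 1 + r attempts multiplies the load below it.
    Assumes every attempt at every layer times out (no backoff, no retry budget).
    """
    return fan_out * prod(1 + r for r in retries_per_layer)


# Hypothetical: 10 parallel link previews, two upstream layers that each retry twice.
print(worst_case_load(10, [2, 2]))     # 90
# Adding a single retry-on-timeout at the authentication layer doubles it.
print(worst_case_load(10, [2, 2, 1]))  # 180
```

Because each layer contributes a factor rather than a term, removing the retry-on-timeout layer and reducing fan-out each cut the worst-case load by a whole multiple. Real systems usually bound this further with exponential backoff, jitter, and retry budgets, which this sketch deliberately ignores.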
Multiple products impacted by search failures
Critical · Apr 8, 2026 · resolved Apr 8
### Summary
On April 8, 2026, between 04:46 UTC and 12:09 UTC, search functionality was unavailable or degraded across several Atlassian Cloud products, including Jira, Confluence, Jira Service Management, Rovo, Rovo Dev, Loom, Guard Standard, Customer Service Management, and Atlassian Administration. A configuration change increased the resources reserved for a core system component that runs on nodes in our compute platform. On a subset of clusters configured for high-density workloads, the increased reservations exceeded available node capacity, interrupting search and related experiences for affected customers. The root cause was identified and a rollback was merged at 05:42 UTC, with some systems recovering by 07:33 UTC. Core search functionality was restored by approximately 08:55 UTC, and full downstream recovery completed by 12:09 UTC.

### **IMPACT**
During the impact period, some customers experienced outages or degradation in search across Jira, Confluence, Jira Service Management, Rovo, Rovo Dev, Loom, Guard Standard, Customer Service Management, and Atlassian Administration. Other experiences that rely on search, such as quick find, navigation, AI assistants, and dashboards, were also intermittently affected during this period. Impacted customers may have been unable to find pages or recordings, experienced degraded performance when finding issues, received empty or delayed search results, or found that AI assistants and dashboards could not retrieve relevant context.

**Jira, Jira Service Management and Customer Service Management:** Search and experiences that depend on it, such as finding issues and agent responses in CSM, remained available but with degraded performance in fallback mode. By 12:09 UTC, search indexes and search performance were fully restored from fallback to full capacity across all regions.

**Guard Standard and Atlassian Administration:** Search functionality was unavailable for parts of the incident window. As a result, Domain Claims, usage tracking, and managed accounts were degraded for portions of the window. These services were restored to operational status by 07:33 UTC. Guard Premium was not impacted by this issue.

**Confluence:** Search functionality was unavailable for parts of the incident window. Recovery began at 07:30 UTC as backend search clusters were restored. Full recovery, including search index replay, completed at 11:37 UTC.

**Loom:** Search functionality and some experiences that rely on Confluence search (such as sharing to spaces) were unavailable for portions of the window and were fully restored at 11:37 UTC.

**Rovo and Rovo Dev:** Rovo agents remained responsive but experienced degraded functionality due to the loss of search capabilities in underlying services. They were unable to reliably return context about work items or pages. Functionality was fully restored at 11:37 UTC.

### **ROOT CAUSE**
Atlassian products rely on OpenSearch clusters to power their search capabilities, including issue search, content search, and AI-powered search features. An infrastructure configuration change increased resource reservations (CPU and memory) for a system component that runs across our compute platform. On a subset of clusters configured for high-density workloads, the increased reservations exceeded available node capacity. This caused search workloads to be evicted, and in some clusters the evicted workloads could not be rescheduled onto any available node, impacting search functionality across affected products.

The change was deployed across multiple production clusters in a short time frame, limiting the opportunity to detect the capacity conflict in a smaller subset of clusters before it reached the wider fleet. Automated scaling systems attempted to recover by provisioning additional capacity, but in the worst-affected clusters this led to runaway node scaling and exhaustion of available network resources, prolonging recovery.
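The April 8 capacity conflict reduces to a simple fit check: a node's allocatable resources are its capacity minus what is reserved for system components, and a pod can only be scheduled if its requests fit in what remains. On high-density node types, workloads are presumably sized close to a node's allocatable capacity (an assumption for this sketch), so even a modest increase in the system reservation can leave no node on which a pod fits. The hypothetical `reservation_change_is_safe` check below illustrates the kind of pre-deploy validation for resource changes that the remedial actions describe; all names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU in millicores and memory in MiB (units chosen for this sketch)."""
    cpu_m: int
    mem_mib: int

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_m - other.cpu_m, self.mem_mib - other.mem_mib)

    def fits_within(self, budget: "Resources") -> bool:
        return self.cpu_m <= budget.cpu_m and self.mem_mib <= budget.mem_mib


def reservation_change_is_safe(node_capacity: Resources,
                               system_reservation: Resources,
                               largest_pod_request: Resources) -> bool:
    """Hypothetical pre-deploy check: after applying the proposed system
    reservation, can the largest workload pod still be scheduled on a node?"""
    allocatable = node_capacity.minus(system_reservation)
    return largest_pod_request.fits_within(allocatable)


# Hypothetical high-density node and search pod; all numbers are illustrative.
node = Resources(cpu_m=16_000, mem_mib=65_536)
search_pod = Resources(cpu_m=14_000, mem_mib=56_000)

print(reservation_change_is_safe(node, Resources(1_000, 4_096), search_pod))  # True
print(reservation_change_is_safe(node, Resources(2_500, 8_192), search_pod))  # False
```

In the second case, CPU allocatable drops to 13,500 millicores, below the pod's 14,000 request, so the check rejects the change before it could evict the pod.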
### **REMEDIAL ACTIONS PLAN & NEXT STEPS**
We understand that service disruptions impact your productivity. In addition to our existing testing and preventative processes, Atlassian is prioritizing the following actions to reduce the likelihood and impact of similar incidents in the future and to speed up recovery when issues occur:

* **Enforce smaller deployment cohorts and longer soak periods for critical platform changes on these cluster types:** Implement smaller deployment cohorts, mandatory soak periods between environments, and automated health gates so that changes are validated on a limited set of clusters before being promoted more broadly.
* **Strengthen automated pre-deploy validation for resource changes:** Add validation checks to ensure that resource changes for system components are compatible with node capacity and reserved headroom, preventing system workloads from crowding out customer workloads.
* **Improve post-deploy verification and alerting:** Enhance monitoring and post-deployment verification to detect patterns such as spikes in pending pods, runaway node scaling, and low pod-IP headroom that correlate closely with a new configuration rollout.
* **Align autoscaling behavior with capacity and safety limits:** Align autoscaling capacity calculations with node reservations, and introduce safeguards and circuit breakers to prevent runaway scaling and to enforce safe limits on node and pod IP counts.
* **Enhance recovery automation:** Improve automation and runbooks so we can safely disable autoscaling, remove empty nodes in bulk, and restore normal operations faster across multiple clusters in parallel.

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform's performance and availability and to reduce the risk and impact of similar issues in the future.

Thanks,
Atlassian Customer Support
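The autoscaling safeguards and circuit breakers listed in the April 8 remedial actions can be sketched as a guard in front of a scale-up decision. The sketch below is hypothetical and does not reflect how any particular autoscaler (for example, the Kubernetes Cluster Autoscaler) is configured. It caps each scale-up by a hard node limit and by the pod IPs still free, and it trips a breaker when several consecutive scale-ups fail to reduce the number of pending pods. The class name, thresholds, and the "unproductive rounds" heuristic are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScaleUpGuard:
    """Hypothetical guard placed in front of an autoscaler's scale-up decision.

    Caps each scale-up by a hard node limit and by free pod IPs, and trips a
    circuit breaker if consecutive scale-ups fail to reduce pending pods.
    """
    max_nodes: int
    pod_ips_per_node: int
    max_unproductive_rounds: int = 3
    tripped: bool = field(default=False, init=False)
    _unproductive_rounds: int = field(default=0, init=False)
    _last_pending: Optional[int] = field(default=None, init=False)

    def allowed_new_nodes(self, current_nodes: int, requested: int,
                          free_pod_ips: int, pending_pods: int) -> int:
        # A round is "unproductive" if pending pods did not fall since the last decision.
        if self._last_pending is not None and pending_pods >= self._last_pending:
            self._unproductive_rounds += 1
        else:
            self._unproductive_rounds = 0
        self._last_pending = pending_pods

        if self._unproductive_rounds >= self.max_unproductive_rounds:
            self.tripped = True  # stays open until an operator calls reset()
        if self.tripped:
            return 0

        node_headroom = max(0, self.max_nodes - current_nodes)
        ip_headroom = free_pod_ips // self.pod_ips_per_node
        return max(0, min(requested, node_headroom, ip_headroom))

    def reset(self) -> None:
        self.tripped = False
        self._unproductive_rounds = 0
        self._last_pending = None


guard = ScaleUpGuard(max_nodes=100, pod_ips_per_node=30)
print(guard.allowed_new_nodes(90, 20, 600, 50))  # 10 (node limit)
print(guard.allowed_new_nodes(95, 20, 300, 50))  # 5, but pending pods did not drop
print(guard.allowed_new_nodes(98, 5, 60, 60))    # 2 (node and IP limits)
print(guard.allowed_new_nodes(99, 5, 30, 70))    # 0, breaker tripped
```

The design choice here is that a misconfigured capacity calculation degrades into "no new nodes, page a human" rather than into runaway scaling that exhausts network resources.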
Groups are not clickable from Groups Page in Atlassian Administration
Minor · Feb 18, 2026 · resolved Feb 18
We have implemented the fix and the group page in the Admin Hub is now functioning as expected.