
Document Drafter Status Page

Legal Tech · monitored by Alert24

Current Status

All Systems Operational

Components

REST API · Operational
API Services - EU · Operational
API Services - EU 2 · Operational
API Services - US East · Operational
API Services - Canada Central · Operational
API Services - Switzerland North · Operational
Document Drafter Portal · Operational

Recent Incidents

Degraded Performance Due to Azure Issue

minor

Oct 29, 2025 · resolved Oct 30

We successfully mitigated the impact of the Azure Front Door (AFD) incident that affected several Azure services from October 29 to October 30, 2025. Our immediate action was to **manually move our services away from Azure Front Door/Content Delivery Network (CDN)**, which restored the availability of the Document Drafter application. The application is available and customer data integrity was maintained; we are nonetheless taking further steps to enhance resilience. We are currently **updating our disaster recovery plans** to formalize failover strategies for all external Azure components, and we are **reviewing our use of Microsoft's Content Delivery Network** to ensure robust availability. We are also tracking Microsoft's planned repairs, which include implementing extended bake times, removing asynchronous processing, and enhancing testing for their configuration rollout pipeline. Please find Microsoft's post mortem below.
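Moving traffic away from AFD/CDN is, in effect, priority failover: use the edge endpoint while it answers, otherwise go straight to the origin. The sketch below shows that decision from the point of view of a client or health-check script. It is a minimal illustration under stated assumptions, not Document Drafter's actual setup: both URLs are hypothetical placeholders, and in production this switch would more typically happen at the DNS layer (for example, Azure Traffic Manager priority routing) rather than in client code.

```python
import urllib.request
from typing import Callable, Sequence

# Hypothetical endpoints, in priority order: an edge (front door / CDN)
# hostname first, the origin second. Placeholders, not real Document Drafter URLs.
ENDPOINTS = [
    "https://app.example-edge.net/health",
    "https://origin.example.com/health",
]


def default_fetch(url: str, timeout: float) -> bytes:
    """GET a URL and return the body; raises on HTTP errors, DNS failures, or timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_failover(
    endpoints: Sequence[str],
    fetch: Callable[[str, float], bytes] = default_fetch,
    timeout: float = 5.0,
) -> tuple[str, bytes]:
    """Try each endpoint in priority order; return (url, body) from the first that succeeds."""
    errors: list[str] = []
    for url in endpoints:
        try:
            return url, fetch(url, timeout)
        except OSError as exc:
            # urllib's URLError/HTTPError and socket timeouts all subclass OSError,
            # so any of them moves us on to the next endpoint.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all endpoints failed: " + "; ".join(errors))
```

The `fetch` parameter is injectable so the failover order can be exercised without network access; with the defaults, `fetch_with_failover(ENDPOINTS)` returns the origin's response whenever the edge hostname errors or times out.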

Service interruption

minor

Oct 9, 2025 · resolved Oct 9

We have received the following preliminary post mortem from Microsoft.

COMMUNICATION: _Join one of our upcoming 'Azure Incident Retrospective' livestreams discussing this incident (to hear from our engineering leaders, and to get any questions answered by our experts) or watch a recording of the livestream (available later, on YouTube):_ [_https://aka.ms/AIR/QNBQ-5W8_](https://aka.ms/AIR/QNBQ-5W8)

**What happened?**

Between 07:50 UTC and 16:00 UTC on 09 October 2025, Microsoft services and Azure customers leveraging Azure Front Door (AFD) and Azure Content Delivery Network (CDN) may have experienced increased latency and/or timeouts, primarily across Africa and Europe, as well as Asia Pacific and the Middle East. This impacted the availability of the Azure Portal as well as other management portals across Microsoft. Peak failure rates for AFD reached approximately 17% in Africa, 6% in Europe, and 2.7% in Asia Pacific and the Middle East. Availability was restored by 12:50 UTC, though some customers continued to experience elevated latency. Latency returned to baseline levels by 16:00 UTC, at which point the incident was mitigated.

**What do we know so far?**

AFD routes traffic using globally distributed edge sites and supports Microsoft services including the management portals. The AFD control plane generates system metadata that the data plane consumes for customer-initiated 'create', 'update', or 'delete' operations on AFD or CDN profiles. One of the trigger conditions for this incident was a software defect in the latest version of the AFD control plane, which had been rolled out six weeks prior to the incident in line with our safe deployment practices. Newly created customer tenant profiles were being onboarded to the newer control plane version.

Our service monitoring detected elevated data plane crashes due to a previously unknown bug, triggered by erroneous metadata generated by a particular sequence of profile update operations. Our automated protection layer intercepted this in early update stages and prevented the metadata from propagating any further to the data plane, thereby averting any customer impact at that time. In addition, as the newer control plane was running in tandem with the previous version of the control plane, we disabled the new control plane from taking any requests.

On 09 October 2025, we initiated a cleanup of the affected tenant configuration with the erroneous metadata. Since the automated protection system was blocking the impacted customer tenant profile updates in the initial stage, we temporarily bypassed it to allow the cleanup of the tenant configuration to proceed. Bypassing the protection system inadvertently allowed the erroneous metadata to propagate to later stages, where it triggered the bug in the data plane that crashed the data plane service. This disrupted a significant number of edge sites across Europe and Africa; approximately 26% of AFD data plane infrastructure resources in these regions were impacted.

As part of AFD mechanisms to manage traffic, load was automatically distributed to nearby edge sites (including in Asia Pacific and the Middle East). Additionally, as regional business hours traffic started ramping up, it added to the overall traffic load. The increased volume of traffic on the remaining healthy edge sites resulted in high resource utilization, which exceeded operational thresholds. This triggered an additional layer of protection, which started distributing traffic to a broader set of edge sites globally to reduce further impact. Recovery required a combination of automated restarts, manual intervention where automated restarts were taking too long, and traffic failover operations for impacted management portals. Full mitigation was achieved once edge site infrastructure resources stabilized and latency returned to normal.

Additionally, initial customer notifications were delayed, primarily due to challenges determining impact while attempting to target communications to those impacted. We have automated communications to notify customers of incidents quickly; unfortunately, this capability was not yet supported in this incident scenario.

**How did we respond?**

* 07:30 UTC on 09 October 2025 – The cleanup operation was initiated.
* 07:50 UTC on 09 October 2025 – Initial customer impact began and increased over the next 90 minutes.
* 08:13 UTC on 09 October 2025 – Our telemetry detected resource availability loss across multiple AFD edge sites. We began investigating as impact continued to grow.
* 09:04 UTC on 09 October 2025 – We identified that the crashes were due to the previously identified data plane bug.
* 09:08 UTC on 09 October 2025 – Automated restarts began for our AFD infrastructure resources, and manual intervention began for resources that did not recover automatically.
* 09:15 UTC on 09 October 2025 – Customer impact reached its peak.
* 10:01 UTC on 09 October 2025 – Communications were published to the Azure Status page.
* 10:45 UTC on 09 October 2025 – Targeted customer communications were sent to Azure Service Health.
* 11:59 UTC on 09 October 2025 – Management portals, like the Azure Portal, performed failover operations (including using scripts to update the load balancing configuration, to split traffic between multiple routes), helping restore their service availability.
* 12:50 UTC on 09 October 2025 – Availability for AFD fully recovered; however, a subset of customers may still have been experiencing elevated latency.
* 16:00 UTC on 09 October 2025 – After continuous monitoring of latency improvement, we declared the incident mitigated after confirming recovery.
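Working from the timestamps in Microsoft's timeline above, a few summary durations can be derived. These figures are our own arithmetic, not numbers Microsoft states, and the event labels and helper names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamps (UTC, 09 October 2025) taken from Microsoft's timeline above.
TIMELINE = {
    "impact_start": "07:50",
    "detected": "08:13",
    "status_page": "10:01",
    "availability_restored": "12:50",
    "mitigated": "16:00",
}


def parse(hhmm: str) -> datetime:
    """Turn an HH:MM string into a datetime on the incident date."""
    return datetime.strptime(f"2025-10-09 {hhmm}", "%Y-%m-%d %H:%M")


def elapsed(start: str, end: str) -> timedelta:
    """Time between two named timeline events."""
    return parse(TIMELINE[end]) - parse(TIMELINE[start])


SUMMARY = {
    "time to detect": ("impact_start", "detected"),
    "time to first Azure Status post": ("impact_start", "status_page"),
    "availability impact": ("impact_start", "availability_restored"),
    "full incident duration": ("impact_start", "mitigated"),
}

for label, (start, end) in SUMMARY.items():
    print(f"{label}: {elapsed(start, end)}")
# time to detect: 0:23:00
# time to first Azure Status post: 2:11:00
# availability impact: 5:00:00
# full incident duration: 8:10:00
```

As Microsoft notes, these are whole-incident windows; impact on any individual customer may have been shorter.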
**How are we making incidents like this less likely or less impactful?**

* We have hardened our standard operating procedures to ensure that the configuration protection system is not bypassed for any operation. (Completed)
* We have fixed the control plane defect which generated the erroneous tenant metadata that led to the data plane resource crashes. (Completed)
* We have fixed the bug in the data plane. (Completed)
* We will expand the automated customer alerts sent via Azure Service Health to include similar classes of service degradation. (Estimated completion: November 2025)
* We are making our Azure Portal failover systems from AFD more robust and automated. (Estimated completion: December 2025)
* We are building additional runtime configuration validation pipelines against a replica of the real-time data plane, as a pre-validation step prior to applying them broadly. (Estimated completion: March 2026)
* We are improving data plane resource instance recovery time following any impact to the data plane. (Estimated completion: March 2026)

**How can customers make incidents like this less impactful?**

* Consider implementing failover strategies with Azure Traffic Manager, to fail over from Azure Front Door to your origins: [https://learn.microsoft.com/azure/architecture/guide/networking/global-web-applications/overview](https://learn.microsoft.com/azure/architecture/guide/networking/global-web-applications/overview)
* Consider reviewing our best practices for Azure Front Door architecture: [https://learn.microsoft.com/azure/well-architected/service-guides/azure-front-door](https://learn.microsoft.com/azure/well-architected/service-guides/azure-front-door)
* Consider implementing retry patterns with exponential backoff, to improve workload resiliency: [https://learn.microsoft.com/azure/architecture/patterns/retry](https://learn.microsoft.com/azure/architecture/patterns/retry)
* More generally, consider evaluating the reliability of your applications using guidance from the Azure Well-Architected Framework and its interactive Well-Architected Review: [https://aka.ms/AzPIR/WAF](https://aka.ms/AzPIR/WAF)
* The impact times above represent the full incident duration, and so are not specific to any individual customer. Actual impact to service availability varied between customers and resources. For guidance on implementing monitoring to understand granular impact: [https://aka.ms/AzPIR/Monitoring](https://aka.ms/AzPIR/Monitoring)
* Finally, consider ensuring that the right people in your organization will be notified about any future service issues by configuring Azure Service Health alerts. These can trigger emails, SMS, push notifications, webhooks, and more: [https://aka.ms/AzPIR/Alerts](https://aka.ms/AzPIR/Alerts)

**How can we make our incident communications more useful?**

You can rate this PIR and provide any feedback using our quick 3-question survey: [http://aka.ms/AzPIR/QNBQ-5W8](http://aka.ms/AzPIR/QNBQ-5W8)
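Microsoft's suggestion to use retry patterns with exponential backoff can be sketched as follows. This is a minimal, hypothetical helper, not an Azure SDK API: it assumes transient failures surface as `ConnectionError` or `TimeoutError`, and the defaults (five attempts, 0.5 s base delay, 30 s cap) are illustrative rather than values Microsoft recommends. It uses "full jitter" (a random wait between zero and the exponential cap), one common variant that keeps many clients from retrying in lockstep when an edge service recovers.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential growth: base, 2*base, 4*base, ... capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so clients spread out.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Only wrap idempotent operations (such as reads) this way. The `sleep` parameter exists so tests can record the chosen delays instead of actually waiting.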

Service interruption

critical

Apr 1, 2025 · resolved Apr 1

Microsoft has advised that the issue is now fully resolved. Between 08:51 and 10:15 UTC on 01 April 2025, Microsoft identified customer impact in the North Europe region resulting from a power event, which impacted Storage, Cosmos DB, and various other resources not used by Document Drafter. A power maintenance event led to a temporary power loss in a single datacenter in Physical Availability Zone 2 of the North Europe region, affecting multiple racks and devices. Microsoft has confirmed that power has been fully restored and all affected services have recovered. We will now conduct an internal evaluation in line with our standard operating procedures (SOPs). Further information is available on request.
