Current Status
All Systems Operational
Recent Incidents
Users being redirected to last opened email
none · Nov 14, 2024 · resolved Nov 14
This incident has been resolved.
Sporadic Dyspatch Application Outages
major · Sep 11, 2024 · resolved Sep 11
This incident has been resolved.
Application and API unavailable
major · Apr 8, 2024 · resolved Apr 9
# **Post Mortem** - **April 8 2024 Dyspatch Outage**

## **Intro**

On April 8, 2024, Dyspatch was unavailable from 12:30 PM until 1:00 AM the following morning (Pacific time) due to an issue that occurred during a routine upgrade of Dyspatch's infrastructure. This post mortem aims to analyze the root causes of the outage, assess its impact on our services, and outline the steps Dyspatch is taking to prevent similar incidents in the future.

## **Timeline \(Pacific Time\)**

- **11:35 -** We begin the upgrade.
- **12:10 -** The production cluster intermittently returns 503s for users. Dyspatch's services cannot communicate with each other.
- **12:17 -** We attempt to roll back the changes.
- **12:30 -** We identify the problem: the internal authentication mechanism our services use to communicate securely is out of sync across services.
- **12:30 - 17:30 -** We try several strategies to bring production online.
- **17:30 -** To avoid further impact to our production environment, work begins on our staging environment.
- **18:17 -** We identify that previous changes were made to our staging environment without being applied to our production environment.
- **21:16 -** Staging is online. We begin applying the changes from our staging environment to our production environment.
- **00:56 -** Dyspatch is available again.

## **Why did this happen? What did we learn?**

During the outage we ran into several challenges while trying to restore service. We discovered that a previous update to a critical component of our infrastructure had been applied only to our staging environment. We quickly determined that the issue was an authentication misalignment between Dyspatch's services, which meant that our services could not communicate with each other. We learned that we did not have a way to generate new credentials without taking the services that manage our cluster offline.

After we determined that critical services had to be taken offline, we switched to testing on our staging environment to prevent data loss in our production environment. Ultimately, a difference between our production and staging environments had knock-on effects that limited our ability to roll back and recover quickly.

## **What are we doing about it?**

There are several actions we intend to take to prevent similar issues from happening:

1. We immediately aligned our staging and production environments so that any infrastructure change tested in staging behaves the same way when applied to production. The root cause of this outage was a difference between environments, and this alignment lets us be confident when testing required infrastructure changes.
2. We plan to invest in tooling to help us automatically catch and audit any drift between our environments. Catching the difference beforehand would have prevented this incident.
3. We are investing in tooling and processes to help us rebuild our cluster more reliably and quickly. We had to spend time migrating changes from our staging environment to our production environment when trying to restore Dyspatch.

## **Summary**

Finally, we want to apologize. We know Dyspatch is important for supporting our customers' communications. Your patience and support mean a great deal to us, and we appreciate everyone who reached out to our team. As with any operational issue, we will spend time in the coming days and weeks understanding the details of the event and making the improvements mentioned above to our infrastructure and processes.
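
As an illustration of the drift auditing described in action item 2 of the post mortem, the sketch below compares two environment configurations and reports where they differ. It is a minimal example under stated assumptions, not Dyspatch's actual tooling: the function names `flatten` and `find_drift`, the configuration keys, and the sample values are all hypothetical.

```python
from typing import Any, Dict, List


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def find_drift(staging: Dict[str, Any], production: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report keys that exist in only one environment or whose values differ."""
    s, p = flatten(staging), flatten(production)
    return {
        "only_in_staging": sorted(s.keys() - p.keys()),
        "only_in_production": sorted(p.keys() - s.keys()),
        "changed": sorted(k for k in s.keys() & p.keys() if s[k] != p[k]),
    }


# Hypothetical configurations: an update was applied to staging only.
staging = {"auth": {"cert_version": 2}, "cluster": {"version": "1.2"}}
production = {"auth": {"cert_version": 1}, "cluster": {"version": "1.2"}}

drift = find_drift(staging, production)
print(drift)
```

Run on a schedule or before each upgrade, a check of this kind would flag a setting that had been changed in one environment but not the other, which is the class of difference the post mortem identifies as the root cause.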
Mobile preview display issues
none · Nov 30, 2023 · resolved Nov 30
Mobile previews in the email builder were not working. Desktop previews and email editing in general were unaffected. A fix has been implemented and deployed.
Sporadic Dyspatch Application Outages
major · Nov 24, 2023 · resolved Nov 24
This incident has been resolved.