The Difference Between Good and Bad Incident Communication

When GitHub went down on December 24, 2025, their status page showed no incident for the first several minutes. Users found out through Twitter. That gap between reality and the status page erodes trust faster than the outage itself.

Learning how to write incident updates is a core operational skill. Good updates can reduce support tickets by 40-60%, maintain customer confidence, and turn outages into trust-building moments. Bad updates, or worse, silence, do the opposite.

The Four States of an Incident Update

Every incident follows a lifecycle. Your updates should map to these four states.

Investigating

This is your first public acknowledgment. Post it within 5 minutes of detecting an issue, even if you don't know the cause yet.

Template: "We are investigating reports of degraded performance affecting [component]. Some users may experience [specific symptom: slow load times, failed API calls, login errors]. Our team is actively looking into this. Next update in 30 minutes."

What to include:

Which component is affected
What users are experiencing (not what's broken internally)
When the next update will come

What NOT to include:

Speculation about the root cause
Promises about resolution time
Technical jargon your customers won't understand

Identified

You know what's causing the problem. Share that information clearly.

Template: "The issue has been identified. [Brief, non-technical explanation of the cause]. Our engineering team is implementing a fix. [Component] remains [degraded/partially available]. We expect to deploy a fix within [timeframe]. Next update in [time]."

Good example: "The issue has been identified as a configuration error in our payment processing service. Checkout is currently unavailable. Our team is rolling back the change and expects to restore service within 45 minutes. Next update in 30 minutes."

Bad example: "We found a null pointer exception in the PaymentGatewayService.processTransaction() method caused by a missing environment variable in the k8s deployment manifest."

Your customers don't need stack traces. They need to know what's broken and when it will be fixed.

Monitoring

The fix is deployed. You're watching to make sure it holds.

Template: "A fix has been deployed for the [component] issue. We are monitoring service performance to confirm the resolution. Early indicators show [positive signal: response times returning to normal, error rates dropping]. We will confirm full resolution within [timeframe]."

This state is important because it manages expectations. Deploying a fix doesn't mean the incident is over. Premature "resolved" updates followed by a return of the issue destroy credibility.

Resolved

The incident is confirmed fixed. This is also your opportunity to set expectations about follow-up.

Template: "This incident has been resolved. [Component] is operating normally. The issue lasted approximately [duration] and affected [scope: all users, users in EU region, users on mobile]. We will publish a postmortem with more details within 48 hours."

Always include the duration and scope. This gives affected users closure and unaffected users confidence.
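
The four states and their templates can be kept in code so that updates always follow the same lifecycle. What follows is a minimal sketch, not a reference implementation: the names State, ALLOWED_TRANSITIONS, TEMPLATES, advance, and render are hypothetical, and the transitions from Monitoring back to an earlier state are an assumption for the case where a fix does not hold. The template wording is taken from the sections above.

```python
from enum import Enum


class State(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"


# Allowed transitions. The forward path is the lifecycle described above.
# The moves from MONITORING back to an earlier state are an assumption,
# covering the case where a deployed fix does not hold.
ALLOWED_TRANSITIONS = {
    State.INVESTIGATING: {State.IDENTIFIED},
    State.IDENTIFIED: {State.MONITORING},
    State.MONITORING: {State.RESOLVED, State.IDENTIFIED, State.INVESTIGATING},
    State.RESOLVED: set(),
}

# One template per state, with named placeholders for the bracketed parts.
TEMPLATES = {
    State.INVESTIGATING: (
        "We are investigating reports of degraded performance affecting "
        "{component}. Some users may experience {symptom}. Our team is "
        "actively looking into this. Next update in {next_update}."
    ),
    State.IDENTIFIED: (
        "The issue has been identified. {cause}. Our engineering team is "
        "implementing a fix. {component} remains {status}. We expect to "
        "deploy a fix within {eta}. Next update in {next_update}."
    ),
    State.MONITORING: (
        "A fix has been deployed for the {component} issue. We are "
        "monitoring service performance to confirm the resolution. Early "
        "indicators show {signal}. We will confirm full resolution within "
        "{confirm_within}."
    ),
    State.RESOLVED: (
        "This incident has been resolved. {component} is operating "
        "normally. The issue lasted approximately {duration} and affected "
        "{scope}. We will publish a postmortem with more details within "
        "48 hours."
    ),
}


def advance(current: State, new: State) -> State:
    """Return the new state, or raise if the lifecycle does not allow it."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def render(state: State, **fields: str) -> str:
    """Fill in the template for a state."""
    # str.format raises KeyError when a placeholder has no value,
    # so a half-filled template fails loudly instead of being posted.
    return TEMPLATES[state].format(**fields)


if __name__ == "__main__":
    state = State.INVESTIGATING
    print(render(state, component="the API", symptom="failed API calls",
                 next_update="30 minutes"))
    state = advance(state, State.IDENTIFIED)
```

Rejecting a jump straight from Investigating to Resolved is a deliberately strict choice here. A team whose incidents sometimes clear up on their own may want to allow that transition.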

Tone Guidelines

Be Direct, Not Defensive

Good: "Our database experienced connection pool exhaustion, causing API timeouts for approximately 12 minutes."

Bad: "Due to unprecedented traffic patterns that exceeded our normal operational parameters, some users may have noticed intermittent connectivity issues."

The second example shifts the blame onto traffic, hedges with "may have noticed," and minimizes the impact. Customers see through this immediately.

Acknowledge Impact Honestly

Good: "Checkout was completely unavailable for 23 minutes. Orders placed during this window were not processed. If you attempted a purchase, please try again."

Bad: "Some users may have experienced issues with certain transactions."

Vague language insults your customers' intelligence. They know the site was down. Tell them you know it too.

Use Confident, Calm Language

Avoid words that signal panic or uncertainty:

"We think" (use "We have identified" or "We are investigating")
"Hopefully" (use "We expect" with a specific timeframe)
"As soon as possible" (give a concrete timeline, even if approximate)

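A draft can be checked for the panic and uncertainty phrases listed above before it is posted. This is a rough sketch: the function name flag_uncertain_language and the REPLACEMENTS table are hypothetical, and the phrase list is only the three items named in this section.

```python
import re

# Phrase -> suggested replacement, taken from the list in this section.
REPLACEMENTS = {
    "we think": 'use "We have identified" or "We are investigating"',
    "hopefully": 'use "We expect" with a specific timeframe',
    "as soon as possible": "give a concrete timeline, even if approximate",
}


def flag_uncertain_language(text: str) -> list[tuple[str, str]]:
    """Return (phrase, suggestion) pairs for each listed phrase found in text."""
    findings = []
    for phrase, suggestion in REPLACEMENTS.items():
        # \b anchors the match at word boundaries; matching ignores case.
        if re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE):
            findings.append((phrase, suggestion))
    return findings


if __name__ == "__main__":
    draft = "We think the fix is close. Hopefully checkout returns soon."
    for phrase, suggestion in flag_uncertain_language(draft):
        print(f'Found "{phrase}": {suggestion}')
```

A check like this only catches exact phrases, so it complements a human read of the update and does not replace one.
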
Timing Rules

First update: Within 5 minutes of detection. Even "We are aware of an issue and investigating" is better than silence.

During investigation: Every 30 minutes until the cause is identified. If nothing has changed, post anyway: "Investigation ongoing. Our team has ruled out [X] and is focusing on [Y]. Next update in 30 minutes."

After identification: Every 30-60 minutes depending on severity.

After resolution: Post within 15 minutes of confirming the fix.

The worst thing you can do is go silent. A status page with a 2-hour-old "investigating" update looks abandoned.
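
The timing rules above are easy to turn into deadlines that a bot or dashboard could watch. The sketch below assumes hypothetical names (MAX_GAP, first_update_deadline, next_update_due, is_overdue). It uses the upper end of the 30-60 minute range after identification, and it leaves out the monitoring state because no interval is given for it here.

```python
from datetime import datetime, timedelta

# Deadlines from the timing rules above.
FIRST_UPDATE_DEADLINE = timedelta(minutes=5)      # after detection
RESOLUTION_POST_DEADLINE = timedelta(minutes=15)  # after confirming the fix

# Maximum gap between updates per state. "identified" uses the upper bound
# of the 30-60 minute range; callers may tighten it for severe incidents.
MAX_GAP = {
    "investigating": timedelta(minutes=30),
    "identified": timedelta(minutes=60),
}


def first_update_deadline(detected_at: datetime) -> datetime:
    """Latest time the first public acknowledgment should go out."""
    return detected_at + FIRST_UPDATE_DEADLINE


def next_update_due(state: str, last_update: datetime) -> datetime:
    """Latest time the next update should go out for the given state.

    Raises KeyError for states that have no interval defined in MAX_GAP.
    """
    return last_update + MAX_GAP[state]


def is_overdue(state: str, last_update: datetime, now: datetime) -> bool:
    """True when the status page has gone quiet for longer than allowed."""
    return now > next_update_due(state, last_update)


if __name__ == "__main__":
    last = datetime(2025, 1, 1, 12, 0)
    now = last + timedelta(hours=2)
    # A 2-hour-old "investigating" update is well past its 30-minute window.
    print(is_overdue("investigating", last, now))
```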

Who Should Write Updates

Assign a dedicated incident communicator. This should not be the person debugging the issue. Engineers focused on fixing the problem shouldn't also be writing customer-facing prose.

The communicator's job is to:

Translate technical details into customer-friendly language
Post updates on schedule
Coordinate with support on incoming ticket volume
Draft the resolution and postmortem

Tools like alert24.net, Instatus, and Better Stack all support team roles so you can separate the "fix it" team from the "communicate it" team.

One Last Rule

Every incident update should answer three questions:

What is happening right now?
What are we doing about it?
When will we update you next?

If your update answers all three, your customers will wait patiently. If it answers none of them, they'll be on the phone with your support team, or worse, evaluating your competitors.
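
One way to enforce the three questions is to make each one a required field, so an update cannot be rendered with an answer missing. This is an illustrative sketch only; the IncidentUpdate class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncidentUpdate:
    what_is_happening: str   # What is happening right now?
    what_we_are_doing: str   # What are we doing about it?
    next_update: str         # When will we update you next?

    def missing_answers(self) -> list[str]:
        """Names of the questions left blank or filled with whitespace."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        """Join the three answers into one update, or raise if any is blank."""
        missing = self.missing_answers()
        if missing:
            raise ValueError(f"update is missing: {', '.join(missing)}")
        return (f"{self.what_is_happening} {self.what_we_are_doing} "
                f"Next update in {self.next_update}.")


if __name__ == "__main__":
    update = IncidentUpdate(
        what_is_happening="Checkout is currently unavailable.",
        what_we_are_doing="Our team is rolling back the change.",
        next_update="30 minutes",
    )
    print(update.render())
```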

How to Write Incident Updates That Build Customer Trust