Incidents & On-Call

Incident Management

Overview

Incidents in Alert24 represent events where one or more of your services are experiencing problems. The incident management system helps you track the issue from detection through resolution while keeping your customers informed via your status page.

Creating an Incident

Automatic Incident Creation

When monitoring checks detect a failure and the failure threshold is met, Alert24 can automatically create an incident linked to the affected service.
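The threshold behavior described above can be pictured with a small sketch. This is not Alert24's implementation: the `ThresholdTracker` class, its field names, the default of 3, and the choice to count consecutive failures are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTracker:
    """Tracks failed checks for one service (hypothetical model, not Alert24 code)."""

    failure_threshold: int = 3  # assumed value; configure to match your monitor
    consecutive_failures: int = 0
    incident_open: bool = False

    def record_check(self, passed: bool) -> bool:
        """Record one check result; return True when a new incident should be created."""
        if passed:
            # Assumption: any successful check resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold and not self.incident_open:
            # Threshold met and no incident exists yet for this service.
            self.incident_open = True
            return True
        return False


tracker = ThresholdTracker(failure_threshold=3)
checks = (False, False, True, False, False, False, False)
results = [tracker.record_check(passed) for passed in checks]
print(results)  # [False, False, False, False, False, True, False]
```

In this sketch a single incident is created on the third failure in a row, and further failures do not create duplicates.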

Manual Incident Creation

Create an incident manually when you discover an issue before monitoring catches it, or when a situation requires manual intervention:

  1. Open Incidents from the navigation bar
  2. Click Create Incident
  3. Fill in the details:
    • Title — A clear, customer-friendly description (e.g., "API Response Errors" not "DB connection pool exhausted")
    • Severity — Minor, Major, or Critical
    • Affected Services — Select which services are impacted
    • Initial Update — Describe what you know so far
    • Status — Investigating, Identified, Monitoring, or Resolved

[Screenshot: Creating an incident]
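If it helps to think of the form as data, the fields above could be modeled roughly as follows. The class and field names are hypothetical and do not describe an Alert24 API. Treating the title, affected services, and initial update as required is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    MINOR = "Minor"
    MAJOR = "Major"
    CRITICAL = "Critical"


class Status(Enum):
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


@dataclass
class IncidentDraft:
    """Hypothetical in-memory version of the Create Incident form."""

    title: str
    severity: Severity
    affected_services: list[str]
    initial_update: str
    status: Status = Status.INVESTIGATING  # the typical starting status

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the draft looks complete."""
        issues = []
        if not self.title.strip():
            issues.append("title is required")
        if not self.affected_services:
            issues.append("select at least one affected service")
        if not self.initial_update.strip():
            issues.append("initial update is required")
        return issues


draft = IncidentDraft(
    title="API Response Errors",
    severity=Severity.MAJOR,
    affected_services=["API"],
    initial_update="We are investigating elevated error rates on the API.",
)
print(draft.problems())  # []
```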

Severity Levels

Choose the severity level that matches the customer impact:

Minor

Limited impact. The service is functional but experiencing minor issues. Most customers are unaffected.

Examples: Slightly elevated response times, intermittent errors affecting a small percentage of requests.

Major

Significant impact. Core functionality is impaired or unavailable for a meaningful portion of users.

Examples: The API returns errors for 50% of requests, or a specific feature is completely unavailable.

Critical

Severe impact. The service is completely down or a critical function is unavailable for all users.

Examples: Complete service outage, data loss risk, security incident.
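As a rough aid for choosing between the three levels, the definitions above can be turned into a small helper. This is only a sketch: the function name, its parameters, and the 25% cutoff for "a meaningful portion of users" are assumptions, and the final choice should always reflect your own judgment of customer impact.

```python
MAJOR_FRACTION = 0.25  # assumed cutoff for "a meaningful portion of users"


def suggest_severity(
    affected_fraction: float,
    core_function_impaired: bool,
    data_or_security_risk: bool = False,
) -> str:
    """Suggest Minor, Major, or Critical from a rough estimate of customer impact."""
    if not 0.0 <= affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be between 0.0 and 1.0")
    # Critical: all users lose a core function, or there is data loss or security risk.
    if data_or_security_risk or (core_function_impaired and affected_fraction >= 1.0):
        return "Critical"
    # Major: a core function is impaired for a meaningful portion of users.
    if core_function_impaired and affected_fraction >= MAJOR_FRACTION:
        return "Major"
    # Minor: the service works and most customers are unaffected.
    return "Minor"


print(suggest_severity(0.5, core_function_impaired=True))    # Major
print(suggest_severity(1.0, core_function_impaired=True))    # Critical
print(suggest_severity(0.02, core_function_impaired=False))  # Minor
```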

Incident Statuses

Investigating

You're aware of the problem and actively looking into it. This is the typical starting status.

Identified

The root cause has been found and a fix is being worked on. This status tells customers that the team understands the problem.

Monitoring

A fix has been deployed and you're watching to confirm it resolves the issue. This status tells customers that a fix is in place and is being verified.

Resolved

The incident is over. All services are back to normal operation.

Posting Updates

Keep customers informed by posting regular updates throughout the incident:

  1. Open the active incident
  2. Click Post Update
  3. Write a clear, concise update
  4. Change the incident status if appropriate
  5. Click Publish

Update Best Practices

  • Be honest — Tell customers what you know, even if you don't know everything yet
  • Use plain language — Avoid internal jargon or overly technical details
  • Include impact — Tell customers what they'll experience
  • Set expectations — If you have an ETA, share it. If not, say when you'll update next
  • Update regularly — Even if nothing has changed, post an update every 30-60 minutes during active incidents
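The update cadence above is easy to check programmatically, for example in a team reminder script. The sketch below assumes you track the time of the last published update yourself. The names `MAX_UPDATE_GAP`, `next_update_due`, and `is_update_overdue` are illustrative and not part of Alert24.

```python
from datetime import datetime, timedelta, timezone

# Upper bound of the 30 to 60 minute guideline; lower it for a stricter cadence.
MAX_UPDATE_GAP = timedelta(minutes=60)


def next_update_due(last_update_at: datetime) -> datetime:
    """Latest time by which the next update should be posted."""
    return last_update_at + MAX_UPDATE_GAP


def is_update_overdue(last_update_at: datetime, now: datetime) -> bool:
    """True when more than MAX_UPDATE_GAP has passed since the last update."""
    return now > next_update_due(last_update_at)


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_update_overdue(last, datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc)))  # False
print(is_update_overdue(last, datetime(2024, 1, 1, 13, 5, tzinfo=timezone.utc)))   # True
```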

Good Update Examples

"We've identified the root cause as a database connection issue. Our engineering team is deploying a fix. We expect services to be restored within 30 minutes."

"We are continuing to investigate increased error rates on the API. We will provide another update within 30 minutes."

Resolving an Incident

When the issue is fixed:

  1. Post a final update explaining the resolution
  2. Change the incident status to Resolved

Once the incident is resolved, the statuses of the affected services automatically return to Operational.

The resolved incident remains visible on your status page's incident history, providing transparency about past issues.
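The effect of resolving an incident can be sketched as a small state change. The `Incident` class, the `resolve_incident` function, and the `service_status` dictionary are hypothetical stand-ins used to illustrate the behavior described above, not Alert24 internals.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical minimal incident record."""

    title: str
    status: str
    affected_services: list[str]
    updates: list[str] = field(default_factory=list)


def resolve_incident(incident: Incident, service_status: dict[str, str], final_update: str) -> None:
    """Post the final update, mark the incident Resolved, and restore its services."""
    incident.updates.append(final_update)
    incident.status = "Resolved"
    for name in incident.affected_services:
        # Mirrors the automatic return to Operational described above.
        service_status[name] = "Operational"


service_status = {"API": "Degraded", "Dashboard": "Operational"}
incident = Incident(title="API Response Errors", status="Monitoring", affected_services=["API"])
resolve_incident(incident, service_status, "The fix is confirmed and the API is operating normally.")
print(incident.status)  # Resolved
print(service_status)   # {'API': 'Operational', 'Dashboard': 'Operational'}
```

The incident object is kept rather than deleted, which matches the way resolved incidents stay visible in the incident history.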

Incident History

All incidents are recorded in your incident history, including:

  • Timeline of events and updates
  • Duration of the incident
  • Affected services
  • Severity level

This history is available to your team internally and to customers on your public status page.
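To make the recorded fields concrete, the sketch below derives an incident's duration from its timeline. It assumes the duration runs from the first timeline entry to the first Resolved entry. That rule, along with the `TimelineEntry` and `incident_duration` names, is an illustration rather than a description of how Alert24 computes it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TimelineEntry:
    """One update in a hypothetical incident timeline."""

    at: datetime
    status: str
    message: str


def incident_duration(timeline: list[TimelineEntry]) -> Optional[timedelta]:
    """Time from the first entry to the first Resolved entry, or None if unresolved."""
    if not timeline:
        return None
    ordered = sorted(timeline, key=lambda entry: entry.at)
    for entry in ordered:
        if entry.status == "Resolved":
            return entry.at - ordered[0].at
    return None


timeline = [
    TimelineEntry(datetime(2024, 1, 1, 12, 0), "Investigating", "We are investigating increased error rates."),
    TimelineEntry(datetime(2024, 1, 1, 12, 30), "Identified", "We've identified a database connection issue."),
    TimelineEntry(datetime(2024, 1, 1, 13, 15), "Resolved", "The fix is deployed and services are back to normal."),
]
print(incident_duration(timeline))  # 1:15:00
```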