Overview

Incidents in Alert24 represent events where one or more of your services are experiencing problems. The incident management system helps you track the issue from detection through resolution while keeping your customers informed via your status page.

Creating an Incident

Automatic Incident Creation

When monitoring checks detect a failure and the failure threshold is met, Alert24 can automatically create an incident linked to the affected service.
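
The threshold behavior described above can be sketched in a few lines. This is an illustration only, not Alert24's implementation: it assumes the threshold counts consecutive failed checks, and the `ThresholdTracker` class and its default of 3 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTracker:
    """Tracks failed checks for one service (hypothetical model, not the Alert24 API)."""
    failure_threshold: int = 3      # assumed value for illustration
    consecutive_failures: int = 0
    incident_open: bool = False

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True when an incident should be created."""
        if check_passed:
            # A passing check resets the streak (assumption: failures must be consecutive).
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold and not self.incident_open:
            self.incident_open = True   # avoid opening a second incident for the same outage
            return True
        return False


tracker = ThresholdTracker(failure_threshold=3)
checks = [False, False, True, False, False, False, False]
results = [tracker.record(passed) for passed in checks]
```

In this sketch a single passing check resets the count, so two failures followed by a success do not open an incident; only the third failure in a row does.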

Manual Incident Creation

Create an incident manually for issues you discover before monitoring catches them, or for situations that require manual intervention:

1. Navigate to Incidents in the navigation bar
2. Click Create Incident
3. Fill in the details:
- Title — A clear, customer-friendly description (e.g., "API Response Errors" not "DB connection pool exhausted")
- Severity — Critical, High, Medium, Low, or Maintenance
- Affected Services — Select which services are impacted
- Initial Update — Describe what you know so far
- Status — New, Acknowledged, Investigating, Identified, Monitoring, or Resolved
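
If you want to sanity-check incident details before entering them (for example in an internal runbook script), the form fields above can be modeled roughly as follows. The `IncidentDraft` class and its validation rules are assumptions made for illustration; only the field names and the allowed severity and status values come from this page.

```python
from dataclasses import dataclass

# Allowed values as listed in the form above.
SEVERITIES = ("Critical", "High", "Medium", "Low", "Maintenance")
STATUSES = ("New", "Acknowledged", "Investigating", "Identified", "Monitoring", "Resolved")


@dataclass
class IncidentDraft:
    """Hypothetical local model of the Create Incident form (not the Alert24 API)."""
    title: str
    severity: str
    affected_services: list[str]
    initial_update: str
    status: str = "New"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the draft looks complete."""
        problems = []
        if not self.title.strip():
            problems.append("title is required")
        if self.severity not in SEVERITIES:
            problems.append(f"unknown severity: {self.severity}")
        if self.status not in STATUSES:
            problems.append(f"unknown status: {self.status}")
        if not self.affected_services:
            # Assumption: at least one service should be selected.
            problems.append("select at least one affected service")
        return problems


draft = IncidentDraft(
    title="API Response Errors",
    severity="High",
    affected_services=["API"],
    initial_update="We are investigating elevated error rates on the API.",
)
problems = draft.validate()
```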

[Screenshot: Creating an incident]

Severity Levels

Choose the severity level that matches the customer impact:

Critical

Severe impact. The service is completely down or a critical function is unavailable for all users.

Examples: Complete service outage, data loss risk, security incident.

High

Significant impact. Core functionality is impaired or unavailable for a meaningful portion of users.

Examples: API returning errors for 50% of requests, a specific feature is completely unavailable.

Medium

Moderate impact. Some users are affected but the service is generally functional.

Examples: Elevated response times, intermittent errors affecting a small portion of requests.

Low

Limited impact. The service is functional but experiencing minor issues. Most customers are unaffected.

Examples: Slightly elevated response times, cosmetic issues, minor degradation.

Maintenance

Planned work. Use this severity for scheduled maintenance windows.
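
As a rough aid, the severity definitions above can be written as a decision helper. This is a sketch under stated assumptions: the `suggest_severity` function, its parameters, and the 5% cutoff between Medium and Low are illustrative and are not defined by Alert24. Treat the result as a starting point and adjust it to your own judgment of customer impact.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    MAINTENANCE = "Maintenance"


def suggest_severity(planned: bool, service_down: bool, core_impaired: bool,
                     share_of_users_affected: float) -> Severity:
    """Suggest a severity from customer impact (hypothetical helper)."""
    if planned:
        return Severity.MAINTENANCE   # scheduled maintenance windows
    if service_down:
        return Severity.CRITICAL      # complete outage or critical function unavailable
    if core_impaired:
        return Severity.HIGH          # core functionality impaired for many users
    # Illustrative cutoff: above it "some users are affected", below it "most are unaffected".
    if share_of_users_affected >= 0.05:
        return Severity.MEDIUM
    return Severity.LOW


suggestion = suggest_severity(planned=False, service_down=False,
                              core_impaired=True, share_of_users_affected=0.5)
```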

Incident Statuses

New

The incident has been created but not yet acknowledged by a responder.

Acknowledged

A team member has acknowledged the incident and is beginning to respond.

Investigating

You're actively looking into the problem. This is the typical status after acknowledgment.

Identified

The root cause has been found and a fix is being worked on. Customers know the team understands the problem.

Monitoring

A fix has been deployed and you're watching to confirm it resolves the issue. Customers know a fix is in place and is being verified.

Resolved

The incident is over. All services are back to normal operation.
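
The statuses above form a natural progression from New to Resolved. The sketch below models that order so a script can tell whether a status change moves an incident forward. It assumes nothing about how Alert24 enforces transitions; `STATUS_ORDER`, `is_forward_move`, and `is_active` are hypothetical names.

```python
# Statuses in the order they are described above.
STATUS_ORDER = ["New", "Acknowledged", "Investigating", "Identified", "Monitoring", "Resolved"]


def is_forward_move(current: str, new: str) -> bool:
    """True when `new` comes later in the lifecycle than `current`."""
    return STATUS_ORDER.index(new) > STATUS_ORDER.index(current)


def is_active(status: str) -> bool:
    """An incident counts as active until it reaches Resolved."""
    return status != "Resolved"


forward = is_forward_move("Investigating", "Identified")
backward = is_forward_move("Monitoring", "Investigating")
```

A backward move is not necessarily an error. If a fix does not hold while you are in Monitoring, returning to Investigating is a reasonable way to tell customers what is happening.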

Posting Updates

Keep customers informed by posting regular updates throughout the incident:

1. Open the active incident
2. Click Post Update
3. Write a clear, concise update
4. Change the incident status if appropriate
5. Click Publish

Update Best Practices

Be honest — Tell customers what you know, even if you don't know everything yet
Use plain language — Avoid internal jargon or overly technical details
Include impact — Tell customers what they'll experience
Set expectations — If you have an ETA, share it. If not, say when you'll update next
Update regularly — Even if nothing has changed, post an update every 30-60 minutes during active incidents
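
To make the 30-60 minute cadence concrete, here is a small sketch that computes when the next update is due and whether one is overdue. The function names and default intervals are assumptions chosen to match the guidance above; Alert24 itself may or may not offer reminders like this.

```python
from datetime import datetime, timedelta


def next_update_due(last_update: datetime, interval_minutes: int = 30) -> datetime:
    """Time by which the next customer update should be posted."""
    return last_update + timedelta(minutes=interval_minutes)


def is_update_overdue(last_update: datetime, now: datetime, max_gap_minutes: int = 60) -> bool:
    """True when more than `max_gap_minutes` have passed since the last update."""
    return now - last_update > timedelta(minutes=max_gap_minutes)


last_posted = datetime(2024, 1, 1, 12, 0)
due = next_update_due(last_posted)
overdue = is_update_overdue(last_posted, now=datetime(2024, 1, 1, 13, 1))
```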

Good Update Examples

"We've identified the root cause as a database connection issue. Our engineering team is deploying a fix. We expect services to be restored within 30 minutes."

"We are continuing to investigate increased error rates on the API. We will provide another update within 30 minutes."

Resolving an Incident

When the issue is fixed:

1. Post a final update explaining the resolution
2. Change the incident status to Resolved

Once the incident is resolved, affected service statuses automatically return to Operational.

The resolved incident remains visible on your status page's incident history, providing transparency about past issues.
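
The resolution flow can be pictured with a minimal model: record the final update, mark the incident Resolved, and set each affected service back to Operational. The `Incident` class, the `resolve` function, and the service status dictionary are hypothetical and exist only to illustrate the sequence; they are not the Alert24 API.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical local model of an incident."""
    title: str
    status: str
    affected_services: list[str]
    updates: list[str] = field(default_factory=list)


def resolve(incident: Incident, final_update: str, service_status: dict[str, str]) -> None:
    """Post the final update, resolve the incident, and restore service statuses."""
    if not final_update.strip():
        raise ValueError("a final update explaining the resolution is required")
    incident.updates.append(final_update)
    incident.status = "Resolved"
    for name in incident.affected_services:
        service_status[name] = "Operational"


services = {"API": "Degraded", "Dashboard": "Operational"}
incident = Incident(title="API Response Errors", status="Monitoring", affected_services=["API"])
resolve(incident, "The fix is confirmed and error rates are back to normal.", services)
```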

Incident History

All incidents are recorded in your incident history, including:

Timeline of events and updates
Duration of the incident
Affected services
Severity level

This history is available to your team internally and to customers on your public status page.
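
Incident duration follows directly from the timeline. As a sketch, assuming a timeline is a list of (timestamp, status, message) entries ordered oldest first (a hypothetical structure, not an Alert24 export format), duration is the time between the first entry and the first Resolved entry:

```python
from datetime import datetime, timedelta
from typing import Optional

TimelineEntry = tuple[datetime, str, str]  # (timestamp, status, message)


def incident_duration(timeline: list[TimelineEntry]) -> Optional[timedelta]:
    """Time from the first entry to the first Resolved entry, or None if still open."""
    if not timeline:
        return None
    started = timeline[0][0]
    resolved = next((ts for ts, status, _ in timeline if status == "Resolved"), None)
    return None if resolved is None else resolved - started


timeline = [
    (datetime(2024, 1, 1, 12, 0), "Investigating", "We are investigating increased error rates."),
    (datetime(2024, 1, 1, 12, 20), "Identified", "We've identified the root cause."),
    (datetime(2024, 1, 1, 12, 45), "Resolved", "Services are back to normal."),
]
duration = incident_duration(timeline)
```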

Incident Roles

Incident roles let you define responsibilities during incident response. Assign team members to specific roles (e.g., Incident Commander, Communications Lead, Technical Lead) so everyone knows who is handling what.

Configuring Roles

1. Navigate to Settings > Incident Roles
2. Create custom roles with names and descriptions
3. When an incident is created, assign team members to roles

Roles help structure your incident response process, especially as your team grows.
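
One simple check that follows from role assignment is spotting roles nobody has picked up yet. The sketch below uses the example role names from this section; the `unfilled_roles` helper and the team member names are hypothetical.

```python
def unfilled_roles(defined_roles: list[str], assignments: dict[str, str]) -> list[str]:
    """Return the roles that have no team member assigned, in their defined order."""
    return [role for role in defined_roles if not assignments.get(role)]


roles = ["Incident Commander", "Communications Lead", "Technical Lead"]
assignments = {"Incident Commander": "Dana", "Technical Lead": "Lee"}  # hypothetical names
missing = unfilled_roles(roles, assignments)
```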

Incident Templates

Incident templates pre-populate incident fields so you can create incidents faster and more consistently. Templates are useful for recurring incident types.

Setting Up Templates

1. Navigate to Settings > Incident Templates
2. Click Create Template
3. Define pre-populated fields: title pattern, severity, affected services, initial update text, and custom fields
4. When creating an incident, select a template to auto-fill the form

Example Templates

Deployment Failure — Pre-fills severity as High, selects the deployment service, and includes a standard initial update
Third-Party Outage — Pre-fills with common text for communicating upstream provider issues
Scheduled Maintenance — Pre-fills severity as Maintenance with standard maintenance messaging
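
Conceptually, a template is a set of default field values that is copied into a new incident form. The sketch below models the Deployment Failure example. The `IncidentTemplate` class, the `apply_template` function, the `{service}` placeholder syntax, and the sample text are all assumptions for illustration; Alert24's actual title pattern syntax may differ.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentTemplate:
    """Hypothetical model of an incident template."""
    name: str
    title_pattern: str
    severity: str
    affected_services: list[str]
    initial_update: str
    custom_fields: dict[str, str] = field(default_factory=dict)


def apply_template(template: IncidentTemplate, **title_values: str) -> dict:
    """Build pre-filled form values; the caller can still edit any field."""
    return {
        "title": template.title_pattern.format(**title_values),
        "severity": template.severity,
        "affected_services": list(template.affected_services),  # copy, so edits stay local
        "initial_update": template.initial_update,
        "custom_fields": dict(template.custom_fields),
    }


deployment_failure = IncidentTemplate(
    name="Deployment Failure",
    title_pattern="Deployment issues affecting {service}",  # placeholder syntax is assumed
    severity="High",
    affected_services=["Deployment Service"],
    initial_update="We are investigating problems following a recent deployment.",
)
form = apply_template(deployment_failure, service="the API")
```

Copying the list and dictionary means that editing one pre-filled form does not change the template used for the next incident.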

Workflow Templates

Workflow templates automate actions during the incident lifecycle. Configure workflows to trigger automatically based on incident events.

Configuring Workflows

1. Navigate to Settings > Workflows
2. Create workflow templates that define automated actions

Workflows can trigger notifications, update service statuses, assign roles, and more based on incident state changes.

Workflows help ensure consistent incident response procedures across your team.
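
The idea behind workflows is event-driven: an incident event occurs, and every action registered for that event runs. The sketch below shows that pattern in miniature. The `WorkflowEngine` class, the event names such as `incident.created`, and the action functions are hypothetical and do not reflect how Alert24 configures or names workflow triggers.

```python
from collections import defaultdict
from typing import Callable

Action = Callable[[dict], str]  # takes an incident, returns a description of what it did


class WorkflowEngine:
    """Minimal event-to-actions dispatcher (illustrative only)."""

    def __init__(self) -> None:
        self._actions: dict[str, list[Action]] = defaultdict(list)

    def on(self, event: str, action: Action) -> None:
        """Register an action to run whenever `event` fires."""
        self._actions[event].append(action)

    def fire(self, event: str, incident: dict) -> list[str]:
        """Run all actions registered for `event`, in registration order."""
        return [action(incident) for action in self._actions.get(event, [])]


def notify_team(incident: dict) -> str:
    return f"notify team: {incident['title']}"


def restore_services(incident: dict) -> str:
    return "set Operational: " + ", ".join(incident["affected_services"])


engine = WorkflowEngine()
engine.on("incident.created", notify_team)
engine.on("incident.resolved", restore_services)

incident = {"title": "API Response Errors", "affected_services": ["API"]}
created_log = engine.fire("incident.created", incident)
resolved_log = engine.fire("incident.resolved", incident)
```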