Escalation Policies

What are Escalation Policies?

An escalation policy defines what happens when the on-call person doesn't acknowledge an alert within a configured time. It keeps incidents from going unacknowledged by automatically notifying additional team members.

How Escalation Works

  1. An alert fires and is sent to the primary on-call person
  2. If the primary doesn't acknowledge within the configured time (e.g., 5 minutes), the alert escalates
  3. The next person in the escalation chain is notified
  4. If they don't respond either, it escalates again — until someone acknowledges or the policy is exhausted

This layered approach makes it much more likely that someone responds to the incident even when the primary is unreachable, for example because their phone is on silent, they are asleep, or their device has failed.
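
The steps above can be sketched as a small simulation. This is only an illustrative model, not Alert24's implementation: the function name `notified_levels`, the `(name, timeout)` tuples, and the choice to escalate when an acknowledgment arrives exactly at the timeout are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical model: (level name, timeout in minutes); None marks the final level.
Chain = List[Tuple[str, Optional[int]]]

def notified_levels(chain: Chain, ack_after_minutes: Optional[float]) -> List[str]:
    """Return the levels notified before the alert is acknowledged.

    ack_after_minutes is measured from when the alert fires;
    None means nobody ever acknowledges.
    """
    notified = []
    elapsed = 0.0
    for name, timeout in chain:
        notified.append(name)
        if timeout is None:
            break  # final level: nothing further to escalate to
        elapsed += timeout
        if ack_after_minutes is not None and ack_after_minutes < elapsed:
            break  # acknowledged before this level timed out
    return notified

chain = [("primary", 5), ("secondary", 10), ("manager", 15), ("executive", None)]
print(notified_levels(chain, 2))     # ['primary']
print(notified_levels(chain, 8))     # ['primary', 'secondary']
print(notified_levels(chain, None))  # all four levels
```

The timeouts here mirror the example values used later in this article; each timeout is assumed to start counting when its level is notified.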

Setting Up an Escalation Policy

Step 1: Create a Policy

  1. Navigate to On-Call > Escalation Policies
  2. Click Create Policy
  3. Name the policy (e.g., "Production Incidents", "Critical Alerts")

Step 2: Define Escalation Levels

Each level in the policy defines who gets notified and when:

Level 1 — Primary On-Call

  • Target: The current on-call person from your on-call schedule
  • Timeout: 5 minutes (time before escalating to the next level)

Level 2 — Secondary On-Call / Team Lead

  • Target: A specific person or the next person in the on-call rotation
  • Timeout: 10 minutes

Level 3 — Engineering Manager

  • Target: A specific person or group
  • Timeout: 15 minutes

Level 4 — Executive / Final Escalation

  • Target: CTO, VP of Engineering, or a group channel
  • Timeout: None (final level)

[Screenshot: Escalation policy configuration]
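
For teams that like to review a policy on paper before entering it in the UI, the four levels above can be written down as plain data. The sketch below is a hypothetical representation, not an Alert24 import format or API: the `Level` class, its field names, and the `check_policy` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Level:
    name: str
    target: str
    timeout_minutes: Optional[int]  # None means final level (no further escalation)

# The example levels from this article, expressed as data
POLICY: List[Level] = [
    Level("Primary On-Call", "on-call schedule", 5),
    Level("Secondary On-Call / Team Lead", "next person in rotation", 10),
    Level("Engineering Manager", "specific person or group", 15),
    Level("Executive / Final Escalation", "CTO, VP of Engineering, or group channel", None),
]

def check_policy(levels: List[Level]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not levels:
        problems.append("policy has no levels")
    # Every level except the last needs a timeout, or escalation would stall there
    for level in levels[:-1]:
        if level.timeout_minutes is None or level.timeout_minutes <= 0:
            problems.append(f"{level.name}: non-final levels need a positive timeout")
    return problems

print(check_policy(POLICY))  # → []
```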

Step 3: Assign the Policy

Link the escalation policy to your monitoring checks or services. When an alert fires for a linked service, the escalation policy determines who gets notified and in what order.

Escalation Level Configuration

Targets

Each escalation level can notify:

  • On-call schedule — Whoever is currently on-call in a specific schedule
  • Specific user — A named individual (useful for managers at higher levels)
  • Group — Multiple people simultaneously
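
The difference between the three target types comes down to how many people are notified and how they are chosen. A minimal sketch, assuming hypothetical lookup tables (`ON_CALL_NOW`, `GROUPS`) and a hypothetical `resolve_recipients` function; none of these names come from Alert24:

```python
from typing import Dict, List

# Hypothetical data: who is on call right now, and who belongs to each group
ON_CALL_NOW: Dict[str, str] = {"platform-schedule": "alice"}
GROUPS: Dict[str, List[str]] = {"eng-leads": ["carol", "dave"]}

def resolve_recipients(kind: str, name: str) -> List[str]:
    """Turn an escalation target into the list of people to notify."""
    if kind == "schedule":
        return [ON_CALL_NOW[name]]  # whoever is on call at this moment
    if kind == "user":
        return [name]               # always the same named individual
    if kind == "group":
        return list(GROUPS[name])   # everyone in the group, simultaneously
    raise ValueError(f"unknown target type: {kind}")

print(resolve_recipients("schedule", "platform-schedule"))  # ['alice']
print(resolve_recipients("user", "bob"))                    # ['bob']
print(resolve_recipients("group", "eng-leads"))             # ['carol', 'dave']
```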

Timeouts

The timeout is how long to wait for an acknowledgment before escalating. Common configurations:

Level   | Typical Timeout | Reasoning
Level 1 | 5 minutes       | Enough time for the on-call person to see and acknowledge
Level 2 | 10 minutes      | Allows time for the backup to get context
Level 3 | 15 minutes      | Manager-level escalation; may need more response time
Level 4 | None            | Final level; keep notifying until acknowledged
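
As a worked example of the typical timeouts above, the sketch below computes how long after the alert fires each level is first notified. It assumes each level's timeout starts counting when that level is notified, which matches the scenarios in this article but is an assumption about timing rather than documented behavior. The variable names are illustrative.

```python
from itertools import accumulate

# Typical timeouts for Levels 1-3, in minutes (Level 4 has none)
timeouts = [5, 10, 15]

# Minutes after the alert fires at which each level is first notified
notify_at = [0] + list(accumulate(timeouts))

for level, minutes in enumerate(notify_at, start=1):
    print(f"Level {level}: notified at +{minutes} min")
```

With these values, an alert that nobody acknowledges reaches Level 4 thirty minutes after it fires. If that is too slow for your most critical services, shorten the earlier timeouts.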

Repeat Behavior

You can configure whether the entire escalation chain repeats if nobody acknowledges. For critical alerts, repeating ensures the incident is never ignored.

Acknowledging Alerts

When an on-call person receives an alert, they acknowledge it to stop the escalation:

  • Click the Acknowledge link in the email or SMS notification
  • Acknowledge from within the Alert24 app

Acknowledging an alert signals that someone is actively working on the incident. It doesn't resolve the incident — it just stops the escalation chain.
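
The distinction between acknowledging and resolving can be modeled as three states. This is a conceptual sketch only: the state names and the two helper functions are hypothetical and are not taken from the Alert24 app or API.

```python
from enum import Enum

class AlertState(Enum):
    TRIGGERED = "triggered"        # fired, nobody has acknowledged yet
    ACKNOWLEDGED = "acknowledged"  # someone is working on it
    RESOLVED = "resolved"          # the underlying incident is fixed

def should_escalate(state: AlertState) -> bool:
    """Escalation continues only while the alert is unacknowledged."""
    return state is AlertState.TRIGGERED

def incident_is_open(state: AlertState) -> bool:
    """Acknowledging does not close the incident; only resolving does."""
    return state is not AlertState.RESOLVED

state = AlertState.ACKNOWLEDGED
print(should_escalate(state))   # False: the escalation chain has stopped
print(incident_is_open(state))  # True: the incident still needs to be resolved
```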

Example Escalation Scenarios

Normal Response

  1. Alert fires → Primary on-call notified
  2. Primary acknowledges within 2 minutes
  3. Escalation stops. Primary works the incident.

Primary Unavailable

  1. Alert fires → Primary on-call notified
  2. 5 minutes pass, no acknowledgment
  3. Alert escalates → Secondary on-call notified
  4. Secondary acknowledges within 3 minutes
  5. Secondary works the incident.

Critical Incident, Multiple Levels

  1. Alert fires → Primary on-call notified
  2. 5 minutes pass with no acknowledgment → Secondary notified
  3. A further 10 minutes pass with no acknowledgment (15 minutes after the alert fired) → Engineering Manager notified
  4. Engineering Manager acknowledges and coordinates the response

Best Practices

  • Keep escalation chains short — 3-4 levels is sufficient for most teams
  • Set reasonable timeouts — Too short creates unnecessary escalations; too long delays response
  • Include a group at the final level — Ensures someone always sees the alert
  • Test your escalation policies — Run a test alert to verify notifications reach the right people
  • Review policies quarterly — Update as team structure and responsibilities change