
How to Route Datadog Monitor Alerts to the Right On-Call Engineer

The Problem with Datadog Notifications

Datadog is excellent at detecting problems. Its monitors catch high CPU, failing checks, anomalous latency, and hundreds of other signals before your users notice. Where it falls short is getting the right person paged.

Datadog's notification channels — email, Slack, PagerDuty, webhooks — are static. You define who receives an alert when you write the monitor. If your infrastructure team rotates on-call every week, someone has to remember to update every relevant monitor when the rotation changes. In practice, nobody does that consistently. Alerts go to last week's on-call engineer, or to a Slack channel where they get buried, or to a distribution list that pages five people when one would do.

What Datadog is missing is schedule-awareness. It can tell you that something is broken; it cannot tell you whose phone should ring at 2 a.m. That gap is where Alert24 fits.

How the Integration Works

The architecture is straightforward. Datadog sends alerts to an Alert24 webhook endpoint. Alert24 evaluates the alert against your routing rules — which can match on Datadog tags like service, env, or team — and then pages whoever is currently on call for the matched team. When the rotation advances, nothing in Datadog needs to change.

Datadog Monitor → Webhook → Alert24 Routing Rules → On-Call Schedule → Page Engineer

The on-call schedule lives entirely in Alert24. Datadog only needs a single webhook URL per team or service, and you configure it once.
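To make the idea of schedule-awareness concrete, here is a minimal Python sketch of a weekly round-robin lookup. It is an illustration only and says nothing about how Alert24 stores or evaluates schedules; the function name, the engineer names, and the rotation start date are all made up.

```python
from datetime import datetime, timedelta, timezone

def on_call_for(rotation, rotation_start, now, shift=timedelta(weeks=1)):
    """Return whoever is on call at `now` for a simple round-robin rotation.

    Hypothetical model: `rotation` is an ordered list of names and each
    person holds the pager for one `shift`, starting at `rotation_start`.
    """
    if now < rotation_start:
        raise ValueError("rotation has not started yet")
    shifts_elapsed = (now - rotation_start) // shift  # whole shifts since start
    return rotation[shifts_elapsed % len(rotation)]

# Made-up example data
rotation = ["alice", "bob", "carol"]
start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # a Monday

print(on_call_for(rotation, start, datetime(2024, 1, 3, 2, 14, tzinfo=timezone.utc)))   # alice
print(on_call_for(rotation, start, datetime(2024, 1, 10, 2, 14, tzinfo=timezone.utc)))  # bob
```

The same alert arriving a week later reaches a different person, and the only input that changed is the clock. That is the property a static notification list cannot provide.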

Step 1: Create a Webhook Integration in Datadog

In Datadog, go to Integrations → Webhooks and add a new webhook.

Give it a name that matches the service or team — something like alert24-platform or alert24-payments. The name matters because you will reference it in your monitor notification bodies.

Set the URL to your Alert24 webhook endpoint. You can find this in Alert24 under Integrations → Inbound Webhooks. Copy the endpoint URL for the team you want to route alerts to.

For the payload, use Datadog's template variables to pass structured data Alert24 can act on:

{
  "alert_title": "$EVENT_TITLE",
  "alert_type": "$ALERT_TYPE",
  "alert_status": "$ALERT_STATUS",
  "alert_id": "$ALERT_ID",
  "monitor_id": "$ID",
  "priority": "$PRIORITY",
  "host": "$HOSTNAME",
  "tags": "$TAGS",
  "url": "$LINK",
  "body": "$EVENT_MSG"
}

This payload gives Alert24 enough context to build a useful incident: the event title, the alert type and status, the affected host, a link back to Datadog, and the full tag set Datadog has for this event.

Enable the Use custom payload option and paste the JSON above. Save the webhook.
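A quick local dry run can catch a malformed template before any monitor depends on it. The sketch below approximates Datadog's substitution with Python's string.Template, which happens to use the same $NAME syntax. Datadog's real rendering may differ, for example in how it escapes quotes or newlines in the message body. All sample values are invented.

```python
import json
from string import Template

PAYLOAD_TEMPLATE = """{
  "alert_title": "$EVENT_TITLE",
  "alert_type": "$ALERT_TYPE",
  "alert_status": "$ALERT_STATUS",
  "alert_id": "$ALERT_ID",
  "monitor_id": "$ID",
  "priority": "$PRIORITY",
  "host": "$HOSTNAME",
  "tags": "$TAGS",
  "url": "$LINK",
  "body": "$EVENT_MSG"
}"""

# Invented sample values, chosen to be JSON-safe (no quotes or newlines).
SAMPLE_VALUES = {
    "EVENT_TITLE": "[Triggered] Checkout p95 latency high",
    "ALERT_TYPE": "error",
    "ALERT_STATUS": "sample status text",
    "ALERT_ID": "1234567",
    "ID": "7654321",
    "PRIORITY": "normal",
    "HOSTNAME": "checkout-01",
    "TAGS": "env:production,service:checkout,team:payments",
    "LINK": "https://example.com/event/1",
    "EVENT_MSG": "p95 latency above 2s for 5 minutes",
}

def render_sample(template_text, values):
    """Fill in the template and parse the result as JSON.

    Raises KeyError if the template uses a variable with no sample value,
    and json.JSONDecodeError if the rendered text is not valid JSON.
    """
    rendered = Template(template_text).substitute(values)
    return json.loads(rendered)

payload = render_sample(PAYLOAD_TEMPLATE, SAMPLE_VALUES)
print(sorted(payload))
```

A stray comma or an unbalanced quote in the template shows up here as a JSONDecodeError instead of as a silently dropped alert.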

Step 2: Reference the Webhook in Your Monitors

In any Datadog monitor's notification message, add the webhook reference using the @ syntax:

There is a problem with {{host.name}} in {{env}}.

@webhook-alert24-platform

You can include multiple webhooks in one monitor if you want to fan out to multiple teams, though usually one team owns a given service. The webhook fires on alert, recovery, and renotification events — Alert24 can be configured to handle all three.

If you use Datadog's monitor templates or Terraform to manage monitors at scale, add the webhook reference to your default notification template so every new monitor is automatically covered.
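If monitor messages are generated by a script, a small helper can keep the mention consistent. The function below is a hypothetical sketch in Python, not part of Datadog's or Terraform's tooling. It appends the @webhook- handle only when the exact handle is not already present.

```python
def ensure_webhook_mention(message, webhook_name):
    """Append `@webhook-<name>` to a monitor message unless it is already there.

    Compares whole whitespace-separated tokens, so an existing mention of
    `@webhook-alert24-platform-staging` does not count as a mention of
    `@webhook-alert24-platform`.
    """
    mention = f"@webhook-{webhook_name}"
    if mention in message.split():
        return message
    return f"{message.rstrip()}\n\n{mention}"

message = "There is a problem with {{host.name}}."
updated = ensure_webhook_mention(message, "alert24-platform")
print(updated)
```

Running the helper twice on the same message leaves it unchanged the second time, which makes it safe to apply across an existing set of monitors.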

Step 3: Configure Routing Rules in Alert24

When an alert arrives at your Alert24 webhook endpoint, routing rules decide how to handle it. Rules can match on any field in the payload — including the tags field Datadog sends.

Datadog formats tags as a comma-separated string like env:production,service:checkout,team:payments. Alert24's routing rule engine can match on substrings or patterns within that field.
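Substring matching on that string is simple but can over-match: a condition for service:checkout also matches service:checkout-api. The Python sketch below shows the difference between a substring test and an exact tag comparison. It is a general illustration with hypothetical function names; whether Alert24 offers an exact-tag operator is not something this article establishes.

```python
def parse_tags(tag_string):
    """Split 'env:production,service:checkout' into {'env': {'production'}, ...}.

    Values are sets because a key can appear more than once. A tag without
    a colon is stored under its own name with an empty-string value.
    """
    parsed = {}
    for raw in tag_string.split(","):
        tag = raw.strip()
        if not tag:
            continue
        key, _, value = tag.partition(":")  # split on the first colon only
        parsed.setdefault(key, set()).add(value)
    return parsed

def has_tag(tag_string, key, value):
    """True if the exact tag key:value is present in the tag string."""
    return value in parse_tags(tag_string).get(key, set())

tags = "env:production,service:checkout-api,team:payments"
print("service:checkout" in tags)                # True: the substring over-matches
print(has_tag(tags, "service", "checkout"))      # False: exact comparison
print(has_tag(tags, "service", "checkout-api"))  # True
```

If only substring matching is available, choosing service names that are not prefixes of one another avoids the problem.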

A practical rule set for a payment service might look like this:

Condition           Field       Operator  Value
Service match       tags        contains  service:checkout
Environment filter  tags        contains  env:production
Alert type          alert_type  equals    error

When all conditions match, the rule triggers an escalation policy. The escalation policy is bound to an on-call schedule — the schedule defines who is on call right now. That person gets paged via SMS, voice call, email, or push notification according to your team's preferences.
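The "all conditions must match" logic can be sketched in a few lines of Python. This is a simplified model of the rule described above, not Alert24's rule engine. The tuple layout and the operator names are assumptions that mirror the Field, Operator, and Value columns.

```python
# Hypothetical operator table: each entry takes (actual, expected).
OPERATORS = {
    "contains": lambda actual, expected: expected in actual,
    "equals": lambda actual, expected: actual == expected,
}

def rule_matches(conditions, payload):
    """True only if every (field, operator, value) condition holds.

    A field that is missing from the payload is treated as an empty string.
    """
    return all(
        OPERATORS[op](str(payload.get(field, "")), value)
        for field, op, value in conditions
    )

CHECKOUT_RULE = [
    ("tags", "contains", "service:checkout"),
    ("tags", "contains", "env:production"),
    ("alert_type", "equals", "error"),
]

prod_error = {"tags": "env:production,service:checkout,team:payments", "alert_type": "error"}
staging_error = {"tags": "env:staging,service:checkout,team:payments", "alert_type": "error"}

print(rule_matches(CHECKOUT_RULE, prod_error))     # True
print(rule_matches(CHECKOUT_RULE, staging_error))  # False
```

Because the conditions are combined with a logical AND, the staging alert fails the environment filter and never reaches the escalation policy.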

If no one acknowledges within your defined window, the escalation moves to the secondary on-call or team lead.
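As a rough model of that behavior, the Python sketch below picks the escalation tier from the time elapsed since the incident opened. The tier names and the 5-minute window are placeholder values, and real escalation policies may work differently, for instance with a separate window per tier.

```python
from datetime import timedelta

def current_tier(tiers, ack_window, elapsed, acknowledged=False):
    """Return who should be paged `elapsed` after the incident opened.

    Hypothetical model: every tier gets one `ack_window` to respond, the
    last tier keeps the page indefinitely, and an acknowledgement stops
    escalation (returns None).
    """
    if acknowledged:
        return None
    index = min(elapsed // ack_window, len(tiers) - 1)
    return tiers[index]

tiers = ["primary on-call", "secondary on-call", "team lead"]
window = timedelta(minutes=5)

print(current_tier(tiers, window, timedelta(minutes=3)))   # primary on-call
print(current_tier(tiers, window, timedelta(minutes=7)))   # secondary on-call
print(current_tier(tiers, window, timedelta(minutes=60)))  # team lead
```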

Step 4: Map Tags to Teams

Datadog's team tag lets you route alerts for multiple teams through a single webhook setup.

Instead of one webhook per team, you can send all Datadog alerts to a single Alert24 inbound endpoint and use routing rules to fan them out. Create a rule for each team tag:

  • tags contains team:platform → page Platform on-call schedule
  • tags contains team:data → page Data Engineering on-call schedule
  • tags contains team:payments → page Payments on-call schedule

This means adding a new team to your on-call program is purely an Alert24 configuration change. You add the schedule, add the routing rule, and any Datadog monitor already tagged with team:newteam will automatically route correctly — no changes needed in Datadog.
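The fan-out amounts to a lookup from team tag to schedule. Here is a minimal Python sketch, assuming a plain dictionary stands in for the routing rules. The function name and the catch-all default are hypothetical.

```python
TEAM_SCHEDULES = {
    "platform": "Platform on-call schedule",
    "data": "Data Engineering on-call schedule",
    "payments": "Payments on-call schedule",
}

def schedule_for(payload, team_schedules, default=None):
    """Pick the schedule for the first team:<name> tag that has a routing entry.

    Returns `default` when no team tag matches, which is where a catch-all
    schedule could be plugged in.
    """
    for tag in payload.get("tags", "").split(","):
        key, _, value = tag.strip().partition(":")
        if key == "team" and value in team_schedules:
            return team_schedules[value]
    return default

alert = {"tags": "env:production,service:search,team:newteam"}
print(schedule_for(alert, TEAM_SCHEDULES))  # None: no rule for team:newteam yet

# Onboarding the team is one new entry; the alert itself is unchanged.
TEAM_SCHEDULES["newteam"] = "New Team on-call schedule"
print(schedule_for(alert, TEAM_SCHEDULES))  # New Team on-call schedule
```

The first lookup also shows why a default route is worth having: an alert tagged with a team that has no rule yet would otherwise page nobody.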

Step 5: Handle Recovery Notifications

Datadog sends a recovery notification when a monitor returns to OK state. You want Alert24 to auto-resolve the incident when that happens so your on-call engineer is not left with a stale open incident.

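In Alert24, set the inbound webhook to resolve on recovery. When the payload's status field reports Recovered or OK, Alert24 closes the incident automatically. One caveat: Datadog documents $ALERT_STATUS as a free-text summary of the alert status, while $ALERT_TRANSITION carries the notification type (Triggered, Recovered, Renotify, and so on). If recoveries are not resolving incidents, add "alert_transition": "$ALERT_TRANSITION" to the webhook payload and match on that field instead. The on-call engineer receives a resolution notification and the incident timeline closes.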

This keeps your incident log clean and gives you accurate MTTR data without requiring manual close-out.
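The open, deduplicate, and resolve cycle can be modeled with a dictionary of open incidents. The Python sketch below is an assumption-laden illustration, not Alert24's logic. It keys incidents on monitor_id plus host, and it checks both alert_status and an optional alert_transition field, a hypothetical extra payload field fed by Datadog's $ALERT_TRANSITION variable.

```python
RECOVERY_VALUES = {"recovered", "ok"}

def is_recovery(payload):
    """Treat the event as a recovery if either status-like field says so."""
    for field in ("alert_transition", "alert_status"):
        if str(payload.get(field, "")).strip().lower() in RECOVERY_VALUES:
            return True
    return False

def apply_event(open_incidents, payload):
    """Open, deduplicate, or resolve an incident; return the action taken.

    Incidents are keyed on (monitor_id, host), a simplification chosen so
    that one monitor alerting on two hosts yields two incidents.
    """
    key = (payload["monitor_id"], payload.get("host", ""))
    if is_recovery(payload):
        return "resolved" if open_incidents.pop(key, None) else "ignored"
    if key in open_incidents:
        return "deduplicated"  # e.g. a renotification for an open incident
    open_incidents[key] = payload
    return "opened"

incidents = {}
base = {"monitor_id": "7654321", "host": "checkout-01"}
actions = [
    apply_event(incidents, {**base, "alert_transition": "Triggered"}),
    apply_event(incidents, {**base, "alert_transition": "Renotify"}),
    apply_event(incidents, {**base, "alert_transition": "Recovered"}),
]
print(actions)  # ['opened', 'deduplicated', 'resolved']
```

Recording a timestamp when an incident opens and when it resolves is all that is needed to compute MTTR from this structure.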

What This Looks Like End-to-End

A realistic sequence for a production outage:

  1. A Datadog monitor for your checkout service fires at 2:14 a.m. — p95 latency exceeded 2 seconds for 5 minutes.
  2. The monitor message includes @webhook-alert24-payments, which sends the webhook payload to Alert24.
  3. Alert24 matches the rule for service:checkout and env:production, looks up the Payments team on-call schedule, and finds the current on-call engineer.
  4. That engineer receives an SMS and a push notification with the alert title, affected host, and a link back to the Datadog event.
  5. The engineer acknowledges within 5 minutes. The escalation timer stops.
  6. Thirty minutes later, the monitor recovers. Datadog sends a recovery webhook. Alert24 resolves the incident.

Everyone who was not on call sleeps through the whole thing.

Next Steps

If you are already running Datadog monitors, the setup above takes about 30 minutes. Start with one team and one service, verify that alerts and recoveries flow correctly, then expand to the rest of your monitors.

Create your Alert24 account and generate your first inbound webhook endpoint at alert24.com. If you are already using Datadog's PagerDuty integration, you can run both in parallel during a transition — there is no reason to cut over all at once.

Once routing is working, look at Alert24's escalation policy configuration. You can model multi-tier escalations, business-hours vs. after-hours routing, and service-specific response SLAs — all managed in one place, independent of which monitoring tool is generating the alerts.