The Gap Between Alerting and Incident Management
Prometheus tells you something is wrong. AlertManager routes the notification to Slack, PagerDuty, or an email list. And then... you're in a group chat hoping someone picks it up, hoping that person follows through, hoping you can reconstruct what happened when the postmortem rolls around.
That's the gap. Prometheus and AlertManager are excellent at detecting and routing alerts. Neither one gives you a tracked incident — a single record that captures who acknowledged it, what actions were taken, how long resolution took, and whether your status page reflected the outage. For that you need an incident management layer sitting at the end of the webhook chain.
This post shows you exactly how to build that pipeline: Prometheus rule to AlertManager to webhook to Alert24 incident, with deduplication so repeated firings don't flood you with duplicates.
Prometheus Alert Rule Anatomy
Start with a well-formed alert rule. Here's a realistic example for a service with elevated error rates:
    groups:
      - name: api-service
        rules:
          - alert: HighErrorRate
            expr: |
              sum(rate(http_requests_total{status=~"5.."}[5m])) by (instance, job)
              /
              sum(rate(http_requests_total[5m])) by (instance, job)
              > 0.05
            for: 2m
            labels:
              severity: critical
              team: backend
            annotations:
              summary: "High error rate on {{ $labels.instance }}"
              description: "Error rate is {{ $value | humanizePercentage }} on {{ $labels.instance }} (job: {{ $labels.job }})"
              runbook_url: "https://wiki.example.com/runbooks/high-error-rate"
A few things to pay attention to here. The for: 2m clause means the condition must be true for two minutes before the alert fires — this suppresses transient spikes that resolve on their own. The labels block attaches metadata that AlertManager can use for routing and that your adapter layer can use for incident enrichment. The annotations block is where human-readable context lives; the summary becomes your incident title.
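To make the for: behavior concrete, here is a small simulation. It is a sketch, not Prometheus's actual evaluator: it assumes one sample per rule evaluation, and the names alertStates, threshold, and forMs are made up for illustration.

```javascript
// Illustrative simulation of a `for:` clause; Prometheus's real evaluator is more involved.
// samples: [{ timeMs, value }] in ascending time order, one per rule evaluation.
function alertStates(samples, threshold, forMs) {
  let activeSince = null; // time the condition first became true in the current streak
  return samples.map(({ timeMs, value }) => {
    if (value > threshold) {
      if (activeSince === null) activeSince = timeMs;
      // Pending until the condition has held for the whole `for` duration
      return timeMs - activeSince >= forMs ? 'firing' : 'pending';
    }
    activeSince = null; // any false evaluation resets the streak
    return 'inactive';
  });
}

const minute = 60_000;
const states = alertStates(
  [
    { timeMs: 0 * minute, value: 0.08 }, // transient spike starts
    { timeMs: 1 * minute, value: 0.02 }, // recovers on its own, streak resets
    { timeMs: 2 * minute, value: 0.09 }, // sustained problem starts
    { timeMs: 3 * minute, value: 0.07 },
    { timeMs: 4 * minute, value: 0.06 }, // condition has now held for 2 minutes
  ],
  0.05,
  2 * minute,
);
console.log(states); // [ 'pending', 'inactive', 'pending', 'pending', 'firing' ]
```

The first spike never gets past pending, so nothing reaches AlertManager. Only the sustained breach produces a firing alert.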
The instance label is particularly important. You'll use alertname + instance as your deduplication key later.
AlertManager Receiver Configuration
AlertManager's job is to group, inhibit, and route alerts. You need a receiver that posts to a webhook endpoint — either your own adapter or directly to Alert24's inbound webhook if the payload format matches.
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'instance', 'job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: alert24-incidents
      routes:
        - match:
            severity: critical
          receiver: alert24-incidents
        - match:
            severity: warning
          receiver: alert24-slack-only
    receivers:
      - name: alert24-incidents
        webhook_configs:
          - url: 'https://your-adapter.example.com/alertmanager'
            send_resolved: true
            http_config:
              bearer_token: 'your-adapter-secret'
      - name: alert24-slack-only
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/...'
            channel: '#alerts'
The send_resolved: true flag is critical. When Prometheus determines the condition is no longer true, AlertManager sends a resolved notification. Your adapter can use this to automatically resolve the Alert24 incident rather than leaving it open indefinitely.
group_by: ['alertname', 'instance', 'job'] controls how AlertManager batches alerts before sending. Each distinct combination of alert name, instance, and job forms its own group, so a problem on one host arrives as a single notification for that host rather than a flood of individual alerts.
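If it helps to see the batching spelled out, the sketch below partitions alerts by their group_by labels. It is a simplification for illustration only: it ignores group_wait, group_interval, and routing, and groupAlerts is a hypothetical helper, not part of AlertManager.

```javascript
// Illustrative only: partitions alerts by the values of the group_by labels.
function groupAlerts(alerts, groupBy) {
  const groups = new Map();
  for (const alert of alerts) {
    // Simplification: a missing label is treated as an empty value
    const key = groupBy.map((name) => `${name}=${alert.labels[name] ?? ''}`).join(',');
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(alert);
  }
  return groups;
}

const sampleAlerts = [
  { labels: { alertname: 'HighErrorRate', instance: 'api-01.prod', job: 'api-service' } },
  { labels: { alertname: 'HighErrorRate', instance: 'api-02.prod', job: 'api-service' } },
  { labels: { alertname: 'DiskFull', instance: 'api-01.prod', job: 'node' } },
];

const perHost = groupAlerts(sampleAlerts, ['alertname', 'instance', 'job']);
const perAlertName = groupAlerts(sampleAlerts, ['alertname']);
console.log(perHost.size);      // 3
console.log(perAlertName.size); // 2
```

Grouping by alertname alone would fold both HighErrorRate hosts into one notification. Adding instance keeps them apart.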
The Adapter and Transform Layer
AlertManager's webhook payload doesn't match Alert24's incident API directly. You need a small adapter — a lightweight HTTP service (or a Cloudflare Worker, Lambda function, or similar) that translates the payload.
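Because the receiver config above sets a bearer_token, AlertManager sends it in an Authorization: Bearer header, and the adapter should reject requests that don't carry it. Here is a minimal sketch of that check. isAuthorized is a hypothetical helper name, and how you load the expected token (environment variable, secret store) is left to you.

```javascript
// Hypothetical helper: true when the Authorization header carries the expected bearer token.
function isAuthorized(authorizationHeader, expectedToken) {
  if (typeof authorizationHeader !== 'string') return false;
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  if (!match) return false;
  const presented = match[1];
  if (presented.length !== expectedToken.length) return false;
  // Compare every character instead of stopping at the first mismatch,
  // so response timing reveals less about the secret
  let diff = 0;
  for (let i = 0; i < presented.length; i++) {
    diff |= presented.charCodeAt(i) ^ expectedToken.charCodeAt(i);
  }
  return diff === 0;
}

console.log(isAuthorized('Bearer your-adapter-secret', 'your-adapter-secret')); // true
console.log(isAuthorized('Bearer wrong-secret-value!', 'your-adapter-secret')); // false
console.log(isAuthorized(undefined, 'your-adapter-secret'));                    // false
```

In a Fetch-style handler you would call this with request.headers.get('Authorization') and return a 401 response when it fails.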
Here's what AlertManager sends:
    {
      "version": "4",
      "groupKey": "{}:{alertname=\"HighErrorRate\", instance=\"api-01.prod\"}",
      "status": "firing",
      "alerts": [
        {
          "status": "firing",
          "labels": {
            "alertname": "HighErrorRate",
            "instance": "api-01.prod",
            "job": "api-service",
            "severity": "critical"
          },
          "annotations": {
            "summary": "High error rate on api-01.prod",
            "description": "Error rate is 8.3% on api-01.prod (job: api-service)",
            "runbook_url": "https://wiki.example.com/runbooks/high-error-rate"
          },
          "startsAt": "2026-05-28T14:22:00Z",
          "endsAt": "0001-01-01T00:00:00Z"
        }
      ]
    }
Your adapter transforms this into an Alert24 incident creation request:
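    // createOrUpdateIncident and resolveIncidentByAlias are placeholders for your
    // own calls to Alert24's incident API; they are not defined in this snippet.
    // Assumes a Fetch-style runtime (for example a Cloudflare Worker).
    export async function handleAlertManager(request) {
      const payload = await request.json();
      for (const alert of payload.alerts) {
        // Deduplication key: one incident per alert type per host
        const alias = `${alert.labels.alertname}:${alert.labels.instance}`;
        const isFiring = alert.status === 'firing';
        if (isFiring) {
          await createOrUpdateIncident({
            title: alert.annotations.summary,
            description: [
              alert.annotations.description,
              alert.annotations.runbook_url
                ? `Runbook: ${alert.annotations.runbook_url}`
                : null,
            ].filter(Boolean).join('\n\n'),
            severity: mapSeverity(alert.labels.severity),
            alias,
            source: 'prometheus',
            labels: alert.labels,
          });
        } else {
          // Status is 'resolved', delivered because send_resolved is true
          await resolveIncidentByAlias(alias);
        }
      }
      // A 2xx response tells AlertManager the notification was delivered
      return new Response(null, { status: 204 });
    }

    // Map Prometheus severity labels to incident severities; unknown values fall back to 'medium'
    function mapSeverity(prometheusSeverity) {
      const map = { critical: 'critical', warning: 'high', info: 'low' };
      return map[prometheusSeverity] ?? 'medium';
    }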
The alias field is your deduplication key. When Alert24 receives an incident creation request with an alias that already exists as an open incident, it updates the existing incident rather than creating a new one. This is what prevents duplicate incidents when AlertManager re-sends the notification for a still-firing alert every repeat_interval (4h in the config above).
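The create-or-update behavior is easier to reason about with a toy model. The sketch below is an in-memory stand-in for alias-based deduplication, useful for unit-testing an adapter. It is not Alert24's API, and createIncidentStore and its methods are hypothetical names.

```javascript
// In-memory stand-in for alias-based deduplication; not Alert24's actual API.
function createIncidentStore() {
  const openByAlias = new Map(); // alias -> open incident
  let nextId = 1;

  return {
    createOrUpdate(fields) {
      const existing = openByAlias.get(fields.alias);
      if (existing) {
        // Same alias is already open: refresh its fields, keep the same incident id
        Object.assign(existing, fields, { updates: existing.updates + 1 });
        return existing;
      }
      const incident = { id: nextId++, updates: 0, ...fields };
      openByAlias.set(fields.alias, incident);
      return incident;
    },
    resolveByAlias(alias) {
      const incident = openByAlias.get(alias);
      if (!incident) return null; // nothing open for this alias
      openByAlias.delete(alias);
      return incident;
    },
    openCount: () => openByAlias.size,
  };
}

const store = createIncidentStore();
const alias = 'HighErrorRate:api-01.prod';
const first = store.createOrUpdate({ alias, title: 'High error rate on api-01.prod' });
const repeat = store.createOrUpdate({ alias, title: 'High error rate on api-01.prod' });
console.log(first.id === repeat.id, store.openCount()); // true 1

store.resolveByAlias(alias);
const recurrence = store.createOrUpdate({ alias, title: 'High error rate on api-01.prod' });
console.log(recurrence.id); // 2
```

Note the last step: once an incident is resolved, the same alias opens a fresh incident, so a recurrence is tracked separately from the original outage.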
Deduplication: Why alertname + instance Works
| Field | Purpose |
|---|---|
| alertname | Identifies the type of problem (HighErrorRate, DiskFull, etc.) |
| instance | Identifies which host or target is affected |
| Combined alias | Uniquely identifies "this type of problem on this specific target" |
Using just alertname as the alias would deduplicate across all instances, so an error rate spike on api-01 and api-02 simultaneously would create only one incident. That's usually wrong. Using alertname + instance means each affected host gets its own incident, which is what you want for root cause isolation and separate resolution tracking.
If your alerts include a job label and you want finer granularity, you can extend the alias to alertname:instance:job. The key principle is that the alias should uniquely identify the problem scope you want tracked as a single unit.
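One way to keep the alias scope configurable is a small builder that takes the list of label names. This is a sketch; buildAlias is a hypothetical helper, and throwing on a missing label is a design choice you may want to soften in production.

```javascript
// Hypothetical helper: builds a dedup alias from a chosen list of label names.
function buildAlias(labels, keys = ['alertname', 'instance']) {
  return keys
    .map((key) => {
      const value = labels[key];
      if (value === undefined || value === '') {
        // Fail loudly rather than emit "HighErrorRate:undefined",
        // which would merge unrelated alerts into one incident
        throw new Error(`Cannot build alias: label "${key}" is missing`);
      }
      return value;
    })
    .join(':');
}

const labels = { alertname: 'HighErrorRate', instance: 'api-01.prod', job: 'api-service' };
console.log(buildAlias(labels));
// HighErrorRate:api-01.prod
console.log(buildAlias(labels, ['alertname', 'instance', 'job']));
// HighErrorRate:api-01.prod:api-service
```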
What You Get at the Other End
Once the pipeline is in place, every critical Prometheus alert becomes a full Alert24 incident: a timestamped record with an acknowledgment workflow, a timeline of status changes, and the ability to post updates to a status page. Your team has one place to look rather than scrolling up through Slack to find when someone said they were looking into it.
The resolved notification from AlertManager closes the incident automatically, so your mean time to resolution metrics reflect actual resolution rather than whenever someone remembered to close a ticket.
Next Steps
Start with a single alert rule and test the full pipeline in a staging AlertManager instance before rolling it out to production. Verify that:
- A firing alert creates an incident in Alert24 with the correct title and severity
- A second firing of the same alert updates the existing incident rather than creating a duplicate
- A resolved notification closes the incident
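To exercise the adapter half of that checklist without waiting for a real alert, you can post synthetic payloads at it. The sketch below builds a minimal payload with only the fields the adapter shown earlier reads; real AlertManager payloads carry more. buildTestPayload and runPipelineCheck are hypothetical names, and the URL and token you pass in are your own.

```javascript
// Builds a minimal AlertManager-style webhook payload for testing the adapter.
function buildTestPayload(status, now = new Date()) {
  const labels = {
    alertname: 'HighErrorRate',
    instance: 'staging-01',
    job: 'api-service',
    severity: 'critical',
  };
  return {
    version: '4',
    status,
    alerts: [
      {
        status,
        labels,
        annotations: {
          summary: `High error rate on ${labels.instance}`,
          description: 'Synthetic test alert',
        },
        startsAt: now.toISOString(),
        // Firing alerts carry a zero-value endsAt, as in the sample payload
        endsAt: status === 'resolved' ? now.toISOString() : '0001-01-01T00:00:00Z',
      },
    ],
  };
}

// Posts firing, firing, resolved in order, matching the three checks above.
// Requires a runtime with a global fetch (for example Node 18 or later).
async function runPipelineCheck(adapterUrl, token) {
  for (const status of ['firing', 'firing', 'resolved']) {
    const response = await fetch(adapterUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
      body: JSON.stringify(buildTestPayload(status)),
    });
    console.log(status, response.status);
  }
}
```

This bypasses AlertManager, so treat it as a complement to the staging test rather than a replacement: it confirms the adapter and the deduplication, not your routing config.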
Once that's working, extend the adapter to handle your full severity matrix and to pull in additional context from your annotations — runbook links, dashboard URLs, and anything else that speeds up the response.
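As a starting point for that enrichment, the description builder can be driven by a list of link annotations instead of a hard-coded runbook check. This is a sketch: buildDescription and LINK_ANNOTATIONS are hypothetical names, and dashboard_url is an assumed annotation that your rules would need to define.

```javascript
// Annotation keys to surface as links, paired with display labels.
// dashboard_url is an assumed annotation name, not one from the rule shown earlier.
const LINK_ANNOTATIONS = [
  ['runbook_url', 'Runbook'],
  ['dashboard_url', 'Dashboard'],
];

// Joins the description and any present link annotations into one incident body.
function buildDescription(annotations = {}) {
  const parts = [annotations.description];
  for (const [key, label] of LINK_ANNOTATIONS) {
    if (annotations[key]) parts.push(`${label}: ${annotations[key]}`);
  }
  return parts.filter(Boolean).join('\n\n');
}

console.log(
  buildDescription({
    description: 'Error rate is 8.3% on api-01.prod (job: api-service)',
    runbook_url: 'https://wiki.example.com/runbooks/high-error-rate',
  }),
);
```

Adding a new link type then means adding one entry to the list rather than another conditional in the handler.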
If you're already running AlertManager, the adapter is the only new piece. The rest is configuration changes to an existing system. Start there.