Your service runs on Kubernetes. When something goes wrong, your internal dashboards light up — Grafana shows a spike, AlertManager fires, your Slack channel fills with noise. Meanwhile, your customers are hitting errors and refreshing a status page that says "All systems operational."
That gap is the problem. The monitoring stack you already have knows about the incident. Your customers don't. This post walks through closing that gap by routing Kubernetes alerts through AlertManager into Alert24, so the status page reflects reality without anyone manually updating it.
What You're Connecting
If your cluster runs the standard monitoring stack, you already have the pieces:
- Prometheus scraping metrics from your workloads and the cluster itself
- AlertManager grouping and routing those alerts
- Alert24 handling the status page, on-call routing, and incident tracking
The integration point is AlertManager's webhook receiver. Alert24 exposes a webhook endpoint that AlertManager can POST to directly, which means you don't need a custom aggregator or middleware to bridge them.
Setting Up the AlertManager Webhook
In your AlertManager configuration, add a receiver that points to your Alert24 webhook URL. You get this URL from the Alert24 dashboard under Monitoring Checks > New Check > Incoming Webhook.
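# alertmanager.yml
receivers:
  # Placeholder default receiver with no integrations. The root route below
  # references it, so it must be defined or AlertManager rejects the config.
  - name: 'alert24-default'
  - name: 'alert24-webhook'
    webhook_configs:
      - url: 'https://alert24.com/hooks/inbound/YOUR_TOKEN_HERE'
        send_resolved: true
        http_config:
          bearer_token: 'YOUR_API_KEY'
route:
  group_by: ['alertname', 'namespace', 'pod']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'alert24-default'
  routes:
    # Critical and warning alerts are forwarded to the Alert24 webhook.
    - match:
        severity: critical
      receiver: 'alert24-webhook'
    - match:
        severity: warning
      receiver: 'alert24-webhook'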
The send_resolved: true flag is important. When Prometheus clears an alert, AlertManager will POST the resolution to Alert24, which automatically closes the incident and updates the status page without manual intervention.
Linking Alerts to a Specific Service
Alert24 maps incoming webhook payloads to services you define in the platform. This is where the status page connection lives. If you have a service called "Payments API," you create it in Alert24 under Status Page > Services, then link it to the monitoring check that receives your webhook.
When AlertManager fires an alert with labels matching that service, the status page flips from "Operational" to "Degraded" or "Outage" depending on the severity you configure.
To make the mapping predictable, use consistent labels in your Prometheus alerting rules:
# alerts.yml
groups:
  - name: payments-api
    rules:
      - alert: PaymentsAPICrashLoopBackOff
        expr: |
          kube_pod_container_status_waiting_reason{
            reason="CrashLoopBackOff",
            namespace="production",
            pod=~"payments-api-.*"
          } == 1
        for: 2m
        labels:
          severity: critical
          service: payments-api
          team: backend
        annotations:
          summary: "Payments API pod is in CrashLoopBackOff"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has been in CrashLoopBackOff for more than 2 minutes."
      - alert: PaymentsAPIHighRestartCount
        expr: |
          increase(kube_pod_container_status_restarts_total{
            namespace="production",
            pod=~"payments-api-.*"
          }[15m]) > 3
        for: 1m
        labels:
          severity: warning
          service: payments-api
          team: backend
        annotations:
          summary: "Payments API pods restarting frequently"
          description: "Pods matching payments-api-* have restarted more than 3 times in 15 minutes."
In Alert24, configure the webhook check to read the service label from the incoming payload and match it to your status page service. This means a single webhook endpoint can serve multiple services without creating a separate webhook per service.
What the Alert24 Webhook Payload Looks Like
AlertManager sends a JSON body each time it notifies a webhook receiver. Here's a simplified version of what arrives at your Alert24 endpoint:
{
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "PaymentsAPICrashLoopBackOff",
        "severity": "critical",
        "service": "payments-api",
        "namespace": "production",
        "pod": "payments-api-7d9f8b-xk2pq"
      },
      "annotations": {
        "summary": "Payments API pod is in CrashLoopBackOff",
        "description": "Pod payments-api-7d9f8b-xk2pq has been in CrashLoopBackOff for more than 2 minutes."
      },
      "startsAt": "2026-05-28T14:32:00Z"
    }
  ]
}
Alert24 parses this payload and uses the labels to route the alert. The summary and description annotations become the incident title and body, which also appear on the status page incident timeline so customers can see what's happening.
What a CrashLoopBackOff Incident Looks Like End-to-End
Here's the sequence when a deployment goes bad:
| Time | What happens |
|---|---|
| T+0 | A bad deploy lands. A payments-api pod begins CrashLoopBackOff. |
| T+2m | Prometheus alert PaymentsAPICrashLoopBackOff fires (after for: 2m). |
| T+2m+30s | AlertManager groups and routes the alert to the Alert24 webhook. |
| T+3m | Alert24 creates an incident, pages the on-call engineer, and flips the status page to "Degraded." |
| T+15m | On-call engineer rolls back the deployment. Pods recover. |
| T+17m | Prometheus clears the alert. AlertManager sends a resolved event. |
| T+17m | Alert24 auto-resolves the incident and updates the status page to "Operational." |
Customers see the status page reflect the actual incident window, not a manually written post-mortem that went up an hour later.
Cluster-Level Alerts Worth Adding
Beyond pod-level alerts, there are cluster signals that should surface on a status page when they indicate user-facing impact:
Node pressure: If nodes hit memory or disk pressure, Kubernetes starts evicting pods. The expression kube_node_status_condition{condition="MemoryPressure",status="true"} == 1 returns results while a node is under memory pressure, often before users see errors. Alert on it, and on the matching DiskPressure condition.
Deployment rollout stuck: kube_deployment_status_condition{condition="Progressing",status="false"} == 1 catches a stuck rollout before the restart count climbs.
HPA at max replicas under load: If kube_horizontalpodautoscaler_status_current_replicas >= kube_horizontalpodautoscaler_spec_max_replicas, you're at the capacity ceiling. That's a latency risk worth surfacing.
These don't all need to flip the status page to "Outage" — you can configure Alert24 to show them as "Performance Issues" or "Degraded" based on severity label.
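As a rough sketch, the three signals above could be written as Prometheus alerting rules like the ones below. The metric names come from kube-state-metrics as quoted above. The file name, group name, alert names, for durations, and severity values are assumptions for illustration, so tune them to your cluster.

```yaml
# cluster-alerts.yml (hypothetical file name)
groups:
  - name: cluster-signals            # hypothetical group name
    rules:
      - alert: NodeMemoryPressure    # hypothetical alert name
        # The condition metric is 1 for the status that currently applies.
        expr: kube_node_status_condition{condition="MemoryPressure", status="true"} == 1
        for: 5m                      # assumed duration
        labels:
          severity: warning          # assumed severity
        annotations:
          summary: "Node {{ $labels.node }} is under memory pressure"
      - alert: DeploymentRolloutStuck
        expr: kube_deployment_status_condition{condition="Progressing", status="false"} == 1
        for: 10m                     # assumed duration
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.deployment }} in {{ $labels.namespace }} is not progressing"
      - alert: HPAAtMaxReplicas
        # Both metrics carry the same namespace and horizontalpodautoscaler
        # labels, so the comparison matches series one-to-one.
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
            >= kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m                     # assumed duration
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} in {{ $labels.namespace }} is at max replicas"
```

None of these rules sets a service label. If you want one of them to touch a specific status page service, add the same service label you use in your workload rules.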
Keeping the Status Page Meaningful
A status page that cries wolf trains customers to ignore it. A few practices help:
Use the for duration in your Prometheus alert rules to avoid transient flaps. A pod that crashes and recovers in 90 seconds probably shouldn't have created a public incident.
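One way to check that a short flap stays quiet is a promtool rule unit test. The sketch below assumes the PaymentsAPICrashLoopBackOff rule from earlier is saved as a valid alerts.yml next to the test file. The test file name and the sample series are made up for illustration.

```yaml
# alerts_test.yml (hypothetical file name)
# Run with: promtool test rules alerts_test.yml
rule_files:
  - alerts.yml

evaluation_interval: 30s

tests:
  # Case 1: the pod is in CrashLoopBackOff for about 90 seconds, then recovers.
  - interval: 30s
    input_series:
      - series: 'kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace="production", pod="payments-api-7d9f8b-xk2pq"}'
        values: '1 1 1 0 0 0 0 0'
    alert_rule_test:
      - eval_time: 3m
        alertname: PaymentsAPICrashLoopBackOff
        exp_alerts: []   # never active for the full 2m, so nothing fires

  # Case 2: the pod stays in CrashLoopBackOff, so the alert fires after 2m.
  - interval: 30s
    input_series:
      - series: 'kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace="production", pod="payments-api-7d9f8b-xk2pq"}'
        values: '1x10'
    alert_rule_test:
      - eval_time: 3m
        alertname: PaymentsAPICrashLoopBackOff
        exp_alerts:
          - exp_labels:
              severity: critical
              service: payments-api
              team: backend
              reason: CrashLoopBackOff
              namespace: production
              pod: payments-api-7d9f8b-xk2pq
            exp_annotations:
              summary: "Payments API pod is in CrashLoopBackOff"
              description: "Pod payments-api-7d9f8b-xk2pq in namespace production has been in CrashLoopBackOff for more than 2 minutes."
```

Running a test like this in CI means a change to the for duration or the expression gets caught before it creates a public incident.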
Separate noisy internal alerts from customer-facing ones using AlertManager routing. Route team: infra alerts to a different receiver that pages your team but doesn't touch the status page. Only alerts with customer_facing: true or a specific severity should go to the status page webhook.
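A routing tree along these lines might look like the sketch below. It assumes AlertManager 0.22 or later for the matchers syntax. The internal-pager receiver name and the group_by choice are hypothetical. The customer_facing label is one you would add to your own alert rules.

```yaml
# alertmanager.yml (routing sketch)
route:
  receiver: 'internal-pager'          # default: internal only, never reaches the status page
  group_by: ['alertname', 'namespace']
  routes:
    # Infrastructure alerts page the team but stay off the status page.
    - matchers:
        - team = "infra"
      receiver: 'internal-pager'
    # Only alerts explicitly labeled as customer-facing reach Alert24.
    - matchers:
        - customer_facing = "true"
      receiver: 'alert24-webhook'

receivers:
  - name: 'internal-pager'
    # Add your team's paging integration here.
  - name: 'alert24-webhook'
    webhook_configs:
      - url: 'https://alert24.com/hooks/inbound/YOUR_TOKEN_HERE'
        send_resolved: true
        http_config:
          bearer_token: 'YOUR_API_KEY'
```

Routes are evaluated in order and the first match wins, so in this sketch an alert labeled both team: infra and customer_facing: true stays internal. Swap the two routes if you want the opposite behavior.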
Write useful annotations. The description field becomes the incident body on your status page. "Pod payments-api-7d9f8b-xk2pq has been in CrashLoopBackOff" is more useful to your team than "A pod is crashing," and it shows customers you have visibility into the problem.
Next Steps
If you're starting from scratch, get the webhook integration working first with a test alert before tuning severity routing. AlertManager's amtool CLI lets you fire test alerts without waiting for a real incident:
amtool alert add alertname="TestAlert" severity="critical" service="payments-api"
Watch for it to arrive in Alert24 and verify the status page flips. Once that works, layer in the Prometheus alert rules for the signals that matter to your specific workloads.
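The amtool command above needs to know where AlertManager is listening. One option is a small amtool config file, sketched here on the assumption that AlertManager is reachable on localhost port 9093 (for example through a port-forward). Adjust the URL for your cluster.

```yaml
# ~/.config/amtool/config.yml
# Assumption: AlertManager is reachable at this address from your machine.
alertmanager.url: "http://localhost:9093"
```

The same address can be passed per command with the --alertmanager.url flag instead of a config file.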
From there, look at Alert24's incident templates to standardize how your team communicates during a Kubernetes incident — what goes in the status page update, when to post an update versus waiting for resolution, and how to write the post-incident summary that closes the incident timeline.
The monitoring stack you have already knows when things break. Getting that signal to your customers in real time is a matter of connecting the pipes.