The Problem with a Flat Alert Stream

Your Prometheus setup is firing alerts. A pod in the payments namespace crashes, and the platform team gets paged. They spend ten minutes confirming they have no idea what the payments service does, then start hunting for whoever owns it. Meanwhile, the payments team is asleep with their phones on silent because they have no on-call schedule set up — they assumed the platform team handled all of that.

This failure mode appears whenever you treat Kubernetes alerting as a single stream instead of routing it by ownership. The fix is not complicated, but it requires wiring two things together: Prometheus alert labels that encode who owns what, and an alerting backend that uses those labels to route notifications to the right people.

Step 1: Label Your Prometheus Alert Rules by Team

The Prometheus Operator's PrometheusRule resource lets you attach arbitrary labels to each individual alerting rule. The key is to add a team label that names the team responsible for that alert. The namespace alone is not enough, because some namespaces are shared by several teams and a single team may own several namespaces.

Here is a minimal example with two namespaces and two teams:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-alerts
  namespace: payments
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
    - name: payments.pod-health
      rules:
        - alert: PaymentsPodCrashLooping
          expr: |
            increase(kube_pod_container_status_restarts_total{namespace="payments"}[15m]) > 2
          for: 5m
          labels:
            team: payments
            namespace: payments
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping in the payments namespace"
            description: "About {{ $value }} container restarts over the last 15 minutes"

        - alert: PaymentsHighErrorRate
          expr: |
            sum(rate(http_requests_total{namespace="payments",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{namespace="payments"}[5m])) > 0.05
          for: 2m
          labels:
            team: payments
            namespace: payments
            severity: warning
          annotations:
            summary: "Payments service error rate above 5%"
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: platform-alerts
  namespace: kube-system
  labels:
    prometheus: kube-prometheus
    role: alert-rules
spec:
  groups:
    - name: platform.node-health
      rules:
        - alert: NodeMemoryPressure
          expr: |
            node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
          for: 10m
          labels:
            team: platform
            severity: warning
          annotations:
            summary: "Node {{ $labels.instance }} memory available below 10%"

A few things to notice here. The team label is set explicitly at the individual rule level, not just inherited from the namespace. That is intentional — it lets you handle cases where platform-owned infrastructure runs inside an application namespace. The severity label follows the conventional critical / warning / info scale, which you will use later to control escalation behavior.
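The label convention above is easy to check mechanically. The following is a minimal sketch, assuming each PrometheusRule has already been parsed into a plain Python dictionary (for example from the JSON output of kubectl); the function name find_label_problems and the sample data are illustrative and not part of any tool.

```python
REQUIRED_LABELS = ("team", "severity")
ALLOWED_SEVERITIES = {"critical", "warning", "info"}


def find_label_problems(prometheus_rule):
    """Return a list of human-readable problems for one parsed PrometheusRule."""
    problems = []
    for group in prometheus_rule.get("spec", {}).get("groups", []):
        for rule in group.get("rules", []):
            name = rule.get("alert")
            if name is None:
                continue  # recording rules do not need routing labels
            labels = rule.get("labels", {})
            for key in REQUIRED_LABELS:
                if not labels.get(key):
                    problems.append(f"{name}: missing label '{key}'")
            severity = labels.get("severity")
            if severity and severity not in ALLOWED_SEVERITIES:
                problems.append(f"{name}: unknown severity '{severity}'")
    return problems


# Hypothetical sample: the second rule deliberately omits the team label.
example = {
    "kind": "PrometheusRule",
    "spec": {"groups": [{"name": "payments.pod-health", "rules": [
        {"alert": "PaymentsPodCrashLooping",
         "labels": {"team": "payments", "severity": "critical"}},
        {"alert": "PaymentsHighErrorRate",
         "labels": {"severity": "warning"}},
    ]}]},
}

for problem in find_label_problems(example):
    print(problem)
```

Running a check like this in CI is one way to keep unlabeled rules from reaching the cluster, since an alert without a team label can only ever land on the fallback receiver.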

Step 2: Verify Labels Flow Through to Alertmanager

Before configuring routing, confirm the labels are actually present on firing alerts. Check the Alertmanager UI or query the Prometheus alerts endpoint:

kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
# Then open http://localhost:9090/alerts

Look at the labels on a test alert and confirm team and severity appear. If you are using the Prometheus Operator and the alert does not show up at all, check that your PrometheusRule carries the labels that the Prometheus custom resource's spec.ruleSelector expects (usually prometheus: kube-prometheus and role: alert-rules) and that the rule's namespace is matched by spec.ruleNamespaceSelector.
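The same check can be scripted against the Prometheus HTTP API instead of the UI. This is a sketch under assumptions: it relies on the port-forward from this step being active on localhost:9090, and the helper names are illustrative. The /api/v1/alerts endpoint returns the currently pending and firing alerts with their labels.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumes the port-forward is running


def alerts_missing_labels(api_response, required=("team", "severity")):
    """Return (alertname, missing_labels) pairs from an /api/v1/alerts response."""
    missing = []
    for alert in api_response.get("data", {}).get("alerts", []):
        labels = alert.get("labels", {})
        absent = [key for key in required if key not in labels]
        if absent:
            missing.append((labels.get("alertname", "<unnamed>"), absent))
    return missing


def fetch_alerts(base_url=PROMETHEUS_URL):
    """Fetch active alerts from Prometheus (needs network access to base_url)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=10) as resp:
        return json.load(resp)


# Hand-written sample in the shape of the API response, so the check
# can be tried without a cluster.
sample = {
    "status": "success",
    "data": {"alerts": [
        {"labels": {"alertname": "PaymentsPodCrashLooping",
                    "team": "payments", "severity": "critical"},
         "state": "firing"},
        {"labels": {"alertname": "NodeMemoryPressure", "severity": "warning"},
         "state": "pending"},
    ]},
}

print(alerts_missing_labels(sample))
```

Against a live cluster, call alerts_missing_labels(fetch_alerts()) while the port-forward is open; an empty list means every active alert carries both labels.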

Step 3: Configure Alertmanager to Forward to Alert24

Alert24 exposes a webhook endpoint that accepts Alertmanager payloads. Each Alert24 integration has its own endpoint URL, so you can create separate integrations per team and route to them based on the team label.

In Alertmanager, the configuration looks like this:

global:
  resolve_timeout: 5m

route:
  receiver: platform-team
  group_by: [alertname, namespace, team]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - team = "payments"
      receiver: payments-team
      continue: false

    - matchers:
        - team = "platform"
      receiver: platform-team
      continue: false

receivers:
  - name: payments-team
    webhook_configs:
      - url: "https://api.alert24.app/v1/integrations/webhook/YOUR_PAYMENTS_WEBHOOK_ID"
        send_resolved: true

  - name: platform-team
    webhook_configs:
      - url: "https://api.alert24.app/v1/integrations/webhook/YOUR_PLATFORM_WEBHOOK_ID"
        send_resolved: true

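Setting continue: false on each route (it is also the default, spelled out here for clarity) stops evaluation at the first matching route, so an alert is delivered to exactly one receiver. An alert with no team label, or with a team value that no route matches, falls through to the root receiver, which is platform-team in this configuration. The group_by includes team so that alerts carrying different team labels are never merged into a single notification, including on the fallback route.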
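To make the first-match behavior concrete, here is a small simulation of the routing tree above. It is a simplified model rather than Alertmanager's actual implementation: it handles one level of child routes and equality matching only, and the names CHILD_ROUTES and receivers_for are made up for the illustration.

```python
ROOT_RECEIVER = "platform-team"

# Each child route: (label matchers, receiver, continue flag), in config order.
CHILD_ROUTES = [
    ({"team": "payments"}, "payments-team", False),
    ({"team": "platform"}, "platform-team", False),
]


def receivers_for(labels):
    """Return the receivers an alert with these labels would be sent to."""
    matched = []
    for matchers, receiver, keep_going in CHILD_ROUTES:
        if all(labels.get(key) == value for key, value in matchers.items()):
            matched.append(receiver)
            if not keep_going:
                break  # continue: false stops at the first matching route
    # No child route matched: the alert stays on the root route.
    return matched or [ROOT_RECEIVER]


print(receivers_for({"alertname": "PaymentsPodCrashLooping", "team": "payments"}))
print(receivers_for({"alertname": "NodeMemoryPressure", "team": "platform"}))
print(receivers_for({"alertname": "UnlabeledAlert"}))
```

The third call shows why the label convention matters: an alert without a team label is not dropped, it simply lands on whichever team owns the root receiver.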

Step 4: Set Up Team-Specific On-Call Schedules in Alert24

The webhook integration is the inbound channel. The on-call schedule determines who actually gets notified.

In Alert24, create a separate service for each team. The payments team gets a "Payments" service, the platform team gets a "Platform" service. Each service has its own on-call schedule and escalation policy.

Team	Alert24 Service	On-Call Schedule	Escalation
Payments	Payments	Payments rotation (weekly)	L1 engineer → payments lead after 10 min
Platform	Platform	Platform rotation (weekly)	L1 engineer → platform lead after 10 min

When a PaymentsPodCrashLooping alert fires, it goes to the payments webhook, which triggers the payments on-call schedule. The platform team never sees it unless the payments team explicitly escalates.

You can also filter by severity at the Alert24 routing level. If you want warning alerts to create an incident but not page anyone at 3am, configure the payments service escalation policy to only send SMS and phone calls for severity=critical alerts, while warning alerts create the incident and send an email.

Handling Edge Cases

Cross-namespace dependencies. The payments service might depend on a shared database in the data namespace. Own that alert clearly — either the data team owns it and routes to themselves, or the payments team labels it with team: payments if they are the primary consumer. Pick one owner per alert; split ownership is no ownership.

Cluster-wide alerts. Some alerts (node pressure, control plane failures) are not owned by any application team. These should carry team: platform and route to the platform team's schedule regardless of which namespace the alerting resource lives in.

Alert flapping. If a pod crash loops and resolves repeatedly, Alertmanager will send both firing and resolved notifications. Make sure send_resolved: true is set in your Alertmanager webhook config so Alert24 can automatically close the incident when the alert resolves. Without this, you will accumulate open incidents that never close.
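The effect of send_resolved can be sketched with a stand-in receiver. The code below is not how Alert24 processes payloads internally, which this article does not describe; it only models a receiver that opens an incident on a firing alert and closes it on a resolved one, using the status and fingerprint fields that Alertmanager includes for each alert in its webhook payload. The fingerprint value is a placeholder.

```python
def apply_webhook(open_incidents, payload):
    """Update the set of open incident keys from one Alertmanager webhook payload."""
    for alert in payload.get("alerts", []):
        key = alert["fingerprint"]
        if alert["status"] == "firing":
            open_incidents.add(key)
        elif alert["status"] == "resolved":
            open_incidents.discard(key)
    return open_incidents


def make_payload(status):
    """Build a trimmed-down payload; real payloads carry more fields."""
    return {
        "version": "4",
        "status": status,
        "receiver": "payments-team",
        "alerts": [{
            "status": status,
            "fingerprint": "abc123",  # placeholder value
            "labels": {"alertname": "PaymentsPodCrashLooping",
                       "team": "payments", "severity": "critical"},
        }],
    }


incidents = set()
apply_webhook(incidents, make_payload("firing"))
print(sorted(incidents))   # incident is open

# This second delivery only happens when send_resolved: true is set.
apply_webhook(incidents, make_payload("resolved"))
print(sorted(incidents))   # incident is closed
```

With send_resolved left off, the second payload is never sent, so a receiver of this kind has no signal for closing the incident and it stays open until someone closes it by hand.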

Next Steps

Start with the label convention. Add team and severity to every existing PrometheusRule in your cluster — this is the foundation everything else depends on. Then create the Alert24 webhook integrations and point your Alertmanager routes at them. Finally, build out on-call schedules for each team so the routing actually reaches a human.

If you do not have Alert24 set up yet, the free plan covers the basics: create a service, generate a webhook URL, configure an on-call schedule, and your first routed alert will page the right person. The configuration here will work the same way whether your cluster has two teams or twenty.

The goal is a state where a payments engineer can be confident that a payments alert will reach them, and equally confident they will not get woken up for a node memory issue they have no context on.

How to Route Different Kubernetes Namespace Alerts to Different Teams