Monitor Kubernetes CronJobs with Alert24 Heartbeat Checks

Add heartbeat monitoring to any Kubernetes CronJob so Alert24 fires an incident if the job stops completing successfully. Covers sidecar, init container, and direct curl patterns.

Kubernetes CronJobs can fail silently — a failed pod, a misconfigured schedule, or resource pressure can prevent your job from completing without any obvious alert. An Alert24 heartbeat check gives you a dead-man's switch: if the CronJob stops checking in, you get an incident.

Before you start

  1. In Alert24, go to Monitoring → Add check → Heartbeat
  2. Name it after your CronJob (e.g., "Invoice generation job")
  3. Set Expected interval, in seconds, to match your CronJob schedule (e.g., 21600 for a job that runs every 6 hours)
  4. Set Grace period to account for pod startup time (120–300 seconds is typical)
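  5. Save and copy the heartbeat URL. The examples below use the token from that URL, which appears to be its final path segment (https://app.alert24.net/api/hb/YOUR_TOKEN)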
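
The interval in step 3 is plain arithmetic on the schedule. As a rough sketch for the every-6-hours schedule used in the examples below, with a 300-second grace period picked from the typical range above. The variable names are illustrative, and the assumption that a ping counts as overdue once interval plus grace has passed should be confirmed in Alert24:

```shell
# Hypothetical values: a CronJob scheduled as "0 */6 * * *" (every 6 hours).
hours_between_runs=6
expected_interval=$((hours_between_runs * 60 * 60))   # seconds between runs
grace_period=300                                       # seconds of slack for pod startup

# Assumption: a ping counts as overdue once interval + grace has elapsed.
overdue_after=$((expected_interval + grace_period))

echo "Expected interval: ${expected_interval}s"
echo "Overdue after:     ${overdue_after}s"
```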

Store the token as a Kubernetes secret

kubectl create secret generic alert24-heartbeat \
  --from-literal=token=YOUR_TOKEN \
  -n your-namespace
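
If you only copied the full heartbeat URL, the token can be split off in the shell before creating the secret. This sketch assumes the URL has the form used in the manifests below, with the token as the last path segment; abc123 is a placeholder, not a real token.

```shell
# Placeholder URL; paste the heartbeat URL copied from Alert24 instead.
heartbeat_url="https://app.alert24.net/api/hb/abc123"

# Strip everything up to and including the last "/" to leave the token.
token="${heartbeat_url##*/}"

echo "$token"
```

The resulting value can then be passed as --from-literal=token="$token" in the kubectl command above.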

Option A: Add curl to your existing container

If your job image has curl, chain the ping after the job with && so it runs only when the job exits 0:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: invoice-generator
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: invoice-generator
              image: my-app:latest
              env:
                - name: ALERT24_HB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: alert24-heartbeat
                      key: token
              command:
                - sh
                - -c
                - |
                  python /app/generate_invoices.py && \
                  curl -fsS --retry 3 "https://app.alert24.net/api/hb/$ALERT24_HB_TOKEN"
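
The && is what ties the ping to success: the shell runs curl only when the job command exits 0, and otherwise the container exits non-zero without pinging. A minimal local sketch of that behavior, using a hypothetical ping_heartbeat stand-in instead of the real curl call, and true and false in place of the job:

```shell
# Stand-in for the curl ping: prints a marker instead of calling Alert24.
ping_heartbeat() { echo "ping sent"; }

# Runs the given command and pings only if it exits 0.
run_with_heartbeat() {
  "$@" && ping_heartbeat
}

# Successful job: the ping runs.
ok_output=$(run_with_heartbeat true)

# Failing job: no ping, and the non-zero exit status is preserved.
fail_status=0
fail_output=$(run_with_heartbeat false) || fail_status=$?

echo "success case: '${ok_output}'"
echo "failure case: '${fail_output}' (exit ${fail_status})"
```

Because the failing case keeps its non-zero exit status, Kubernetes still sees the container as failed and applies the restart policy.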

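Option B: Init container plus heartbeat container

If you can't modify the main container's entrypoint, run the job as an init container and put the heartbeat ping in the pod's main container. Kubernetes starts the main containers only after every init container has exited successfully, so the ping is sent only when the job succeeds. The fragment below is the Job spec that goes under jobTemplate in the CronJob: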

spec:
  template:
    spec:
      restartPolicy: OnFailure
      initContainers:
        - name: run-job
          image: my-app:latest
          command: ["python", "/app/generate_invoices.py"]
      containers:
        - name: heartbeat
          image: curlimages/curl:latest
          env:
            - name: ALERT24_HB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: alert24-heartbeat
                  key: token
          command:
            - sh
            - -c
            - curl -fsS --retry 3 "https://app.alert24.net/api/hb/$ALERT24_HB_TOKEN"

In this pattern, the heartbeat container only runs after the init container exits successfully.

Tips

  • restartPolicy: OnFailure: With this set, Kubernetes restarts the failed container in the same pod when it exits non-zero. The heartbeat fires only when the job finally succeeds, so failed attempts never ping Alert24.
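  • backoffLimit: Set a backoffLimit on the Job spec that fits your schedule (the Kubernetes default is 6) so a broken job stops retrying well before the next run. Once the limit is exhausted, the Job is marked failed and no ping is sent, so Alert24 fires an incident when the grace period expires.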
  • successfulJobsHistoryLimit: Set this to 3–5 to keep recent successful job pods for debugging, without accumulating hundreds of completed pods.
  • startingDeadlineSeconds: If a run cannot start within startingDeadlineSeconds of its scheduled time (e.g., the cluster was overloaded), Kubernetes skips that run. Set startingDeadlineSeconds to a value shorter than your Alert24 grace period so missed windows always result in an Alert24 incident.
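
To show where these fields live, the following sketch writes a trimmed CronJob manifest to a file. The file name and all values (backoffLimit: 3, and startingDeadlineSeconds: 100 chosen to sit under a 120-second grace period) are illustrative assumptions, not recommendations from Alert24, and the container spec is cut down to the essentials.

```shell
# Write an example manifest; values are illustrative and should be tuned per job.
cat > invoice-generator-cronjob.yaml <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: invoice-generator
spec:
  schedule: "0 */6 * * *"
  startingDeadlineSeconds: 100     # assumed to be below a 120s Alert24 grace period
  successfulJobsHistoryLimit: 3    # keep the last 3 successful Jobs for debugging
  jobTemplate:
    spec:
      backoffLimit: 3              # mark the Job failed after 3 retries
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: invoice-generator
              image: my-app:latest
EOF
```

Note that startingDeadlineSeconds and successfulJobsHistoryLimit belong to the CronJob spec, while backoffLimit belongs to the Job spec under jobTemplate. After adding back the env and command sections from Option A, the file can be applied with kubectl apply -f invoice-generator-cronjob.yaml -n your-namespace.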