Kubernetes CronJobs can fail silently — a failed pod, a misconfigured schedule, or resource pressure can prevent your job from completing without any obvious alert. An Alert24 heartbeat check gives you a dead-man's switch: if the CronJob stops checking in, you get an incident.
Before you start
- In Alert24, go to Monitoring → Add check → Heartbeat
- Name it after your CronJob (e.g., "Invoice generation job")
- Set Expected interval, in seconds, to match your CronJob schedule (for example, 21600 for a job that runs every 6 hours)
- Set Grace period to account for pod startup time (120–300 seconds is typical)
- Save and copy the heartbeat URL (the examples below assume the token is the final path segment of that URL)
Store the token as a Kubernetes secret
kubectl create secret generic alert24-heartbeat \
  --from-literal=token=YOUR_TOKEN \
  -n your-namespace
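If you manage cluster resources declaratively, the same secret can be sketched as a manifest. This is a minimal example that reuses the secret name, key, and namespace from the command above. YOUR_TOKEN is still a placeholder, and a manifest holding the real token should not be committed to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alert24-heartbeat
  namespace: your-namespace
type: Opaque
stringData:
  # stringData accepts the plain-text value; Kubernetes stores it base64-encoded
  token: YOUR_TOKEN
```

Applying this file with kubectl apply -f should give the same result as the kubectl create secret command.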
Option A: Add curl to your existing container
If your job image has curl, add the ping as the last command:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: invoice-generator
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: invoice-generator
              image: my-app:latest
              env:
                - name: ALERT24_HB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: alert24-heartbeat
                      key: token
              command:
                - sh
                - -c
                - |
                  python /app/generate_invoices.py && \
                    curl -fsS --retry 3 "https://app.alert24.net/api/hb/$ALERT24_HB_TOKEN"
Option B: Separate heartbeat container
If you can't modify the main container's entrypoint, run the job as an init container and put the heartbeat ping in the pod's main container. Kubernetes starts the main containers only after every init container has exited successfully, so the ping is sent only if the job succeeds:
# Job spec: place under the CronJob's spec.jobTemplate
spec:
  template:
    spec:
      restartPolicy: OnFailure
      initContainers:
        - name: run-job
          image: my-app:latest
          command: ["python", "/app/generate_invoices.py"]
      containers:
        - name: heartbeat
          image: curlimages/curl:latest
          env:
            - name: ALERT24_HB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: alert24-heartbeat
                  key: token
          command:
            - sh
            - -c
            - curl -fsS --retry 3 "https://app.alert24.net/api/hb/$ALERT24_HB_TOKEN"
In this pattern, the heartbeat container only runs after the init container exits successfully.
Tips
- restartPolicy: OnFailure: With this set, Kubernetes restarts the failed container in place when it exits non-zero. The heartbeat fires only once the job fully completes; failed attempts don't ping Alert24.
- backoffLimit: Set a reasonable backoffLimit on the Job spec so Kubernetes doesn't retry indefinitely. If the backoff limit is exhausted without a heartbeat ping, Alert24 fires an incident once the expected interval and grace period have elapsed.
- successfulJobsHistoryLimit: Set this to 3–5 to keep recent successful Jobs and their pods for debugging, without accumulating hundreds of completed pods.
- startingDeadlineSeconds: If your CronJob misses its window by more than this many seconds (e.g., the cluster was overloaded), Kubernetes skips that run. Set startingDeadlineSeconds to a value shorter than your Alert24 grace period so missed windows always result in an Alert24 incident.
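To show where each of these settings lives in the manifest, here is a sketch based on the Option A CronJob. The numeric values are illustrative assumptions, not recommendations: a grace period of 120 seconds is assumed, so startingDeadlineSeconds is set just below it, and backoffLimit is set to 3 as an example.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: invoice-generator
spec:
  schedule: "0 */6 * * *"          # every 6 hours
  startingDeadlineSeconds: 100     # assumed value, below an assumed 120-second grace period
  successfulJobsHistoryLimit: 3    # keep the three most recent successful Jobs
  jobTemplate:
    spec:
      backoffLimit: 3              # assumed value; give up after 3 failed attempts
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: invoice-generator
              image: my-app:latest
              # env and command as in Option A
```

Note that startingDeadlineSeconds and successfulJobsHistoryLimit belong to the CronJob spec, while backoffLimit belongs to the Job spec under jobTemplate.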