Scheduled jobs fail silently. A cron job that errors at 2am, a Lambda that gets throttled, a Celery task that stops because a worker went down — none of these send you an alert by default. Alert24 heartbeat checks fix this with a dead-man's switch: your job checks in after it completes, and if Alert24 doesn't hear from it within the expected interval, it fires an incident.
How it works
- Create a Heartbeat check in Alert24 (Monitoring → Add check → Heartbeat)
- Set the expected interval and grace period to match your job's schedule
- Alert24 gives you a ping URL: https://app.alert24.net/api/hb/YOUR_TOKEN
- Add a single curl call to the end of your job that fires on success
- If the job stops running, for any reason, Alert24 fires an incident after the grace period
The ping accepts GET, POST, or HEAD and requires no authentication or body. It's designed to be appended to any command with &&.
# The simplest pattern: append to your existing command
your-job-command && curl -fsS https://app.alert24.net/api/hb/YOUR_TOKEN
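The one-liner relies on curl's defaults, so a hung connection could stall the job and a transient network error could cause a missed ping. A slightly more defensive wrapper might look like the sketch below. The function name `run_and_ping` and the `HEARTBEAT_URL` and `PING_CMD` variables are illustrative assumptions, not part of Alert24; the curl options (`-m`, `--retry`, `-o`) are standard curl flags, and the timeout and retry values are arbitrary starting points.

```shell
#!/bin/sh
# Hypothetical wrapper: run a job, then ping the heartbeat URL only on success.
# HEARTBEAT_URL and PING_CMD are illustrative names, not Alert24 settings.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://app.alert24.net/api/hb/YOUR_TOKEN}"

# Default ping: fail on HTTP errors (-f), stay quiet but print errors (-sS),
# time out after 10 seconds (-m 10), retry transient failures up to 3 times,
# and discard the response body.
PING_CMD="${PING_CMD:-curl -fsS -m 10 --retry 3 -o /dev/null}"

run_and_ping() {
    # Run the job exactly as given; "$@" preserves argument quoting.
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        # PING_CMD is deliberately unquoted so its flags split into words.
        # A failed ping is reported but does not change the job's exit status.
        $PING_CMD "$HEARTBEAT_URL" || echo "heartbeat ping failed" >&2
    fi
    return "$status"
}
```

Sourced into a job script, this would be called as `run_and_ping your-job-command`. As with the `&&` form, a failing job sends no ping, and the wrapper passes the job's exit code through so the scheduler still sees the failure.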
Supported platforms
| Platform | Guide |
|---|---|
| Linux cron jobs | Cron → Alert24 |
| systemd timers | systemd → Alert24 |
| GitHub Actions scheduled workflows | GitHub Actions → Alert24 |
| Kubernetes CronJob | Kubernetes → Alert24 |
| AWS Lambda (EventBridge scheduled) | Lambda → Alert24 |
| Heroku Scheduler | Heroku → Alert24 |
| Celery beat tasks | Celery → Alert24 |
Setting the right interval and grace period
| Job frequency | Expected interval | Suggested grace period |
|---|---|---|
| Every 5 minutes | 300s | 60s |
| Every hour | 3600s | 300s |
| Daily | 86400s | 1800s |
| Weekly | 604800s | 7200s |
Set the grace period to cover normal runtime variance — if your job usually takes 10 minutes to run, a grace period of 15 minutes prevents false alerts while still catching real failures quickly.
What triggers an incident
Alert24 fires an incident if no heartbeat ping arrives within interval + grace_period seconds of the last successful ping. Common causes:
- The job crashed or exited with a non-zero code (the && ensures no ping on failure)
- The scheduler stopped (cron daemon down, EventBridge rule disabled, GitHub Actions quota exceeded)
- The worker is down (Celery workers stopped, Lambda function deleted or throttled)
- The server is down or cannot reach Alert24 (the ping is sent from your server to Alert24, so a host or network outage stops it from arriving)
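The timing rule can be worked through with plain shell arithmetic. The sketch below is only a local illustration of the interval + grace_period rule as described here, not Alert24's implementation. The helper names `alert_deadline` and `is_overdue` are hypothetical, timestamps are Unix epoch seconds, and treating a ping that lands exactly on the deadline as on time is an assumption.

```shell
#!/bin/sh
# Illustrative only: models the "interval + grace_period" rule locally.
# alert_deadline and is_overdue are hypothetical helper names.

# Print the epoch second after which an incident would be expected.
# Usage: alert_deadline LAST_PING INTERVAL GRACE
alert_deadline() {
    echo $(( $1 + $2 + $3 ))
}

# Succeed (exit 0) if NOW is strictly past the deadline.
# Usage: is_overdue LAST_PING INTERVAL GRACE NOW
is_overdue() {
    [ "$4" -gt "$(alert_deadline "$1" "$2" "$3")" ]
}

# Hourly job with a 300s grace period, last successful ping at t=0:
alert_deadline 0 3600 300          # prints 3900
is_overdue 0 3600 300 3900 || echo "on time at 3900s"
is_overdue 0 3600 300 3901 && echo "overdue at 3901s"
```

For the hourly row of the table, this means a ping may arrive up to 3900 seconds after the previous one before an incident would be expected.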
Auto-resolution
When the job runs successfully again and sends a heartbeat ping, Alert24 automatically resolves the open incident. No manual action required.