Heartbeat Monitoring for Cron Jobs, Lambda, and Scheduled Tasks

Monitor cron jobs, systemd timers, Kubernetes CronJobs, AWS Lambda, Heroku Scheduler, and Celery beat tasks with Alert24 heartbeat checks. Get alerted when any scheduled job stops running.

Scheduled jobs fail silently. A cron job that errors at 2am, a Lambda that gets throttled, a Celery task that stops because a worker went down: none of these send you an alert by default. Alert24 heartbeat checks fix this with a dead-man's switch. Your job checks in after it completes, and if Alert24 doesn't hear from it within the expected interval plus the grace period, it fires an incident.

How it works

  1. Create a Heartbeat check in Alert24 (Monitoring → Add check → Heartbeat)
  2. Set the expected interval and grace period to match your job's schedule
  3. Alert24 gives you a ping URL: https://app.alert24.net/api/hb/YOUR_TOKEN
  4. Add a single curl call to the end of your job that fires on success
  5. If the job stops running — for any reason — Alert24 fires an incident after the grace period

The ping accepts GET, POST, or HEAD and requires no authentication or body. It's designed to be appended to any command with &&.

# The simplest pattern: append to your existing command
your-job-command && curl -fsS https://app.alert24.net/api/hb/YOUR_TOKEN
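
For jobs where a one-liner is awkward, the same success-only ping can be wrapped in a small shell function. This is a minimal sketch, not an official Alert24 client: run_with_heartbeat and the HEARTBEAT_URL variable are hypothetical names, and the timeout and retry values are assumptions to tune.

```shell
# Placeholder URL; substitute the ping URL shown for your check.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://app.alert24.net/api/hb/YOUR_TOKEN}"

# run_with_heartbeat CMD [ARGS...]
# Runs the command and pings the heartbeat URL only if it exited with status 0.
run_with_heartbeat() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        # -m 10 caps the request at 10 seconds; --retry 3 retries transient errors.
        # "|| true" keeps a failed ping from changing the job's own exit status.
        curl -fsS -m 10 --retry 3 -o /dev/null "$HEARTBEAT_URL" || true
    fi
    return "$status"
}
```

Call it as run_with_heartbeat your-job-command. Unlike the plain && form, a failed ping here does not make the whole command exit non-zero; the missed heartbeat itself is what surfaces the problem.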

Supported platforms

Platform                              Guide
Linux cron jobs                       Cron → Alert24
systemd timers                        systemd → Alert24
GitHub Actions scheduled workflows    GitHub Actions → Alert24
Kubernetes CronJob                    Kubernetes → Alert24
AWS Lambda (EventBridge scheduled)    Lambda → Alert24
Heroku Scheduler                      Heroku → Alert24
Celery beat tasks                     Celery → Alert24

Setting the right interval and grace period

Job frequency      Expected interval    Suggested grace period
Every 5 minutes    300s                 60s
Every hour         3600s                300s
Daily              86400s               1800s
Weekly             604800s              7200s

Set the grace period to cover normal runtime variance — if your job usually takes 10 minutes to run, a grace period of 15 minutes prevents false alerts while still catching real failures quickly.
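
To pick a grace period you need a rough idea of how long the job actually takes. The helper below is one way to measure that from the shell. It is a sketch: measure_runtime is a hypothetical name, and it relies on date +%s, which GNU and BSD date both support but POSIX does not require.

```shell
# measure_runtime CMD [ARGS...]
# Prints the wall-clock duration of the command in whole seconds.
measure_runtime() {
    start=$(date +%s)
    # The command's output and exit status are discarded; only timing matters here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}
```

Running measure_runtime your-job-command on a few typical runs gives a baseline to compare against the grace period you configure.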

What triggers an incident

Alert24 fires an incident if no heartbeat ping arrives within interval + grace_period seconds of the last successful ping. Common causes:

  • The job crashed or exited with a non-zero code (the && ensures no ping on failure)
  • The scheduler stopped (cron daemon down, EventBridge rule disabled, GitHub Actions quota exceeded)
  • The worker is down (Celery workers stopped, Lambda function deleted or throttled)
  • The server is down or has lost network access (the ping originates from your server, so nothing reaches Alert24)
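
The timing rule above can be written out as a small check. This only illustrates the arithmetic and is not Alert24's implementation: heartbeat_state is a hypothetical function, and all four arguments are plain seconds (Unix timestamps for the two points in time).

```shell
# heartbeat_state LAST_PING INTERVAL GRACE NOW
# Prints "overdue" once NOW is past LAST_PING + INTERVAL + GRACE, otherwise "ok".
heartbeat_state() {
    last_ping=$1
    interval=$2
    grace=$3
    now=$4
    deadline=$((last_ping + interval + grace))
    if [ "$now" -gt "$deadline" ]; then
        echo overdue
    else
        echo ok
    fi
}
```

With the hourly row from the table (3600s interval, 300s grace), a job that last pinged at time 0 is still "ok" at 3900 and becomes "overdue" at 3901.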

Auto-resolution

When the job runs successfully again and sends a heartbeat ping, Alert24 automatically resolves the open incident. No manual action required.