
Monitor Celery Beat Tasks with Alert24 Heartbeat Checks

Add heartbeat monitoring to Celery beat periodic tasks so Alert24 fires an incident when a task stops executing. Covers pinging from the task body, the on_success handler on a task base class, and retry behavior.

Celery beat schedules periodic tasks, but if the beat scheduler stops, a worker goes down, or a task fails repeatedly, your scheduled work stops silently. Alert24 heartbeat checks catch this: each successful run pings a heartbeat URL, and if no ping arrives within the expected interval plus the grace period, Alert24 opens an incident.

Before you start

  1. In Alert24, go to Monitoring → Add check → Heartbeat
  2. Name it after your Celery task (e.g., "Generate reports task")
  3. Set Expected interval to your task's beat_schedule period, in seconds (for example, 86400 for a daily task)
  4. Set Grace period to absorb Celery queue wait time and task runtime (typically 60–300 seconds)
  5. Save and copy the heartbeat URL
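The Expected interval is just the schedule period converted to seconds, and a check becomes overdue once the interval plus the grace period has passed since the last ping. The sketch below shows that arithmetic with the standard library only. The helper names are hypothetical, and the overdue rule (last ping + interval + grace) is an assumption about how heartbeat windows are usually evaluated, not Alert24's documented algorithm.

```python
from datetime import datetime, timedelta, timezone

def expected_interval_seconds(period: timedelta) -> int:
    """Convert a beat schedule period into the value for the Expected interval field."""
    return int(period.total_seconds())

def alert_deadline(last_ping: datetime, interval_s: int, grace_s: int) -> datetime:
    """Hypothetical helper (assumed rule): latest time the next ping may arrive."""
    return last_ping + timedelta(seconds=interval_s + grace_s)

# A daily task, e.g. crontab(hour=2, minute=0), runs every 24 hours
daily = expected_interval_seconds(timedelta(days=1))
print(daily)  # 86400

# A task scheduled every 15 minutes
print(expected_interval_seconds(timedelta(minutes=15)))  # 900

# Last ping at 02:00:30 UTC with a 300-second grace period
last = datetime(2024, 1, 1, 2, 0, 30, tzinfo=timezone.utc)
print(alert_deadline(last, daily, 300).isoformat())  # 2024-01-02T02:05:30+00:00
```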

Add the ping to your task

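# tasks.py
import logging

import requests
from celery import shared_task
from django.conf import settings  # or use os.environ

logger = logging.getLogger(__name__)

ALERT24_HB_URL = f"https://app.alert24.net/api/hb/{settings.ALERT24_HB_GENERATE_REPORTS}"

@shared_task
def generate_reports():
    # Your existing task logic (Report and send_report_emails stand in for your own code)
    Report.objects.generate_daily()
    send_report_emails()

    # Ping Alert24 on success (only reached if nothing above raised).
    # Catch network errors so a failed ping doesn't mark the completed work as failed;
    # Alert24 still flags the missed heartbeat.
    try:
        requests.get(ALERT24_HB_URL, timeout=5)
    except requests.RequestException as exc:
        logger.warning("Alert24 heartbeat ping failed: %s", exc)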
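If you'd rather not depend on requests, or want one place that decides what counts as a successful ping, a small helper works too. This is a minimal sketch using only urllib.request; the name ping_heartbeat is hypothetical, and treating any 2xx response as success is an assumption.

```python
import http.client
import logging
import urllib.request

logger = logging.getLogger(__name__)

def ping_heartbeat(url: str, timeout: float = 5.0) -> bool:
    """Send a GET to a heartbeat URL and return True on a 2xx response.

    Never raises: a failed ping is logged and reported as False, so the
    calling task still completes normally.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # OSError covers URLError/HTTPError (DNS, refused connections, 4xx/5xx)
        # and timeouts; ValueError covers malformed URLs.
        logger.warning("heartbeat ping to %r failed: %s", url, exc)
        return False
```

In the task, call ping_heartbeat(ALERT24_HB_URL) as the last line instead of requests.get. The return value lets you log or count failed pings without affecting the task's own result.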

Configuration

Store the heartbeat token in your settings or environment:

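# settings.py (or set ALERT24_HB_GENERATE_REPORTS in your environment)
ALERT24_HB_GENERATE_REPORTS = "YOUR_TOKEN"

# celery.py: the beat schedule stays unchanged
from celery.schedules import crontab

# `app` is the Celery instance created earlier in celery.py
app.conf.beat_schedule = {
    'generate-reports-daily': {
        'task': 'tasks.generate_reports',  # use your task's full registered name
        'schedule': crontab(hour=2, minute=0),  # 2am daily in Celery's timezone; Expected interval = 86400
    },
}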
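If you read tokens from the environment rather than Django settings, a small helper keeps the URL format in one place and makes a missing token explicit. A sketch, assuming the same base URL as the task example; heartbeat_url is a hypothetical name.

```python
import os
from typing import Optional

ALERT24_HB_BASE = "https://app.alert24.net/api/hb/"

def heartbeat_url(env_var: str) -> Optional[str]:
    """Return the heartbeat URL for the token stored in env_var, or None if unset or empty."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        return None  # caller skips the ping; Alert24 then reports the missing heartbeat
    return ALERT24_HB_BASE + token
```

Returning None instead of raising means a missing token surfaces as an Alert24 incident rather than an import-time crash of the worker.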

Using a task base class for multiple tasks

If you want to add heartbeat pings to many tasks without repeating the requests.get call in each, move the ping into the on_success handler of a shared base class:

import os
import requests
from celery import Task

class HeartbeatTask(Task):
    abstract = True
    alert24_hb_token = None  # set per task via the decorator; None disables the ping

    def on_success(self, retval, task_id, args, kwargs):
        if self.alert24_hb_token:
            try:
                url = f"https://app.alert24.net/api/hb/{self.alert24_hb_token}"
                requests.get(url, timeout=5)
            except Exception:
                pass  # don't let heartbeat failure mask task success
        super().on_success(retval, task_id, args, kwargs)

# Extra options passed to the task decorator become attributes of the task class
@shared_task(base=HeartbeatTask, alert24_hb_token=os.environ.get('ALERT24_HB_REPORTS'))
def generate_reports():
    Report.objects.generate_daily()

@shared_task(base=HeartbeatTask, alert24_hb_token=os.environ.get('ALERT24_HB_CLEANUP'))
def cleanup_old_data():
    OldData.objects.delete_expired()
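If you'd rather not use a custom base class, a plain decorator gives the same "ping only on success" behavior: the ping runs only if the wrapped function returns without raising. This is a minimal sketch under stated assumptions: with_heartbeat is a hypothetical name, and the ping function is injected so the example runs without Celery or network access. In a Celery project you would place @shared_task above it and pass a function that calls requests.get.

```python
import functools
from typing import Callable, Optional

def with_heartbeat(url: Optional[str], ping: Callable[[str], object]):
    """Hypothetical decorator: call ping(url) after fn returns without raising."""
    def decorator(fn):
        @functools.wraps(fn)  # keep __name__/__module__ so task naming is unchanged
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)  # an exception here skips the ping
            if url:
                try:
                    ping(url)
                except Exception:
                    pass  # a failed ping must not turn a successful run into a failure
            return result
        return wrapper
    return decorator

# Usage with a fake ping that records calls instead of hitting the network
sent = []

@with_heartbeat("https://app.alert24.net/api/hb/TOKEN", ping=sent.append)
def cleanup():
    return "done"

@with_heartbeat("https://app.alert24.net/api/hb/TOKEN", ping=sent.append)
def broken():
    raise RuntimeError("task failed")

cleanup()
try:
    broken()
except RuntimeError:
    pass
print(len(sent))  # 1
```

The trade-off versus the base class: the decorator is explicit on each task, while the base class keeps task bodies untouched and reuses Celery's own success hook.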

Tips

  • on_success vs end of task: A task's on_success handler runs only when the task returns without raising an exception, which is exactly when you want the heartbeat to fire. Calling requests.get directly at the end of the function is simpler and equivalent for most cases; the base class just saves you from repeating it in every task.
  • Celery beat restart: If the Celery beat process is restarted (e.g., during deployment), tasks may be delayed. Set your grace period to account for typical deployment downtime.
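  • Task retries: Celery retries a task only when the task calls self.retry() or is configured with autoretry_for; max_retries caps how many retries happen. An attempt that triggers a retry ends with an exception, so on_success (or a ping at the end of the function) runs only on the attempt that finally succeeds. If all retries are exhausted without success, no ping is sent and Alert24 fires an incident. Retry delays (default_retry_delay, 3 minutes by default) also postpone the ping, so include them in the grace period.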
  • Worker availability: Heartbeat checks also catch cases where beat is running but all workers are down. Tasks pile up in the queue without executing, no ping arrives, and Alert24 fires an incident once the expected interval plus grace period has passed.
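The retry rule can be checked with a small simulation. This is plain Python, not Celery's retry machinery: run_with_retries is a hypothetical stand-in that assumes the task retries on every exception, re-runs it up to max_retries extra times, and pings only after an attempt succeeds, mirroring where on_success runs.

```python
from typing import Callable, List

def run_with_retries(task: Callable[[int], object], max_retries: int,
                     ping: Callable[[], None]) -> bool:
    """Simulate retry semantics: the first attempt plus up to max_retries retries.

    Returns True if some attempt succeeded. ping() is called once, only after
    the successful attempt, just as on_success runs only for that attempt.
    """
    for attempt in range(max_retries + 1):
        try:
            task(attempt)
        except Exception:
            continue  # failed attempt: no ping; try again if retries remain
        ping()
        return True
    return False  # retries exhausted: no ping, so the heartbeat check goes overdue

pings: List[str] = []

def flaky(attempt: int) -> None:
    # Fails on the first two attempts, succeeds on the third
    if attempt < 2:
        raise ConnectionError("upstream unavailable")

def always_fails(attempt: int) -> None:
    raise ConnectionError("upstream unavailable")

print(run_with_retries(flaky, max_retries=3, ping=lambda: pings.append("ok")))         # True
print(run_with_retries(always_fails, max_retries=3, ping=lambda: pings.append("ok")))  # False
print(pings)  # ['ok']
```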