The Problem With Default GitHub Actions Failure Notifications
Your production deployment fails at 2 AM. GitHub dutifully sends a failure email — to the engineer who last pushed to the repository three days ago, who is now sound asleep and not on call. The engineer who is on call hears nothing. The outage compounds.
GitHub Actions has no built-in concept of on-call rotation. The failure notification goes to the person who triggered the run (for a push, whoever pushed the commit), full stop. That works fine for a solo project or a team where everyone is always available, but it breaks down completely in any organization that has a rotation, a service boundary, or a dedicated ops engineer watching production.
The fix is straightforward: add an explicit failure notification step to your workflow that calls your incident management tool — bypassing GitHub's email entirely for production-critical alerts — and routes the alert based on which service or repository is affected.
How GitHub Actions Workflow Notifications Work
Before adding anything, it helps to understand what you're working with. GitHub Actions sends failure emails to the user who triggered the run, subject to that user's personal notification settings. You have no control over where those emails go from inside the workflow definition.
What you do control is the steps array in each job. You can add a step that only runs on failure, has access to all the workflow context variables (repository name, branch, run ID, workflow name), and can call any HTTP endpoint. That's the hook you need.
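As a minimal sketch of that hook, the step below runs only after an earlier step has failed and prints the context values the later examples send to the alerting endpoint. The step name and the environment variable names (REPO, WORKFLOW, BRANCH, RUN_ID) are illustrative choices, not anything GitHub requires.

```yaml
# A step entry for the steps array of an existing job.
- name: Show failure context
  if: failure()          # skipped unless an earlier step in this job failed
  env:
    # Copy context values into environment variables for the script below.
    REPO: ${{ github.repository }}
    WORKFLOW: ${{ github.workflow }}
    BRANCH: ${{ github.ref_name }}
    RUN_ID: ${{ github.run_id }}
  run: |
    echo "repository: $REPO"
    echo "workflow:   $WORKFLOW"
    echo "branch:     $BRANCH"
    echo "run id:     $RUN_ID"
```

Running this once in a throwaway workflow is a cheap way to see exactly what values a real notification step would have to work with.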
Adding a Failure Step That Calls Alert24
Alert24 exposes an inbound webhook endpoint for each integration you configure. When your workflow fails, you POST a payload to that endpoint with enough context for Alert24 to route the alert to the correct on-call team, open an incident, and start the escalation sequence.
Here is a complete example for a deployment workflow:

    name: Deploy to Production
    on:
      push:
        branches:
          - main
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Build
            run: npm ci && npm run build
          - name: Deploy
            run: ./scripts/deploy.sh
          - name: Notify Alert24 on failure
            if: failure()
            env:
              ALERT24_WEBHOOK_URL: ${{ secrets.ALERT24_WEBHOOK_URL }}
            run: |
              curl -s -X POST "$ALERT24_WEBHOOK_URL" \
                -H "Content-Type: application/json" \
                -d '{
                  "summary": "Deployment failed: ${{ github.repository }}",
                  "severity": "critical",
                  "source": "github-actions",
                  "details": {
                    "repository": "${{ github.repository }}",
                    "workflow": "${{ github.workflow }}",
                    "branch": "${{ github.ref_name }}",
                    "run_id": "${{ github.run_id }}",
                    "run_url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}",
                    "triggered_by": "${{ github.actor }}"
                  }
                }'

The if: failure() condition is the key. By default, GitHub skips the remaining steps in a job once a step fails; failure() overrides that, so this step runs only when an earlier step in the job has exited with a non-zero status. It will not run if every earlier step succeeds.
The run_url field is worth including — it gives the on-call engineer a direct link to the failed run log so they are not hunting through the Actions UI at 2 AM.
Store the webhook URL in your repository secrets under Settings > Secrets and variables > Actions, not in the workflow file itself. Each Alert24 integration has its own webhook URL, which gives you routing granularity without exposing credentials.
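One possible hardening of the example above: because ${{ ... }} expressions are substituted into the script text before the shell runs, a value containing a quote character (a branch name, for instance) could break the inline JSON. The sketch below passes the same values through environment variables and lets jq build the payload. It assumes jq is available on the runner, which is the case on GitHub-hosted Ubuntu images at the time of writing, and it adds --fail so that an HTTP error from the webhook shows up as a failed step. The payload fields are the same ones used earlier; the environment variable names are illustrative.

```yaml
- name: Notify Alert24 on failure
  if: failure()
  env:
    ALERT24_WEBHOOK_URL: ${{ secrets.ALERT24_WEBHOOK_URL }}
    REPO: ${{ github.repository }}
    WORKFLOW: ${{ github.workflow }}
    BRANCH: ${{ github.ref_name }}
    RUN_ID: ${{ github.run_id }}
    RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
    ACTOR: ${{ github.actor }}
  run: |
    # Build the JSON with jq so every value is escaped correctly.
    payload=$(jq -n \
      --arg summary "Deployment failed: $REPO" \
      --arg repository "$REPO" \
      --arg workflow "$WORKFLOW" \
      --arg branch "$BRANCH" \
      --arg run_id "$RUN_ID" \
      --arg run_url "$RUN_URL" \
      --arg triggered_by "$ACTOR" \
      '{
        summary: $summary,
        severity: "critical",
        source: "github-actions",
        details: {
          repository: $repository,
          workflow: $workflow,
          branch: $branch,
          run_id: $run_id,
          run_url: $run_url,
          triggered_by: $triggered_by
        }
      }')
    # --fail makes curl exit non-zero on an HTTP 4xx/5xx response.
    curl --silent --show-error --fail -X POST "$ALERT24_WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d "$payload"
```

The trade-off is a longer step in exchange for a payload that stays valid JSON regardless of what the context values contain.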
Routing to the Right Team by Repository or Service
If you have multiple repositories or services, you probably want each one to alert a different on-call team. A payments service failure should not wake up the infrastructure on-call engineer. Alert24 handles this through separate integrations, one per team or service.
The pattern is simple:
| Repository | Alert24 Secret Name | On-Call Team |
|---|---|---|
| acme/payments-api | ALERT24_PAYMENTS_WEBHOOK | Payments Team |
| acme/auth-service | ALERT24_AUTH_WEBHOOK | Platform Team |
| acme/data-pipeline | ALERT24_DATA_WEBHOOK | Data Engineering |
Each webhook URL is configured in Alert24 to route to a specific escalation policy. The workflow YAML for each repository references its own secret. You don't need any routing logic inside the workflow — the URL itself determines where the alert goes.
If you manage dozens of repositories, you can centralize this by using a GitHub organization-level secret and encoding the service name in the payload. Alert24's routing rules can then match on the repository or source fields to direct alerts to the right team without requiring a unique webhook per repo.
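A rough sketch of the centralized variant follows. The secret name ALERT24_ORG_WEBHOOK_URL, the SERVICE_NAME variable, and the "service" key inside details are all hypothetical; check which payload fields your Alert24 routing rules can actually match on before relying on them.

```yaml
# Fragment of a workflow file; the trigger and build steps are omitted.
env:
  SERVICE_NAME: payments-api  # hypothetical service label used for routing

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        run: ./scripts/deploy.sh

      - name: Notify Alert24 on failure (shared webhook)
        if: failure()
        env:
          # Hypothetical organization-level secret shared by every repository.
          ALERT24_WEBHOOK_URL: ${{ secrets.ALERT24_ORG_WEBHOOK_URL }}
        run: |
          curl -s -X POST "$ALERT24_WEBHOOK_URL" \
            -H "Content-Type: application/json" \
            -d '{
              "summary": "Deployment failed: ${{ github.repository }}",
              "severity": "critical",
              "source": "github-actions",
              "details": {
                "repository": "${{ github.repository }}",
                "service": "${{ env.SERVICE_NAME }}"
              }
            }'
```

With this layout the only per-repository difference is the SERVICE_NAME value, so the notification step can be copied between repositories unchanged.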
Suppressing Noise From Non-Production Branches
Not every workflow failure needs to wake someone up. A failing test on a feature branch is a problem for the engineer who opened the pull request, not the on-call engineer. You want failure alerting only for workflows that affect production.
The cleanest approach is branch filtering in the workflow trigger:

    on:
      push:
        branches:
          - main

This limits the entire workflow to production pushes, so the failure step can only fire in production context.
If a workflow runs on multiple branches but should page on-call only for main, you can add a branch check to the failure step itself:

    - name: Notify Alert24 on failure
      if: failure() && github.ref == 'refs/heads/main'
      env:
        ALERT24_WEBHOOK_URL: ${{ secrets.ALERT24_WEBHOOK_URL }}
      run: |
        curl -s -X POST "$ALERT24_WEBHOOK_URL" \
          -H "Content-Type: application/json" \
          -d '{
            "summary": "Production pipeline failed: ${{ github.workflow }}",
            "severity": "critical",
            "source": "github-actions",
            "details": {
              "repository": "${{ github.repository }}",
              "workflow": "${{ github.workflow }}",
              "branch": "${{ github.ref_name }}",
              "run_url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
            }
          }'

The compound condition failure() && github.ref == 'refs/heads/main' means the notification only fires when both conditions are true: the workflow failed and the branch is main.
For scheduled workflows, which always run on the default branch, you typically always want failure alerting. For pull request workflows, you typically never want it. Being explicit about both the trigger and the branch condition is less error-prone than relying on implicit behavior.
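For a workflow that has several triggers, one way to make that intent explicit is to test the event type alongside the branch. The condition below is a sketch of that idea, not a required pattern: it pages for scheduled runs and for pushes to main, and stays silent for everything else, including pull_request events.

```yaml
- name: Notify Alert24 on failure
  # Page only for scheduled runs and for pushes to main.
  if: >-
    failure() &&
    (github.event_name == 'schedule' ||
    (github.event_name == 'push' && github.ref == 'refs/heads/main'))
  env:
    ALERT24_WEBHOOK_URL: ${{ secrets.ALERT24_WEBHOOK_URL }}
  run: |
    curl -s -X POST "$ALERT24_WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d '{
        "summary": "Production pipeline failed: ${{ github.workflow }}",
        "severity": "critical",
        "source": "github-actions",
        "details": {
          "repository": "${{ github.repository }}",
          "workflow": "${{ github.workflow }}",
          "branch": "${{ github.ref_name }}",
          "run_url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
        }
      }'
```

Listing the allowed events rather than the excluded ones means a trigger added later, such as workflow_dispatch, does not start paging until someone opts it in.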
Handling Flaky Workflows
Some workflows are inherently flaky: external API calls that occasionally time out, network-dependent tests, integration suites with known intermittent failures. Paging on-call every time a flaky workflow fails generates alert fatigue and trains your team to ignore pages.
Alert24's deduplication and alert grouping handle this well if you're consistent about the summary field. Alerts with the same summary that arrive within a configurable window get grouped into a single incident rather than generating separate pages. If the same workflow fails three times in an hour, the on-call engineer gets one incident with three data points, not three separate 3 AM phone calls.
You can also add a retry step before the notification step. If the underlying operation is idempotent, retrying once before alerting reduces false positives without introducing meaningful delay in genuine failure detection.
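A minimal sketch of a single retry, assuming ./scripts/deploy.sh is idempotent and that a 30-second pause is acceptable (both are assumptions to adjust for your service):

```yaml
- name: Deploy (retry once before failing)
  run: |
    # Run the deploy; if it fails, wait briefly and try exactly one more time.
    # The step's exit status is that of the last attempt, so a second failure
    # still fails the job and lets a later `if: failure()` step fire.
    ./scripts/deploy.sh || {
      echo "First attempt failed; retrying in 30 seconds..."
      sleep 30
      ./scripts/deploy.sh
    }
```

Keeping the retry inside the same step means the notification step needs no changes: it still sees either one failed step or none.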
Next Steps
Getting this into production takes about fifteen minutes once you have an Alert24 account:
- Create an integration in Alert24 under the team that owns the service. Copy the webhook URL.
- Add the webhook URL as a secret in your GitHub repository or organization settings.
- Add the failure step to your deployment workflow YAML.
- Trigger a test failure by temporarily adding exit 1 to a non-destructive step, confirm the alert arrives and routes correctly, then remove it.
- Repeat for each service that has a distinct on-call owner.
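The test failure can be sketched as a throwaway step placed before the deploy, so nothing is actually shipped while you check the alert path. The step name is illustrative.

```yaml
steps:
  - uses: actions/checkout@v4

  # Temporary: forces the job to fail before anything is deployed.
  # Delete this step once the alert has been confirmed end to end.
  - name: Simulate failure
    run: exit 1

  - name: Deploy
    run: ./scripts/deploy.sh

  # ...the existing "Notify Alert24 on failure" step follows unchanged.
```

Because the Deploy step is skipped once the simulated failure occurs, the only side effect of the test run should be the alert itself.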
The GitHub Actions side of this is straightforward. The more important piece is making sure Alert24's escalation policies accurately reflect who is actually on call — a working webhook to a stale rotation is only marginally better than no alerting at all. Review your escalation policies alongside this change and make sure rotations are up to date.