The Status Page Nobody Updates at 2am
Your Nagios check fires. A service goes critical. PagerDuty or your phone wakes you up. You SSH in, start digging through logs, and spend the next twenty minutes figuring out what's actually broken. Somewhere in the middle of that, users start hitting your support inbox: "Is there an outage? Your status page says everything is operational."
The status page is almost always the last thing anyone thinks about during an incident. It's manual, it's low-urgency compared to actually fixing the problem, and at 2am when you're context-switching between terminals, updating a web form is not what's on your mind. So customers sit in the dark, filing tickets that eat up your queue for the next two days.
The fix isn't process — it's automation. If Nagios already knows when a host or service goes down, you can make that knowledge flow directly to your status page without anyone touching a keyboard.
How Nagios Event Handlers Work
Nagios has a built-in mechanism called event handlers: scripts that run automatically when a host or service changes state. When a service transitions from OK to CRITICAL, Nagios can call a script on the monitoring server. That script can do anything — send a Slack message, write to a database, or POST to an HTTP endpoint.
The key fields Nagios passes to your event handler script are:
| Macro | What it contains |
|---|---|
| $SERVICESTATE$ | Current state: OK, WARNING, CRITICAL, UNKNOWN |
| $SERVICESTATETYPE$ | SOFT or HARD (HARD means Nagios is confident this is real) |
| $HOSTALIAS$ | Human-readable name for the host |
| $SERVICEDESC$ | The name of the service check |
| $SERVICEATTEMPT$ | Which retry attempt this is |
You typically only want to fire your webhook on HARD state changes — SOFT states are retries that might recover before becoming a real incident. Acting on every SOFT state generates noise.
Setting Up the Event Handler Script
On your Nagios server, create a script at /usr/local/nagios/libexec/notify-status-page.sh:
#!/bin/bash
# Arguments arrive positionally, in the order set by the Nagios command definition
SERVICE_STATE=$1
SERVICE_STATE_TYPE=$2
HOST_NAME=$3        # receives $HOSTALIAS$, the human-readable host name
SERVICE_DESC=$4
WEBHOOK_URL="https://api.alert24.com/incoming/webhooks/YOUR_WEBHOOK_TOKEN"
# Only act on HARD state changes
if [ "$SERVICE_STATE_TYPE" != "HARD" ]; then
    exit 0
fi
# Map Nagios states to Alert24 status values
case "$SERVICE_STATE" in
    CRITICAL)
        STATUS="down"
        ;;
    WARNING)
        STATUS="degraded"
        ;;
    OK)
        STATUS="operational"
        ;;
    *)
        STATUS="unknown"
        ;;
esac
# Build the JSON payload and POST to Alert24.
# --max-time keeps a slow or unreachable endpoint from stalling the handler.
curl -s --max-time 10 -X POST "$WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "{
        \"service\": \"$SERVICE_DESC\",
        \"host\": \"$HOST_NAME\",
        \"status\": \"$STATUS\",
        \"source\": \"nagios\"
    }"
Make it executable:
chmod +x /usr/local/nagios/libexec/notify-status-page.sh
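One caveat with the script above: it interpolates values straight into the JSON string, so a service description or host alias containing a double quote or a backslash would produce an invalid payload. The following is a minimal sketch of an escaping helper, assuming those two characters are the only ones that need handling. The function names `json_escape` and `build_payload` are hypothetical and not part of Nagios or Alert24.

```shell
# Hypothetical helpers; the names are illustrative, not part of Nagios or Alert24.

# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Assumption: values contain no newlines or other control characters.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the same four-field payload the handler script sends.
# $1 = service description, $2 = host, $3 = status
build_payload() {
    printf '{"service":"%s","host":"%s","status":"%s","source":"nagios"}' \
        "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

With these defined in the handler, the curl call could send `-d "$(build_payload "$SERVICE_DESC" "$HOST_NAME" "$STATUS")"` instead of the hand-written JSON string.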
Wiring the Handler to a Nagios Service
In your Nagios service definition, add the event_handler directive pointing at a command that calls your script:
# In commands.cfg
# Quote the macros: host aliases and service descriptions can contain spaces
define command {
    command_name    notify-status-page
    command_line    /usr/local/nagios/libexec/notify-status-page.sh "$SERVICESTATE$" "$SERVICESTATETYPE$" "$HOSTALIAS$" "$SERVICEDESC$"
}
# In your service definition
define service {
    host_name               web-prod-01
    service_description     HTTP
    check_command           check_http
    event_handler           notify-status-page
    event_handler_enabled   1
    ...
}
Validate the configuration and reload Nagios after making this change (adjust the config path to match your install):
nagios -v /etc/nagios/nagios.cfg && systemctl reload nagios
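Event handlers must also be enabled globally: check that enable_event_handlers=1 is set in nagios.cfg, because the per-service event_handler_enabled flag has no effect while the global switch is off. If you want one handler to run for every service without editing each definition, nagios.cfg also accepts a global_service_event_handler directive.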
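The global switch for event handlers in nagios.cfg is the enable_event_handlers directive, and a handler that never fires is often traced back to it. A quick way to check it, sketched here with a hypothetical helper name (`event_handlers_enabled`) and assuming the directive appears on its own uncommented line:

```shell
# Hypothetical helper: succeeds (exit 0) if the given nagios.cfg
# turns event handlers on globally, fails otherwise.
# $1 = path to nagios.cfg (install-specific)
event_handlers_enabled() {
    grep -Eq '^[[:space:]]*enable_event_handlers[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}
```

For example, `event_handlers_enabled /etc/nagios/nagios.cfg || echo "event handlers are disabled globally"` prints a warning when the directive is missing or set to 0.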
Where Alert24 Picks Up From Here
The webhook is the handoff point. When Alert24 receives the incoming POST, it looks at the service field to match the payload against a monitored service in your account. From there, the platform handles the parts that would otherwise require someone to be awake and typing:
- The linked status page service flips to the appropriate status (Degraded, Down, or back to Operational when Nagios recovers)
- A status page incident is opened automatically with a timestamped update
- Your on-call schedule determines who gets paged — no need to also configure Nagios notifications separately for that
When the check recovers and Nagios sends "status": "operational", Alert24 closes the incident and posts a resolution update to the status page. Customers who subscribed to status updates get notified automatically.
Matching Webhooks to Status Page Services
In your Alert24 dashboard, navigate to the incoming webhook configuration for the token you used in the script. You'll map each service value in the payload to a specific service on your status page. If you have multiple Nagios checks that all relate to your API (health check, response time, certificate expiry), you can route them all to the same status page component. A CRITICAL from any of them sets the component to Down; the component only returns to Operational when all of them are back to OK.
This lets you keep your Nagios checks granular for diagnostic purposes without fragmenting your status page into dozens of components that confuse customers.
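The roll-up rule described above amounts to "worst status wins". A small sketch of that logic, for reasoning about what a component will show; this is not Alert24's implementation, the function name `worst_status` is hypothetical, and the severity order down > degraded > operational is an assumption extrapolated from the Down/Operational behavior described here:

```shell
# Hypothetical illustration of worst-status-wins aggregation.
# Assumed severity order: down > degraded > operational.
# Any other value (for example "unknown") is ignored in this sketch.
worst_status() {
    result="operational"
    for s in "$@"; do
        case "$s" in
            down) result="down" ;;
            degraded) [ "$result" = "down" ] || result="degraded" ;;
        esac
    done
    printf '%s\n' "$result"
}

# Three checks feeding one "API" component:
worst_status operational down degraded         # prints "down"
worst_status operational operational operational   # prints "operational"
```

The component only reads Operational again once every contributing check reports it, which matches the recovery behavior described above.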
Testing Before You Rely on It
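Before you trust this in production, force a state change in Nagios to verify the end-to-end flow. The service must accept passive checks (passive_checks_enabled 1) for a submitted result to be processed:
# Submit a passive check result through NSCA to force a CRITICAL state.
# -H is the host running the NSCA daemon (here the Nagios server itself), not the monitored host.
# The -c path is an assumption; point it at your own send_nsca.cfg.
# printf emits the real tab delimiters that send_nsca expects.
printf 'web-prod-01\tHTTP\t2\tManual test: forced CRITICAL\n' | \
    /usr/local/nagios/bin/send_nsca -H localhost -p 5667 -c /usr/local/nagios/etc/send_nsca.cfg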
Or use the Nagios web interface under "Submit Passive Check Result" to push a CRITICAL. Watch the webhook log in Alert24 to confirm the payload arrived and the status page updated. Then submit an OK result and verify the recovery flow works as well.
It is worth testing recovery explicitly. A stuck "Down" status page after the service has actually recovered is almost as bad as not updating it at all — customers lose trust in the accuracy of what you're publishing.
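It can also help to run the handler by hand, bypassing Nagios entirely, so that script problems can be told apart from Nagios configuration problems. The sketch below walks one service through an ignored SOFT state, a HARD CRITICAL, and a HARD recovery. The function name `exercise_handler` and the `HANDLER` variable are hypothetical; the default path is the one used earlier in this article.

```shell
# Path to the handler script; override HANDLER to test a copy.
HANDLER="${HANDLER:-/usr/local/nagios/libexec/notify-status-page.sh}"

# Hypothetical helper: drive the handler through a full incident cycle.
# $1 = host alias, $2 = service description
# Note: pointed at the real script, this sends real webhooks.
exercise_handler() {
    "$HANDLER" CRITICAL SOFT "$1" "$2" &&   # should exit without posting
    "$HANDLER" CRITICAL HARD "$1" "$2" &&   # should open an incident
    "$HANDLER" OK HARD "$1" "$2"            # should resolve it
}
```

Running it as the account the Nagios daemon uses (often `nagios`, though that depends on your install) catches permission and PATH problems that do not show up when testing as root.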
A Note on Flapping
If a service is flapping, toggling between OK and CRITICAL repeatedly, Nagios's built-in flap detection suppresses notifications during the unstable period. The Nagios Core documentation describes that suppression for notifications only, not for event handlers, so do not assume the webhook goes quiet as well: each HARD state change can still invoke the script. Filtering on HARD states absorbs oscillations that recover within the retry window, and raising max_check_attempts widens that window. Test against a deliberately unstable check before relying on this integration in production, and keep flap detection enabled so that notifications stay quiet while a service is unstable.
Next Steps
If you have Nagios already running in your environment, this integration takes about thirty minutes to set up end-to-end. The main steps:
- Create the event handler script on your Nagios server
- Define the Nagios command and add event_handler to your service definitions
- Create an incoming webhook in Alert24 and paste the token into the script
- Map the webhook's service values to your status page components in Alert24
- Test with a forced state change and verify the status page updates
If your team uses Nagios XI rather than Core, the configuration is nearly identical — the event handler mechanism works the same way, and XI's web UI gives you a checkbox to enable event handlers per-service without editing config files directly.
The goal is that the next time a check fires at 2am, the status page has already updated itself before you've even opened a terminal.