The Awkward Gap Nobody Talks About
Your CloudWatch alarm fires at 2:47 AM. By 2:48, your on-call engineer gets paged. By 2:51, they've logged in and started looking at dashboards. Meanwhile, your customers have been hitting errors since 2:46 — one minute before the alarm even fired — and they have no idea whether you know, whether you care, or whether this is going to last ten minutes or ten hours.
That gap between "alarm fires" and "status page updated" is where customer trust quietly erodes. Manually updating a status page in the middle of an incident response is easy to forget and easy to deprioritize. The fix is to not make it manual at all.
This guide shows you how to wire CloudWatch directly to a public-facing status page update so that the moment an alarm triggers, your customers see "Investigating" — before anyone on your team has touched a keyboard.
The Architecture
The flow has four stages and three hops:
CloudWatch Alarm → SNS Topic → Lambda Function → Alert24 Webhook
CloudWatch already knows how to publish to SNS when an alarm state changes. From there, a small Lambda function transforms the SNS payload into a status page update via Alert24's webhook receiver. The whole thing takes about 30 minutes to set up and costs almost nothing to run.
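To make the "SNS payload" concrete, here is an abridged sketch of the event a Lambda function receives when SNS delivers a CloudWatch alarm notification. The field names follow the standard alarm notification format, but the values (timestamp, reason text) are illustrative, and `extract_alarm` is a hypothetical helper rather than part of any AWS library. Note that the alarm details arrive as a JSON string nested inside the SNS envelope, so they need a second parse.

```python
import json

# Abridged example of the event Lambda receives from SNS for a CloudWatch
# alarm state change. Values are illustrative, not from a real account.
sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {
                "Subject": 'ALARM: "API-Error-Rate-High" in US East (N. Virginia)',
                # CloudWatch's alarm details arrive as a JSON *string*.
                "Message": json.dumps(
                    {
                        "AlarmName": "API-Error-Rate-High",
                        "AlarmDescription": "API 5xx error rate above threshold",
                        "NewStateValue": "ALARM",
                        "OldStateValue": "OK",
                        "NewStateReason": "Threshold Crossed (illustrative reason text)",
                        "StateChangeTime": "2024-01-01T02:47:00.000+0000",
                    }
                ),
            },
        }
    ]
}


def extract_alarm(record):
    """Return (alarm name, new state, reason) from one SNS record."""
    message = json.loads(record["Sns"]["Message"])
    return (
        message["AlarmName"],
        message["NewStateValue"],
        message.get("NewStateReason", ""),
    )


for record in sample_event["Records"]:
    print(extract_alarm(record))
```

Only `AlarmName`, `NewStateValue`, and `NewStateReason` are needed for the status page update; the rest of the message can be ignored.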
Step 1: Create an SNS Topic for Alarm Events
If you don't already have one, create a dedicated SNS topic for infrastructure alerts. You can do this in the AWS console or with the CLI:
aws sns create-topic --name cloudwatch-status-alerts
Note the ARN — you'll need it in the next step.
Step 2: Configure Your CloudWatch Alarm to Publish to SNS
In the CloudWatch console, open your alarm and add the SNS topic as an action for the In alarm state (and optionally for the OK state if you want to auto-resolve). Using the CLI:
aws cloudwatch put-metric-alarm \
--alarm-name "API-Error-Rate-High" \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts" \
--ok-actions "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts" \
--alarm-description "API 5xx error rate above threshold" \
--metric-name 5XXError \
--namespace AWS/ApiGateway \
--dimensions Name=ApiName,Value=my-api \
--statistic Sum \
--period 60 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2
The --ok-actions line is optional but valuable: it lets you automate the "Resolved" update when the alarm clears. The --dimensions value is a placeholder; API Gateway publishes 5XXError per API, so replace my-api with the name of your REST API.
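The --period and --evaluation-periods values also set a floor on how quickly the alarm can fire, which is worth knowing when you reason about how long customers see errors before the status page changes. The arithmetic below is a rough sketch: `min_seconds_to_alarm` is a hypothetical helper, and it assumes every evaluated period must breach (no --datapoints-to-alarm override) and ignores any delay in metric delivery.

```python
def min_seconds_to_alarm(period_seconds: int, evaluation_periods: int) -> int:
    """Shortest time a metric must stay in breach before the alarm can fire.

    Assumes every evaluated period must breach and ignores metric
    delivery delay, so real-world detection will be somewhat slower.
    """
    return period_seconds * evaluation_periods


print(min_seconds_to_alarm(60, 2))  # 120
print(min_seconds_to_alarm(60, 3))  # 180
```

With the settings above, errors need to persist for about two minutes before the alarm changes state.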
Step 3: Write the Lambda Function
This is the translator. It receives the SNS notification, parses the alarm state, and calls Alert24's webhook to update the affected service's status.
import json
import os
import urllib.request

# Read once per cold start; a missing variable fails fast with a KeyError.
ALERT24_WEBHOOK_URL = os.environ["ALERT24_WEBHOOK_URL"]
SERVICE_ID = os.environ["ALERT24_SERVICE_ID"]

# CloudWatch alarm state -> status page status.
STATE_MAP = {
    "ALARM": "investigating",
    "OK": "operational",
    "INSUFFICIENT_DATA": "investigating",
}


def lambda_handler(event, context):
    for record in event["Records"]:
        # The alarm details arrive as a JSON string inside the SNS envelope.
        message = json.loads(record["Sns"]["Message"])
        alarm_name = message.get("AlarmName", "Unknown Alarm")
        new_state = message.get("NewStateValue", "ALARM")
        reason = message.get("NewStateReason", "")

        status = STATE_MAP.get(new_state, "investigating")
        payload = {
            "service_id": SERVICE_ID,
            "status": status,
            "message": f"{alarm_name}: {reason}" if status != "operational" else "Service has recovered.",
        }

        data = json.dumps(payload).encode("utf-8")
        req = urllib.request.Request(
            ALERT24_WEBHOOK_URL,
            data=data,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # urlopen raises on HTTP error responses, so a failed update fails the invocation.
        with urllib.request.urlopen(req) as response:
            print(f"Alert24 response: {response.status}")

    return {"statusCode": 200}
Set two environment variables on the Lambda function:
| Variable | Value |
|---|---|
| ALERT24_WEBHOOK_URL | Your Alert24 inbound webhook URL |
| ALERT24_SERVICE_ID | The ID of the service to update on your status page |
You can find both in the Alert24 dashboard under your status page's webhook settings.
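Because the handler mixes parsing with the HTTP call, it can be awkward to check locally. One option is to pull the translation step into a pure function and exercise it without any network access. The sketch below is a variation, not the handler itself: `build_payload` and the `svc_123` ID are hypothetical, the payload fields are the same three used above, and it assumes you would rather ignore INSUFFICIENT_DATA than publish "Investigating" for it.

```python
import json

# Assumption: only ALARM and OK should change the public status;
# any other state (such as INSUFFICIENT_DATA) is ignored in this variant.
PUBLISHED_STATES = {"ALARM": "investigating", "OK": "operational"}


def build_payload(sns_message: str, service_id: str):
    """Translate one SNS message body into a webhook payload, or None to skip."""
    alarm = json.loads(sns_message)
    status = PUBLISHED_STATES.get(alarm.get("NewStateValue"))
    if status is None:
        return None  # State is not one we publish.
    if status == "operational":
        text = "Service has recovered."
    else:
        name = alarm.get("AlarmName", "Unknown Alarm")
        reason = alarm.get("NewStateReason", "")
        text = f"{name}: {reason}"
    return {"service_id": service_id, "status": status, "message": text}


# Local check with hand-written messages; no AWS or Alert24 access needed.
fired = json.dumps(
    {
        "AlarmName": "API-Error-Rate-High",
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold Crossed",
    }
)
no_data = json.dumps({"AlarmName": "API-Error-Rate-High", "NewStateValue": "INSUFFICIENT_DATA"})

print(build_payload(fired, "svc_123"))
print(build_payload(no_data, "svc_123"))
```

Whether to publish on INSUFFICIENT_DATA is a judgment call; the handler above only receives that state if you also configure insufficient-data actions on the alarm.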
Step 4: Subscribe the Lambda to the SNS Topic
aws sns subscribe \
--topic-arn "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts" \
--protocol lambda \
--notification-endpoint "arn:aws:lambda:us-east-1:123456789012:function:cloudwatch-status-updater"
Then grant SNS permission to invoke the function:
aws lambda add-permission \
--function-name cloudwatch-status-updater \
--statement-id sns-invoke \
--action lambda:InvokeFunction \
--principal sns.amazonaws.com \
--source-arn "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts"
Step 5: Map Your Alarms to Services
One CloudWatch alarm maps to one Alert24 service. If you have multiple services — API, database, file uploads, payments — you'll either want separate Lambda functions per service (each with its own SERVICE_ID environment variable) or a single Lambda that routes based on alarm name prefix.
A simple routing approach:
SERVICE_ROUTING = {
    "API-": os.environ["API_SERVICE_ID"],
    "DB-": os.environ["DB_SERVICE_ID"],
    "Payments-": os.environ["PAYMENTS_SERVICE_ID"],
}


def get_service_id(alarm_name):
    for prefix, service_id in SERVICE_ROUTING.items():
        if alarm_name.startswith(prefix):
            return service_id
    return os.environ["DEFAULT_SERVICE_ID"]
This keeps a single Lambda function while allowing fine-grained status page updates per service component.
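One thing to watch with prefix routing is overlapping prefixes: if you later add something like "API-Internal-" alongside "API-", the first match wins and the result depends on dictionary order. A possible refinement, sketched here with hypothetical service IDs and a routing table passed in as an argument rather than read from the environment, is to try the longest prefix first.

```python
def get_service_id(alarm_name, routing, default_id):
    """Return the service ID for the longest matching alarm-name prefix."""
    for prefix in sorted(routing, key=len, reverse=True):
        if alarm_name.startswith(prefix):
            return routing[prefix]
    return default_id


# Hypothetical IDs; in the Lambda these would come from environment variables.
routing = {
    "API-": "svc_api",
    "API-Internal-": "svc_internal_api",
    "DB-": "svc_db",
}

print(get_service_id("API-Error-Rate-High", routing, "svc_default"))   # svc_api
print(get_service_id("API-Internal-Latency", routing, "svc_default"))  # svc_internal_api
print(get_service_id("Queue-Depth-High", routing, "svc_default"))      # svc_default
```

If your prefixes never overlap, the simpler first-match loop behaves identically.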
What the Customer Sees
From the customer's perspective, this is what changes. When an alarm fires without this setup, your status page sits at "All systems operational" while users are getting 503s. With it:
| Time | Without automation | With automation |
|---|---|---|
| T+0 | Alarm fires | Alarm fires |
| T+1 min | — | Status page shows "Investigating" |
| T+3 min | On-call engineer paged | On-call engineer paged |
| T+8 min | Engineer starts diagnosing | Engineer starts diagnosing |
| T+12 min | Someone remembers to update status page | (already done) |
| T+45 min | Alarm clears | Alarm clears, status auto-resolves |
The difference is not just operational efficiency — it's the message your customers receive at the worst moment. "We know" is a powerful thing to say, even when you haven't fixed it yet.
Handling False Positives
If your alarms are noisy and you don't want every brief spike to trigger a public status update, add a simple debounce. Set your CloudWatch alarm's --evaluation-periods to 3 or higher so it only fires after sustained degradation. You can also have the Lambda wait a short interval and re-check the alarm's state before publishing, skipping the update if the alarm has already cleared. For most teams, though, a brief "Investigating" that quickly self-resolves to "Operational" is preferable to no communication at all.
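A Lambda-side debounce could look roughly like the sketch below: wait, look up the alarm's current state, and publish only if it is still in ALARM. Everything here is an assumption for illustration. `still_alarming` and `get_state` are hypothetical names; in a real function `get_state` might wrap boto3's CloudWatch `describe_alarms` call and read `StateValue` from the result. The sleep is injectable so the logic can be checked locally without waiting.

```python
import time


def still_alarming(alarm_name, get_state, wait_seconds=60, sleep=time.sleep):
    """Wait briefly, then report whether the alarm is still in ALARM.

    get_state is a caller-supplied function (hypothetical here) that returns
    the alarm's current state string, for example by querying CloudWatch.
    """
    sleep(wait_seconds)
    return get_state(alarm_name) == "ALARM"


def no_wait(seconds):
    """Stand-in for time.sleep so the demonstration returns immediately."""


# Local demonstration with stand-in state lookups.
print(still_alarming("API-Error-Rate-High", lambda name: "OK", sleep=no_wait))     # False
print(still_alarming("API-Error-Rate-High", lambda name: "ALARM", sleep=no_wait))  # True
```

Two caveats if you try this: the function's timeout has to be longer than the wait (Lambda's default is 3 seconds), and the OK notification will still arrive for a spike you suppressed, so the recovery update may need the same check.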
Next Steps
Start with one alarm and one service. Wire up your highest-visibility CloudWatch alarm — probably the one that causes the most customer-visible errors — to a single Alert24 service. Watch it work. Then expand to the rest of your alarm inventory.
On the Alert24 side, you can also configure the webhook to trigger an on-call notification simultaneously, so the status page update and the page to your engineer happen in parallel rather than sequentially. That combination — automated customer communication plus automated team escalation — is what gets your mean time to acknowledge (MTTA) down and keeps customers informed throughout.
If you're not already using Alert24, the webhook receiver and status page features are available on all plans. You can set up your status page at alert24.com and have this pipeline running before your next incident.