
How to Update a Customer Status Page from AWS CloudWatch

The Awkward Gap Nobody Talks About

Your CloudWatch alarm fires at 2:47 AM. By 2:48, your on-call engineer gets paged. By 2:51, they've logged in and started looking at dashboards. Meanwhile, your customers have been hitting errors since 2:46 — one minute before the alarm even fired — and they have no idea whether you know, whether you care, or whether this is going to last ten minutes or ten hours.

That gap between "alarm fires" and "status page updated" is where customer trust quietly erodes. Manually updating a status page in the middle of an incident response is easy to forget and easy to deprioritize. The fix is to not make it manual at all.

This guide shows you how to wire CloudWatch directly to a public-facing status page update so that the moment an alarm triggers, your customers see "Investigating" — before anyone on your team has touched a keyboard.


The Architecture

The flow passes through four components:

CloudWatch Alarm → SNS Topic → Lambda Function → Alert24 Webhook

CloudWatch already knows how to publish to SNS when an alarm state changes. From there, a small Lambda function transforms the SNS payload into a status page update via Alert24's webhook receiver. The whole thing takes about 30 minutes to set up and costs almost nothing to run.
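For local testing, it helps to know what the Lambda actually receives. SNS wraps the CloudWatch alarm notification as a JSON string in each record's Sns.Message field. The sketch below builds a trimmed-down sample event. AlarmName, NewStateValue, OldStateValue, NewStateReason, and StateChangeTime are fields of CloudWatch's alarm notification. The helper name make_sns_alarm_event, the placeholder timestamp, and the subject text are illustrative assumptions, and real notifications carry more fields than shown here.

```python
import json


def make_sns_alarm_event(alarm_name, new_state, reason, old_state="OK"):
    """Build a trimmed sample of the event SNS passes to a Lambda subscriber.

    Only the fields the status-page Lambda reads are filled in; real
    notifications also include Trigger, AlarmArn, Region, and more.
    """
    alarm_message = {
        "AlarmName": alarm_name,
        "NewStateValue": new_state,  # "ALARM", "OK", or "INSUFFICIENT_DATA"
        "OldStateValue": old_state,
        "NewStateReason": reason,
        "StateChangeTime": "2024-01-01T02:47:00.000+0000",  # placeholder
    }
    return {
        "Records": [
            {
                "EventSource": "aws:sns",
                "Sns": {
                    "Type": "Notification",
                    "TopicArn": "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts",
                    # Illustrative only; real subjects are longer.
                    "Subject": f'{new_state}: "{alarm_name}"',
                    # The alarm details arrive as a JSON string, not a nested object.
                    "Message": json.dumps(alarm_message),
                },
            }
        ]
    }


event = make_sns_alarm_event("API-Error-Rate-High", "ALARM", "Threshold Crossed")
inner = json.loads(event["Records"][0]["Sns"]["Message"])
print(inner["AlarmName"], inner["NewStateValue"])  # API-Error-Rate-High ALARM
```

Passing an event like this to the handler from Step 3 exercises the parsing logic without involving SNS at all.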


Step 1: Create an SNS Topic for Alarm Events

If you don't already have one, create a dedicated SNS topic for infrastructure alerts. You can do this in the AWS console or with the CLI:

aws sns create-topic --name cloudwatch-status-alerts

Note the ARN — you'll need it in the next step.
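If you script your setup in Python rather than with the CLI, boto3's SNS client exposes the same operation as create_topic. When a topic with that name already exists (with matching attributes), the call returns the existing topic's ARN instead of creating a duplicate, so it is safe to re-run. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. ensure_topic is a hypothetical helper name, and the client is passed in so the logic can be exercised without AWS.

```python
def ensure_topic(sns_client, name="cloudwatch-status-alerts"):
    """Create (or look up) the SNS topic and return its ARN.

    sns_client is expected to be boto3.client("sns"). create_topic returns
    the existing topic's ARN when a topic with this name already exists.
    """
    response = sns_client.create_topic(Name=name)
    return response["TopicArn"]


# Usage (requires AWS credentials):
#   import boto3
#   topic_arn = ensure_topic(boto3.client("sns", region_name="us-east-1"))
#   print(topic_arn)
```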


Step 2: Configure Your CloudWatch Alarm to Publish to SNS

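In the CloudWatch console, open your alarm and add the SNS topic as an action for the "In alarm" state (and optionally for the "OK" state if you want to auto-resolve). With the CLI, put-metric-alarm creates the alarm, or replaces the entire configuration of an existing alarm with the same name, so include every setting you want to keep: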

aws cloudwatch put-metric-alarm \
  --alarm-name "API-Error-Rate-High" \
  --alarm-actions "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts" \
  --ok-actions "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts" \
  --alarm-description "API 5xx error rate above threshold" \
  --metric-name 5XXError \
  --namespace AWS/ApiGateway \
  --dimensions Name=ApiName,Value=my-api \
  --statistic Sum \
  --period 60 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2

Replace my-api with the name of your REST API. API Gateway reports 5XXError per API through the ApiName dimension, so an alarm that omits it may never receive data.

The --ok-actions line is optional but valuable — it lets you automate the "Resolved" update when the alarm clears.
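To see what --period 60, --threshold 10, and --evaluation-periods 2 mean in practice, here is a simplified model of the alarm. Each period is one minute, and each datapoint is the Sum of 5XXError for that minute. When --datapoints-to-alarm is not set, it defaults to the number of evaluation periods, so every datapoint in the window must breach. This alarm therefore enters ALARM only after two consecutive minutes with more than 10 errors. The sketch ignores missing-data handling and CloudWatch's evaluation delay, and the error counts are made-up sample values.

```python
def simulate_alarm(datapoints, threshold=10, evaluation_periods=2):
    """Return the simplified alarm state after each one-minute datapoint.

    Models "N out of N" evaluation: ALARM only when every one of the last
    evaluation_periods datapoints is greater than the threshold.
    Missing data and evaluation delays are ignored.
    """
    states = []
    for i in range(len(datapoints)):
        window = datapoints[max(0, i - evaluation_periods + 1): i + 1]
        if len(window) < evaluation_periods:
            # Not enough datapoints yet; a new alarm starts in this state.
            states.append("INSUFFICIENT_DATA")
        elif all(value > threshold for value in window):
            states.append("ALARM")
        else:
            states.append("OK")
    return states


# Hypothetical per-minute 5XXError sums: a one-minute spike, then a sustained problem.
errors = [0, 25, 0, 14, 30, 22, 3]
for minute, (count, state) in enumerate(zip(errors, simulate_alarm(errors))):
    print(f"minute {minute}: {count:>2} errors -> {state}")
```

With these sample counts, the single 25-error spike at minute 1 never trips the alarm. It enters ALARM at minute 4, the second consecutive breaching minute, and returns to OK at minute 6. With evaluation_periods=3, as suggested under Handling False Positives, the ALARM would not arrive until minute 5.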


Step 3: Write the Lambda Function

This is the translator. It receives the SNS notification, parses the alarm state, and calls Alert24's webhook to update the affected service's status.

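import json
import os
import urllib.request

# Configured as Lambda environment variables (see below).
ALERT24_WEBHOOK_URL = os.environ["ALERT24_WEBHOOK_URL"]
SERVICE_ID = os.environ["ALERT24_SERVICE_ID"]

# Map CloudWatch alarm states to status page states.
STATE_MAP = {
    "ALARM": "investigating",
    "OK": "operational",
    "INSUFFICIENT_DATA": "investigating",
}

def lambda_handler(event, context):
    # Iterate over every record rather than assuming a single entry.
    for record in event["Records"]:
        # SNS delivers the CloudWatch notification as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        alarm_name = message.get("AlarmName", "Unknown Alarm")
        new_state = message.get("NewStateValue", "ALARM")
        reason = message.get("NewStateReason", "")

        status = STATE_MAP.get(new_state, "investigating")

        # NewStateReason is CloudWatch's raw text (thresholds, datapoint values)
        # and is shown publicly as-is; substitute friendlier wording if needed.
        payload = {
            "service_id": SERVICE_ID,
            "status": status,
            "message": f"{alarm_name}: {reason}" if status != "operational" else "Service has recovered.",
        }

        data = json.dumps(payload).encode("utf-8")
        req = urllib.request.Request(
            ALERT24_WEBHOOK_URL,
            data=data,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

        # urlopen raises HTTPError on 4xx/5xx responses, which fails the
        # invocation so Lambda can retry the asynchronous SNS event.
        # The timeout stops a slow webhook from using up the function's own timeout.
        with urllib.request.urlopen(req, timeout=10) as response:
            print(f"Alert24 response: {response.status}")

    return {"statusCode": 200}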

Set two environment variables on the Lambda function:

| Variable | Value |
|---|---|
| ALERT24_WEBHOOK_URL | Your Alert24 inbound webhook URL |
| ALERT24_SERVICE_ID | The ID of the service to update on your status page |

You can find both in the Alert24 dashboard under your status page's webhook settings.
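Because the handler reads environment variables at import time and makes a network call, it is awkward to unit-test as written. One option is to pull the translation step into a pure function and test it against hand-written sample messages. This sketch assumes the same payload fields (service_id, status, message) that the handler above sends. build_payload is a hypothetical name, and "svc-123" is a placeholder service ID.

```python
import json

STATE_MAP = {
    "ALARM": "investigating",
    "OK": "operational",
    "INSUFFICIENT_DATA": "investigating",
}


def build_payload(sns_message, service_id):
    """Translate one CloudWatch alarm notification into the Alert24 payload.

    sns_message is the JSON string from record["Sns"]["Message"]. Mirrors
    the Step 3 handler's logic, with no environment or network access.
    """
    message = json.loads(sns_message)
    alarm_name = message.get("AlarmName", "Unknown Alarm")
    new_state = message.get("NewStateValue", "ALARM")
    reason = message.get("NewStateReason", "")
    status = STATE_MAP.get(new_state, "investigating")
    text = f"{alarm_name}: {reason}" if status != "operational" else "Service has recovered."
    return {"service_id": service_id, "status": status, "message": text}


sample = json.dumps({
    "AlarmName": "API-Error-Rate-High",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed",
})
print(build_payload(sample, "svc-123"))
# {'service_id': 'svc-123', 'status': 'investigating', 'message': 'API-Error-Rate-High: Threshold Crossed'}
```

With this in place, the loop body in the Step 3 handler reduces to payload = build_payload(record["Sns"]["Message"], SERVICE_ID) followed by the POST.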


Step 4: Subscribe the Lambda to the SNS Topic

aws sns subscribe \
  --topic-arn "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts" \
  --protocol lambda \
  --notification-endpoint "arn:aws:lambda:us-east-1:123456789012:function:cloudwatch-status-updater"

Then grant SNS permission to invoke the function:

aws lambda add-permission \
  --function-name cloudwatch-status-updater \
  --statement-id sns-invoke \
  --action lambda:InvokeFunction \
  --principal sns.amazonaws.com \
  --source-arn "arn:aws:sns:us-east-1:123456789012:cloudwatch-status-alerts"
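Once everything is subscribed, you can test the pipeline end to end without waiting for a real outage. CloudWatch's SetAlarmState API (set_alarm_state in boto3, aws cloudwatch set-alarm-state on the CLI) temporarily forces an alarm into a given state and runs the actions configured for that state. CloudWatch then returns the alarm to its computed state on its next evaluation. This posts a real update to your status page, so try it against a test service first. fire_test_alarm is a hypothetical helper, and it takes the client as a parameter so it can be exercised with a stand-in.

```python
def fire_test_alarm(cloudwatch, alarm_name="API-Error-Rate-High"):
    """Force an alarm into ALARM so its SNS action (and the Lambda) runs.

    cloudwatch is expected to be boto3.client("cloudwatch"). Actions only run
    when the state actually changes, so this will not re-trigger anything if
    the alarm is already in ALARM. When CloudWatch re-evaluates the alarm and
    finds the metric healthy, it moves back to OK and the OK action fires.
    """
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="ALARM",
        StateReason="Manual test of the status page pipeline",
    )


# Usage (requires AWS credentials; posts a real status page update):
#   import boto3
#   fire_test_alarm(boto3.client("cloudwatch", region_name="us-east-1"))
```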

Step 5: Map Your Alarms to Services

Each CloudWatch alarm should update exactly one Alert24 service, but the Step 3 function as written sends every alarm to the same one. If you have multiple services (API, database, file uploads, payments), you'll want either separate Lambda functions per service, each with its own ALERT24_SERVICE_ID environment variable, or a single Lambda that routes based on alarm name prefix.

A simple routing approach:

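import os

# Alarm-name prefix -> Alert24 service ID. Prefixes are checked in order
# and the first match wins.
SERVICE_ROUTING = {
    "API-": os.environ["API_SERVICE_ID"],
    "DB-": os.environ["DB_SERVICE_ID"],
    "Payments-": os.environ["PAYMENTS_SERVICE_ID"],
}

def get_service_id(alarm_name):
    """Return the service ID for alarm_name, or the default if no prefix matches."""
    for prefix, service_id in SERVICE_ROUTING.items():
        if alarm_name.startswith(prefix):
            return service_id
    return os.environ["DEFAULT_SERVICE_ID"]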

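To use it, set the API_SERVICE_ID, DB_SERVICE_ID, PAYMENTS_SERVICE_ID, and DEFAULT_SERVICE_ID environment variables, and use get_service_id(alarm_name) in the Step 3 payload in place of the single SERVICE_ID lookup. This keeps a single Lambda function while allowing fine-grained status page updates per service component.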
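As the number of services grows, one environment variable per prefix gets unwieldy, and first-match ordering can surprise you if one prefix starts with another (say, "API-" and "API-Payments-"). One alternative is to store the whole mapping as JSON in a single environment variable and pick the longest matching prefix. ALERT24_SERVICE_ROUTING and the svc-* service IDs are hypothetical names chosen for this sketch.

```python
import json
import os


def load_routing(env=os.environ):
    """Read the prefix -> service ID map from one JSON environment variable.

    ALERT24_SERVICE_ROUTING is a hypothetical variable name, e.g.
    '{"API-": "svc-api", "API-Payments-": "svc-payments", "DB-": "svc-db"}'.
    """
    return json.loads(env.get("ALERT24_SERVICE_ROUTING", "{}"))


def get_service_id(alarm_name, routing, default_service_id):
    """Return the service ID for the longest prefix that matches alarm_name."""
    matches = [prefix for prefix in routing if alarm_name.startswith(prefix)]
    if not matches:
        return default_service_id
    return routing[max(matches, key=len)]


routing = load_routing({"ALERT24_SERVICE_ROUTING": json.dumps({
    "API-": "svc-api",
    "API-Payments-": "svc-payments",
    "DB-": "svc-db",
})})
print(get_service_id("API-Payments-Latency", routing, "svc-default"))  # svc-payments
print(get_service_id("API-Error-Rate-High", routing, "svc-default"))   # svc-api
print(get_service_id("Queue-Depth", routing, "svc-default"))           # svc-default
```

Longest-prefix matching makes the result independent of the order in which prefixes are listed, at the cost of scanning every prefix per alarm, which is negligible at this scale.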


What the Customer Sees

From the customer's perspective, this is what changes. When an alarm fires without this setup, your status page sits at "All systems operational" while users are getting 503s. With it:

| Time | Without automation | With automation |
|---|---|---|
| T+0 | Alarm fires | Alarm fires |
| T+1 min | Status page still says "All systems operational" | Status page shows "Investigating" |
| T+3 min | On-call engineer paged | On-call engineer paged |
| T+8 min | Engineer starts diagnosing | Engineer starts diagnosing |
| T+12 min | Someone remembers to update status page | (already done) |
| T+45 min | Alarm clears | Alarm clears, status auto-resolves |

The difference is not just operational efficiency — it's the message your customers receive at the worst moment. "We know" is a powerful thing to say, even when you haven't fixed it yet.


Handling False Positives

If your alarms are noisy and you don't want every brief spike to trigger a public status update, add a simple debounce. Set your CloudWatch alarm's evaluation-periods to 3 or higher so it only fires after sustained degradation. You can also have the Lambda wait briefly after an ALARM notification and skip the update if the alarm has already returned to OK. For most teams, though, a brief "Investigating" that quickly self-resolves to "Operational" is preferable to no communication at all.
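One way to implement the Lambda-side check described above is to hold an ALARM notification for a short grace period, then ask CloudWatch for the alarm's current state with describe_alarms and post only if it is still in ALARM. This sketch rests on several assumptions. The 120-second default is arbitrary. The function timeout must be raised above the grace period (the Lambda default is 3 seconds), and you pay for the idle time. still_in_alarm_after is a hypothetical helper. The CloudWatch client and the sleep function are parameters so the logic can be tested without AWS or real delays.

```python
import time


def still_in_alarm_after(cloudwatch, alarm_name, grace_seconds=120, sleep=time.sleep):
    """Wait, then report whether the alarm is still in ALARM.

    cloudwatch is expected to be boto3.client("cloudwatch"). describe_alarms
    returns metric alarms under "MetricAlarms"; each carries a "StateValue".
    Returns False if the alarm is not found.
    """
    sleep(grace_seconds)
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    return bool(alarms) and alarms[0]["StateValue"] == "ALARM"


# In the handler, before posting an "investigating" update (hypothetical wiring):
#   if new_state == "ALARM" and not still_in_alarm_after(cloudwatch, alarm_name):
#       continue  # the alarm cleared during the grace period; skip the public update
```

If the alarm clears during the wait, its own OK notification still arrives separately and posts "operational", which is harmless when no "investigating" update was sent.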


Next Steps

Start with one alarm and one service. Wire up your highest-visibility CloudWatch alarm — probably the one that causes the most customer-visible errors — to a single Alert24 service. Watch it work. Then expand to the rest of your alarm inventory.

On the Alert24 side, you can also configure the webhook to trigger an on-call notification simultaneously, so the status page update and the page to your engineer happen in parallel rather than sequentially. That combination — automated customer communication plus automated team escalation — is what gets your mean time to acknowledge (MTTA) down and keeps customers informed throughout.

If you're not already using Alert24, the webhook receiver and status page features are available on all plans. You can set up your status page at alert24.com and have this pipeline running before your next incident.