Your Website Can Be Up While Your DNS Is Down
DNS monitoring catches a category of outages that standard HTTP checks miss entirely. Your web server can be running perfectly, but if DNS resolution fails, no one can reach it. To the user, it looks exactly like your site is down.
DNS failures are responsible for some of the largest and most confusing outages in internet history. When Dyn was hit by a DDoS attack in 2016, major sites like Twitter, Netflix, and Reddit became unreachable. Their servers were fine. DNS wasn't.
How DNS Failures Happen
DNS Provider Outages
Your DNS provider (Cloudflare, Route 53, Google Cloud DNS, or your registrar's default nameservers) is a dependency. If they go down, your domain stops resolving.
This is a single point of failure that many teams overlook. You can have redundant web servers across three cloud providers, but if all your DNS records point to one provider's nameservers, a single DNS outage takes everything offline.
TTL Misconfiguration
TTL (Time To Live) controls how long DNS resolvers cache your records. A TTL of 3600 means resolvers cache your DNS records for 1 hour.
Too high (24+ hours): DNS changes can take a full day or more to reach every user. If you need to switch servers during an emergency, resolvers keep serving the old address until their cached copy expires.
Too low (60 seconds or less): Resolvers have to re-query your nameservers almost constantly. This adds lookup latency whenever a cached record has just expired and puts more load on your DNS provider.
Recommended: 300-3600 seconds (5 minutes to 1 hour) for production records. Lower the TTL to 60-300 seconds before planned migrations, then raise it back after.
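The ranges above can be encoded as a small check. This is a minimal sketch: the function name `ttl_advice` and the `migrating` flag are illustrative, and the bounds are simply the numbers quoted in this section.

```python
def ttl_advice(ttl_seconds: int, migrating: bool = False) -> str:
    """Compare a TTL with the ranges suggested in the text.

    Normal operation: 300-3600 seconds.
    Before a planned migration: 60-300 seconds.
    """
    low, high = (60, 300) if migrating else (300, 3600)
    if ttl_seconds < low:
        return "below range"
    if ttl_seconds > high:
        return "above range"
    return "in range"


def worst_case_stale_minutes(ttl_seconds: int) -> float:
    """A resolver that cached the record just before a change can keep
    serving the old value for up to one full TTL."""
    return ttl_seconds / 60
```

For example, `ttl_advice(86400)` reports "above range", and `worst_case_stale_minutes(86400)` shows why: a resolver could serve the old record for up to 1440 minutes.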
Registrar Issues
Your domain registrar (GoDaddy, Namecheap, Cloudflare Registrar) controls the NS records that delegate authority to your DNS provider. If the registrar has an issue, removes your NS records, or lets your domain expire, DNS stops working entirely.
Domain expiration is more common than you'd think. Major companies have accidentally let domains expire, causing full outages.
DNSSEC Failures
DNSSEC adds cryptographic signatures to DNS records to prevent spoofing. When DNSSEC validation fails (expired signatures, key rotation errors, incorrect DS records), resolvers that enforce DNSSEC will refuse to resolve your domain.
The result: some users can reach your site (those using resolvers that don't enforce DNSSEC) while others can't. This creates intermittent, confusing outages that are difficult to diagnose without DNS-specific monitoring.
What to Monitor
Record Resolution
The most basic DNS check: can your domain be resolved? Monitor that your A and AAAA records return the expected IP addresses and that your CNAME records point to the expected hostnames.
Check from multiple DNS resolvers:
- Google Public DNS (8.8.8.8)
- Cloudflare DNS (1.1.1.1)
- Your ISP's resolver
- A resolver in a different geographic region
If your domain resolves on one resolver but not another, the usual causes are incomplete propagation, a stale cache, or a DNSSEC validation failure on the stricter resolver.
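One way to ask a specific resolver directly, using only the Python standard library, is to build the DNS query packet by hand. The sketch below is illustrative rather than production-ready: it handles A records only, sends a single UDP packet with no retry, and ignores truncated responses. All function names are made up for this example.

```python
from __future__ import annotations

import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a recursive query for the A record of `name`."""
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(packet: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(packet: bytes) -> tuple[int, set[str]]:
    """Return (rcode, IPv4 addresses) from a DNS response packet."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = set()
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record
            addresses.add(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return flags & 0x000F, addresses


def query_resolver(resolver_ip: str, name: str, timeout: float = 2.0):
    """Send one query to `resolver_ip` over UDP and parse the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        packet, _ = sock.recvfrom(4096)
    return parse_a_records(packet)


def disagreeing_resolvers(results: dict[str, set[str]], expected: set[str]) -> list[str]:
    """List resolvers whose answer differs from the expected address set."""
    return sorted(r for r, ips in results.items() if ips != expected)
```

A monitoring loop could call `query_resolver("8.8.8.8", "yourdomain.com")` and `query_resolver("1.1.1.1", "yourdomain.com")`, collect the address sets, and pass them to `disagreeing_resolvers`.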
Response Time
DNS resolution should take under 100ms. If it takes longer, it adds latency every time a user's resolver has to look up your domain, which typically includes the first request of a visit.
Monitor DNS resolution time and alert when it exceeds 200ms. Slow DNS is often an early indicator of provider issues before a full outage.
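A rough way to time a lookup and apply the thresholds above is sketched below. It assumes the system resolver via `socket.gethostbyname`, which may answer from a local cache, so a real monitor would more likely query the authoritative nameservers or a named public resolver. The function names and the "slow" middle state are illustrative.

```python
import socket
import time
from typing import Callable

TARGET_MS = 100.0  # "should take under 100ms"
ALERT_MS = 200.0   # "alert when it exceeds 200ms"


def timed_lookup(
    hostname: str,
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> float:
    """Return how long one lookup took, in milliseconds.

    `resolve` is injectable so a different lookup path can be measured.
    """
    start = time.perf_counter()
    resolve(hostname)
    return (time.perf_counter() - start) * 1000.0


def latency_status(elapsed_ms: float) -> str:
    """Map a measured lookup time onto the thresholds from the text."""
    if elapsed_ms > ALERT_MS:
        return "alert"
    if elapsed_ms > TARGET_MS:
        return "slow"
    return "ok"
```

Tracking the "slow" state separately from "alert" gives you the early-warning signal described above before the hard threshold is crossed.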
SOA Record Monitoring
The SOA (Start of Authority) record contains the serial number that changes when your DNS zone is updated. Monitor that the SOA serial matches across all your nameservers. A mismatch means one or more nameservers have stale data.
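Once you have collected the SOA serial from each nameserver, the comparison itself is simple. The sketch below assumes the serials have already been fetched by some other means; it treats the highest serial as current and ignores serial-number wraparound. The nameserver hostnames in the usage example are hypothetical.

```python
def stale_nameservers(serials: dict[str, int]) -> list[str]:
    """Return the nameservers whose SOA serial lags behind the newest one.

    `serials` maps nameserver hostname -> SOA serial reported by that server.
    """
    if not serials:
        return []
    newest = max(serials.values())
    return sorted(ns for ns, serial in serials.items() if serial != newest)


# Usage with hypothetical nameservers: ns3 has not picked up the latest zone.
observed = {
    "ns1.example.net": 2024010102,
    "ns2.example.net": 2024010102,
    "ns3.example.net": 2024010101,
}
print(stale_nameservers(observed))
```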
Nameserver Health
Monitor each of your nameservers individually. If you have four nameservers and one is down, DNS still works (resolvers try the others), but you've lost redundancy and lookups that hit the dead server first get slower. Each additional failure brings you closer to a full outage.
Domain Expiration
Monitor your domain expiration date. Alert at 60 days, 30 days, and 14 days before expiry. Auto-renewal should be enabled, but payment failures (expired credit card) can cause auto-renewal to silently fail.
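The 60/30/14-day schedule can be expressed as a small function. This sketch assumes you already know the expiry date (for example from your registrar's dashboard); how you obtain it is out of scope here, and the function name is illustrative.

```python
from __future__ import annotations

from datetime import date

THRESHOLDS_DAYS = (60, 30, 14)  # alert schedule from the text


def expiry_alert(expires_on: date, today: date) -> int | None:
    """Return the tightest threshold already crossed, or None if none is.

    Example: with 10 days left, all three thresholds are crossed and
    the function returns 14.
    """
    days_left = (expires_on - today).days
    crossed = [t for t in THRESHOLDS_DAYS if days_left <= t]
    return min(crossed) if crossed else None
```

Returning the tightest crossed threshold lets the caller escalate severity as the date approaches, instead of sending the same alert three times.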
Multi-Region DNS Monitoring
DNS issues are frequently regional. A DNS provider might have an outage affecting their European points of presence while US and Asia continue working.
Monitor DNS resolution from at least three geographic regions:
- US (East and West)
- Europe
- Asia-Pacific
If resolution fails from one region but succeeds from others, that's a provider issue worth investigating even though your site is "mostly up." Users in the affected region can't reach you.
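Given one pass/fail result per probe region, telling a regional problem apart from a global one takes only a few lines. The region labels and the function name below are illustrative, and the sketch assumes each probe has already reported whether resolution succeeded.

```python
def classify_outage(results: dict[str, bool]) -> str:
    """Summarize per-region DNS results.

    `results` maps region name -> True if resolution succeeded there.
    """
    failed = sorted(region for region, ok in results.items() if not ok)
    if not failed:
        return "healthy"
    if len(failed) == len(results):
        return "global outage"
    return "regional outage: " + ", ".join(failed)


# Usage: Europe fails while the other regions resolve normally.
print(classify_outage({"us-east": True, "us-west": True, "europe": False, "apac": True}))
```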
DNS Monitoring vs HTTP Monitoring
HTTP monitoring checks: "Can I connect to this server and get a response?"
DNS monitoring checks: "Can I even find this server?"
DNS resolution happens before the HTTP connection. If DNS fails, the HTTP check never starts. But here's the catch: many monitoring tools send HTTP checks from their own infrastructure using cached DNS. They might successfully resolve your domain from cache even while public DNS is broken.
Best practice: Run dedicated DNS checks alongside your HTTP checks. Don't rely on HTTP monitoring to catch DNS issues.
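To make the distinction concrete, here is a minimal sketch that runs the DNS step and the HTTP step separately and reports which one failed. It uses the system resolver, so it does not by itself bypass a local cache; the point is only to keep the two failure modes apart. The function names and result labels are illustrative.

```python
from __future__ import annotations

import socket
import urllib.error
import urllib.request


def resolve_host(hostname: str) -> list[str]:
    """Resolve via the system resolver; raises socket.gaierror on failure."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # a 4xx/5xx reply still means the server answered


def check_site(hostname: str, resolve=resolve_host, fetch=fetch_status) -> str:
    """Report whether a failure happened at the DNS step or the HTTP step."""
    try:
        addresses = resolve(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "dns-failure"
    if not addresses:
        return "dns-failure"
    try:
        status = fetch(f"https://{hostname}/")
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return "http-unreachable"
    return "ok" if status < 500 else "http-error"
```

Because `resolve` and `fetch` are parameters, the DNS step can be swapped for a lookup against a specific public resolver without touching the HTTP logic.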
Setting Up DNS Monitoring
Step 1: Identify Your Critical Records
List every DNS record that matters:
- yourdomain.com: A record (main website)
- www.yourdomain.com: CNAME (www subdomain)
- api.yourdomain.com: A/CNAME (API)
- mail.yourdomain.com: MX record (email delivery)
- status.yourdomain.com: CNAME (status page)
Step 2: Define Expected Values
For each record, specify the expected response:
- A record should return 1.2.3.4
- CNAME should resolve to your-load-balancer.example.com
- MX should point to your email provider
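Expected values like these can live in a plain data structure that a check compares against what was actually observed. In the sketch below the MX target `mx.mail-provider.example` is a hypothetical placeholder, the function names are illustrative, and the observed values are assumed to come from whatever DNS lookup tool you use.

```python
# (record name, record type) -> expected values
EXPECTED = {
    ("yourdomain.com", "A"): {"1.2.3.4"},
    ("www.yourdomain.com", "CNAME"): {"your-load-balancer.example.com"},
    ("mail.yourdomain.com", "MX"): {"mx.mail-provider.example"},  # hypothetical
}


def normalize(value: str) -> str:
    """DNS names are case-insensitive and may carry a trailing dot."""
    return value.strip().rstrip(".").lower()


def find_drift(expected: dict, observed: dict) -> dict:
    """Return {key: (expected, observed)} for every record that differs.

    A record missing from `observed` counts as an empty answer.
    """
    drift = {}
    for key, want in expected.items():
        want_norm = {normalize(v) for v in want}
        got_norm = {normalize(v) for v in observed.get(key, set())}
        if got_norm != want_norm:
            drift[key] = (want_norm, got_norm)
    return drift
```

Normalizing before comparing avoids false alarms when a resolver returns `Your-Load-Balancer.example.com.` with different casing or a trailing dot.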
Step 3: Configure Checks
Use a monitoring tool that supports DNS-specific checks. Alert24, Better Stack, and Gatus all support DNS record monitoring. Configure checks to run every 60-300 seconds from multiple regions.
Step 4: Set Up Alerts
Alert immediately on:
- Complete resolution failure (record not found)
- Unexpected record value (IP changed without your knowledge)
- DNS response time > 200ms
- Domain expiration approaching (60, 30, and 14 days before expiry)
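The alert conditions above map naturally onto one function that inspects a single check result. This is a sketch under assumptions: the `CheckResult` fields are an invented shape for illustration, not the format of any particular monitoring tool, and the expiry threshold is a parameter that defaults to 30 days.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CheckResult:
    record: str
    values: frozenset[str]    # what the lookup returned (empty = not found)
    expected: frozenset[str]  # what you configured as correct
    response_ms: float
    days_to_expiry: int


def alerts_for(result: CheckResult, expiry_days: int = 30) -> list[str]:
    """Return the alert conditions from the list above that this result triggers."""
    alerts = []
    if not result.values:
        alerts.append("resolution failure")
    elif result.values != result.expected:
        alerts.append("unexpected value")
    if result.response_ms > 200:
        alerts.append("slow response")
    if result.days_to_expiry <= expiry_days:
        alerts.append("expiring soon")
    return alerts
```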
When DNS Monitoring Saves You
Scenario 1: Registrar payment failure. Your credit card expires. Auto-renewal fails silently. Your domain expires in 60 days. DNS monitoring alerts you at 60 days, 30 days, and 14 days. You update your payment method and avoid a complete outage.
Scenario 2: DNS provider partial outage. Cloudflare DNS has degraded performance in EU. Your HTTP checks from US pass fine. DNS monitoring from EU shows 500ms resolution times (normally 20ms). You contact Cloudflare and add a secondary DNS provider before it becomes a full outage.
Scenario 3: Unauthorized DNS change. A compromised admin account changes your A record to point to a phishing site. DNS monitoring detects the unexpected IP change within 60 seconds and alerts your security team.
Without DNS-specific monitoring, each of these scenarios would be detected much later, typically by confused users or downstream failures.
Pair DNS Monitoring With Your Status Page
When DNS issues affect your users, your status page needs to reflect it. But here's the irony: if your status page lives on a subdomain of your main domain and that domain's DNS is down, users can't reach the status page either.
Host your status page on a different domain or use a third-party status page provider like alert24.net that runs on completely separate infrastructure. This ensures your status page is reachable even during a DNS outage on your primary domain.
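A quick sanity check for this is whether the status page hostname sits under your primary domain. The sketch below only compares names; it cannot tell whether two different domains share the same DNS provider or nameservers, which you would need to verify separately. The function name and the example hostnames are illustrative.

```python
def shares_domain(status_host: str, primary_domain: str) -> bool:
    """True if `status_host` is the primary domain or one of its subdomains."""
    host = status_host.rstrip(".").lower()
    primary = primary_domain.rstrip(".").lower()
    return host == primary or host.endswith("." + primary)


# status.yourdomain.com goes down with yourdomain.com; a separate domain does not.
print(shares_domain("status.yourdomain.com", "yourdomain.com"))
print(shares_domain("yourdomain-status.example.net", "yourdomain.com"))
```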
