SLA Uptime Tracking Starts With Understanding the Numbers
An SLA (Service Level Agreement) is a contract between you and your customers that defines the minimum acceptable uptime for your service. SLA uptime tracking is how you measure whether you're meeting that commitment.
The numbers look similar but differ dramatically in practice. Here's what each tier actually means.
SLA Uptime Tiers Explained
| SLA Level | Common Name | Downtime Per Month | Downtime Per Year |
|---|---|---|---|
| 99% | "Two nines" | 7 hours 18 minutes | 3 days 15 hours |
| 99.9% | "Three nines" | 43 minutes 50 seconds | 8 hours 46 minutes |
| 99.95% | "Three and a half nines" | 21 minutes 55 seconds | 4 hours 23 minutes |
| 99.99% | "Four nines" | 4 minutes 23 seconds | 52 minutes 36 seconds |
| 99.999% | "Five nines" | 26 seconds | 5 minutes 16 seconds |
The jump from 99.9% to 99.99% sounds small. It's the difference between roughly 43 minutes of allowed downtime per month and roughly 4 minutes. That one extra nine changes your entire operational approach.
How to Calculate Your SLA Target
Start with your customer expectations, not your ambitions. A 99.999% SLA sounds impressive, but achieving it requires redundant infrastructure, automated failover, and zero-downtime deployments. That costs real money.
Most SaaS companies offer 99.9% or 99.95%. E-commerce platforms with high transaction volumes often target 99.99%. Internal tools and development platforms can get away with 99.5%.
Formula for allowed downtime:
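Total minutes in a 30-day month: 43,200
Allowed downtime = 43,200 x (1 - SLA percentage / 100)
For 99.9%: 43,200 x 0.001 = 43.2 minutes per month
The table above uses an average month of about 30.44 days (43,830 minutes), which is why it lists 43 minutes 50 seconds for 99.9%.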
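The formula can be sketched in a few lines of Python. The function name `allowed_downtime_minutes` and the default 30-day period are illustrative choices, not part of any standard.

```python
def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: float = 30 * 24 * 60) -> float:
    """Return the downtime budget in minutes for an SLA percentage.

    period_minutes defaults to a 30-day month (43,200 minutes).
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_minutes * (1 - sla_percent / 100)


# Print the monthly budget for each tier discussed in this article.
for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
    budget = allowed_downtime_minutes(sla)
    print(f"{sla}% allows {budget:.2f} minutes per 30-day month")
```

Passing a different `period_minutes` (for example 43,830 for an average-length month, or the minutes in a quarter) gives the budget for whatever measurement period your contract uses.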
In most SLAs, only unplanned downtime counts against this budget. Scheduled maintenance windows announced in advance are typically excluded, but only if the contract says so. Make sure your SLA document is explicit about what counts.
What Counts as Downtime
This is where SLA disputes happen. Define downtime clearly in your agreement.
Typically counts as downtime:
- Complete service unavailability (HTTP 5xx or no response)
- Response times exceeding a defined threshold (e.g., >5 seconds)
- Core functionality failure (login, checkout, API) even if the homepage loads
- Partial outages affecting more than a defined percentage of users
Typically excluded:
- Scheduled maintenance with advance notice (24-72 hours)
- Issues caused by the customer's infrastructure
- Force majeure events
- Third-party service outages beyond your control (if specified)
- Beta or non-production environments
Write these definitions into your SLA before a customer challenges you on what "down" means.
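One way to keep the written definition and the monitoring in sync is to encode the rules as a single function. The sketch below is a simplified assumption: the `CheckResult` type, its field names, and the 5-second threshold are hypothetical, and it covers only the status-code, latency, and maintenance rules from the lists above.

```python
from dataclasses import dataclass
from typing import Optional

# Example threshold taken from the list above; your SLA may differ.
LATENCY_THRESHOLD_SECONDS = 5.0


@dataclass
class CheckResult:
    status_code: Optional[int]  # None means no response at all (timeout, DNS failure)
    response_seconds: float
    during_announced_maintenance: bool = False


def counts_as_downtime(check: CheckResult) -> bool:
    """Apply the downtime definition to a single monitoring check."""
    # Announced maintenance is excluded, assuming the SLA says so.
    if check.during_announced_maintenance:
        return False
    # No response or any HTTP 5xx counts as unavailable.
    if check.status_code is None or 500 <= check.status_code <= 599:
        return True
    # A response that arrives but is too slow also counts.
    return check.response_seconds > LATENCY_THRESHOLD_SECONDS


print(counts_as_downtime(CheckResult(200, 0.4)))          # fast, healthy response
print(counts_as_downtime(CheckResult(503, 0.1)))          # server error
print(counts_as_downtime(CheckResult(200, 7.2)))          # slow response
print(counts_as_downtime(CheckResult(None, 10.0, True)))  # outage inside maintenance
```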
How to Track Uptime Accurately
Tracking uptime requires continuous monitoring from external locations. Internal health checks alone are insufficient because they miss network issues, DNS failures, and CDN problems that affect real users.
Set Up External Monitoring
Use a monitoring service that checks your endpoints from multiple geographic locations. If your service is available from Virginia but unreachable from Frankfurt, that's still downtime for your EU customers.
Check at least once every 60 seconds. A 5-minute check interval means you could miss a 4-minute outage entirely, or record a 10-minute outage as only 5 minutes.
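The effect of the check interval is easy to simulate. The sketch below assumes a simple accounting model in which every failed check is credited one full interval of downtime; real monitoring tools may attribute time differently, so the exact figures are illustrative only.

```python
def observed_downtime_seconds(outage_start: int, outage_end: int,
                              interval: int, first_check: int = 0,
                              horizon: int = 3600) -> int:
    """Downtime a poller would record for one outage.

    Assumed model: checks run at first_check, first_check + interval, ...
    and each check that lands inside the outage counts as `interval`
    seconds of downtime.
    """
    failed_checks = 0
    t = first_check
    while t < horizon:
        if outage_start <= t < outage_end:
            failed_checks += 1
        t += interval
    return failed_checks * interval


# A 4-minute outage (t=30s to t=270s) falls between two 5-minute checks.
print(observed_downtime_seconds(30, 270, interval=300))   # 5-minute polling
print(observed_downtime_seconds(30, 270, interval=60))    # 60-second polling

# A 9-minute outage (t=310s to t=850s) is hit by only one 5-minute check.
print(observed_downtime_seconds(310, 850, interval=300))
print(observed_downtime_seconds(310, 850, interval=60))
```

Under this model the 5-minute poller records nothing for the first outage and only 300 seconds for the second, while the 60-second poller records 240 and 540 seconds, matching the true durations.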
Track by Component
Don't report a single uptime number for your entire platform. Break it down by component:
- API: 99.97%
- Web app: 99.95%
- Authentication: 99.99%
- Webhooks: 99.92%
Component-level tracking helps you identify weak points and gives customers more useful information.
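A per-component breakdown like the one above can be computed directly from raw check results. This is a minimal sketch that assumes each check is stored as a `(component, ok)` pair; the sample data is synthetic and chosen only to reproduce two of the figures in the list.

```python
from collections import defaultdict


def uptime_by_component(checks):
    """Return {component: uptime percent} from (component, ok) pairs."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for component, ok in checks:
        totals[component] += 1
        if ok:
            successes[component] += 1
    return {name: 100.0 * successes[name] / totals[name] for name in totals}


# Synthetic sample: 10,000 checks per component with a few failures each.
checks = (
    [("api", True)] * 9997 + [("api", False)] * 3
    + [("webhooks", True)] * 9992 + [("webhooks", False)] * 8
)

for name, percent in uptime_by_component(checks).items():
    print(f"{name}: {percent:.2f}%")
```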
Account for Partial Degradation
Not every incident is a complete outage. Your monitoring should capture degraded performance as partial downtime.
One approach: if response times exceed 5x your baseline for a sustained period, count that time as 50% downtime. Document this methodology so customers understand how you calculate the number.
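That weighting scheme might look like the sketch below. It assumes one response-time sample per minute, with `None` standing for a failed check, and for brevity it skips the "sustained period" requirement by weighting every slow minute individually. The function and parameter names are hypothetical.

```python
def weighted_downtime_minutes(minute_samples, baseline_seconds,
                              degraded_factor=5.0, degraded_weight=0.5):
    """Sum downtime with degraded minutes counted at a partial weight.

    minute_samples: one entry per minute, either the response time in
    seconds or None if the check failed outright.
    """
    total = 0.0
    for sample in minute_samples:
        if sample is None:
            total += 1.0                 # full outage minute
        elif sample > degraded_factor * baseline_seconds:
            total += degraded_weight     # degraded minute, partial weight
    return total


# Baseline of 0.2s, so anything slower than 1.0s is treated as degraded.
samples = [0.2, None, None, 1.5, 1.2, 0.3]
print(weighted_downtime_minutes(samples, baseline_seconds=0.2))
```

Here two failed minutes count in full and two slow minutes count at half weight, for three weighted minutes of downtime in total.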
Reporting SLA Compliance to Customers
Enterprise customers expect regular uptime reports. Monthly is standard. Include:
- Overall uptime percentage for the reporting period
- Component-level breakdown if applicable
- Incident summary with timestamps, duration, and root causes
- Trend over time (last 3-6 months)
- SLA credit status if any thresholds were breached
Automate this reporting. Manually compiling uptime data each month is tedious and error-prone.
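As a starting point for that automation, the sketch below builds a plain-text summary from a list of incidents. The `Incident` type, the report layout, and the sample incident are all made up for illustration, and the code assumes incidents fall entirely inside the period and do not overlap.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Incident:
    start: datetime
    end: datetime
    root_cause: str


def monthly_report(period_start, period_end, incidents, sla_percent):
    """Build a plain-text uptime summary for one reporting period."""
    period_minutes = (period_end - period_start).total_seconds() / 60
    downtime_minutes = sum(
        (i.end - i.start).total_seconds() / 60 for i in incidents
    )
    uptime_percent = 100.0 * (1 - downtime_minutes / period_minutes)
    status = "MET" if uptime_percent >= sla_percent else "BREACHED"

    lines = [
        f"Period: {period_start:%Y-%m-%d} to {period_end:%Y-%m-%d}",
        f"Uptime: {uptime_percent:.3f}% (target {sla_percent}%): {status}",
        "Incidents:",
    ]
    for i in incidents:
        minutes = (i.end - i.start).total_seconds() / 60
        lines.append(
            f"  {i.start:%Y-%m-%d %H:%M} UTC, {minutes:.0f} min, {i.root_cause}"
        )
    return "\n".join(lines)


utc = timezone.utc
report = monthly_report(
    datetime(2024, 6, 1, tzinfo=utc),
    datetime(2024, 7, 1, tzinfo=utc),
    [Incident(datetime(2024, 6, 12, 14, 0, tzinfo=utc),
              datetime(2024, 6, 12, 14, 50, tzinfo=utc),
              "database failover did not trigger")],
    sla_percent=99.9,
)
print(report)
```

A single 50-minute incident in a 30-day month exceeds the 43.2-minute budget for 99.9%, so this sample report flags the target as breached.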
Your status page itself serves as a real-time SLA report. Tools like alert24.net and Better Stack display historical uptime percentages on the status page, giving customers continuous visibility without waiting for a monthly report.
SLA Credits and Consequences
Most SLAs include a credit structure for missed targets. Common structures:
| Uptime Achieved | Credit |
|---|---|
| 99.0% to below 99.9% | 10% of monthly fee |
| 95.0% to below 99.0% | 25% of monthly fee |
| Below 95.0% | 50% of monthly fee |
Credits are typically applied to future invoices, not refunded in cash. Cap total credits at 30-50% of the monthly fee to limit financial exposure.
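The tier table and the cap translate into a small lookup. This sketch assumes a 99.9% SLA target (no credit at or above it), treats each tier's lower bound as inclusive, and uses a hypothetical `cap_fraction` parameter for the credit cap.

```python
def sla_credit(uptime_percent: float, monthly_fee: float,
               cap_fraction: float = 0.5) -> float:
    """Return the credit owed for a month, following the example tiers."""
    if uptime_percent >= 99.9:
        rate = 0.0       # target met, no credit
    elif uptime_percent >= 99.0:
        rate = 0.10
    elif uptime_percent >= 95.0:
        rate = 0.25
    else:
        rate = 0.50
    # Never credit more than the configured cap.
    return monthly_fee * min(rate, cap_fraction)


for uptime in (99.95, 99.5, 97.0, 90.0):
    print(f"{uptime}% uptime on a $1,000 plan: ${sla_credit(uptime, 1000):.2f} credit")
```

With `cap_fraction=0.3`, the lowest tier would pay out 30% of the fee instead of 50%.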
Some companies offer more aggressive credits as a competitive differentiator. This works if your infrastructure genuinely supports it. Offering a 99.99% SLA with generous credits on unreliable infrastructure is a fast way to lose money.
Common SLA Mistakes
Promising more than you can deliver. A 99.99% SLA requires redundancy at every layer: compute, database, load balancer, DNS, and CDN. If any single component lacks failover, you can't hit four nines.
Not defining measurement methodology. Do you measure from one location or ten? Do you average across all checks or count any single failure? Specify this upfront.
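Ignoring dependent services. If your SLA promises 99.9% uptime but your payment processor's SLA is 99.5%, any feature that hard-depends on the processor is capped at 99.5%. The realistic figure is lower still, because independent failures compound: 0.999 x 0.995 is roughly 99.4%. You can't be more available than your least reliable dependency.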
No exclusions for maintenance. Without a maintenance exclusion clause, every deployment counts against your SLA. Define a maintenance window and notification process.
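The dependency point above can be made concrete with a quick calculation. The sketch assumes every listed service is a hard dependency (the feature fails if any one of them is down) and that their failures are independent, which is a simplification.

```python
from math import prod


def serial_availability(*availabilities_percent: float) -> float:
    """Combined availability of services that must all be up at once."""
    return 100.0 * prod(a / 100.0 for a in availabilities_percent)


# Your own stack at 99.9% plus a payment processor at 99.5%.
print(f"{serial_availability(99.9, 99.5):.2f}%")

# Adding a third hard dependency at 99.9% lowers the figure again.
print(f"{serial_availability(99.9, 99.5, 99.9):.2f}%")
```

Every extra hard dependency multiplies in another factor below 1, so the combined figure only goes down as the chain gets longer.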
Start Tracking Today
Set up external monitoring with 60-second checks from at least 3 geographic locations. Define what constitutes downtime for your service. Publish your SLA with clear measurement methodology and credit terms. Then report monthly.
Your SLA is only as trustworthy as your ability to measure and report against it. Automated tracking removes the guesswork and gives both you and your customers confidence in the numbers.
