Your Pager Is Crying Wolf
Here is a scenario most on-call engineers know too well: it is 3 AM, your phone buzzes, and you drag yourself out of bed to investigate an alert. Fifteen minutes later, you confirm it is another false positive --- the third one tonight. You silence the notification, roll over, and try to fall back asleep. But sleep does not come easily when you know the next buzz could be real.
Now multiply that across your entire team, every week, for months.
This is alert fatigue, and it is quietly destroying engineering organizations from the inside out. The costs are not just operational. They are human, financial, and compounding. And for teams running 5 to 50 engineers, where every person matters, the damage hits harder and faster than most leaders realize.
The Scale of the Problem
The numbers paint a grim picture. According to a 2025 Catchpoint study, 70% of SRE teams report alert fatigue as a top-three operational concern. PagerDuty's research shows the average on-call engineer receives roughly 50 alerts per week, but only 2 to 5 percent require human intervention. Across broader monitoring setups, teams can see over 2,000 alerts weekly with just 3% needing immediate action.
That means for every legitimate alert demanding your attention, there are 30 or more false positives, duplicates, or low-priority notifications competing for it.
| Metric | Typical Range |
|---|---|
| Alerts per on-call engineer per week | ~50 |
| Alerts requiring human action | 2--5% |
| Teams reporting alert fatigue as top concern | 70% |
| Organizations dealing with duplicate alerts | 63% |
| Organizations dealing with false-positive alerts | 60% |
When your signal-to-noise ratio is this bad, the rational human response is predictable: people start ignoring alerts.
The Boy Who Cried Wolf Effect
Research on alert response behavior shows something that should alarm every engineering leader. A study on clinical alert systems found that the likelihood of a person acting on an alert dropped 30% with each successive reminder. The psychology transfers directly to engineering: when most alerts are noise, your team trains itself to dismiss all of them.
This is not a character flaw. It is a well-documented cognitive adaptation. Your brain cannot sustain high vigilance indefinitely. When 95% of pages turn out to be nothing, the 5% that matter get lost in the noise.
The consequences are not theoretical. In 2022, Suffolk County, New York experienced a devastating ransomware attack. In the weeks leading up to the breach, the IT team was receiving hundreds of alerts daily. Months earlier, frustrated by the volume of unnecessary notifications, staff had redirected alerts to a Slack channel where they went largely unread. Attackers accessed, encrypted, and stole personally identifiable information. The county never paid the $2.5 million ransom, but spent over $25 million on remediation.
The Target breach of 2013 followed a similar pattern: security tools detected malicious activity early, but the alerts were buried beneath routine noise. Analysts dismissed them. By the time anyone acted, data on 40 million payment cards had been stolen.
These are security examples, but the same dynamics play out in infrastructure monitoring. When your team ignores a disk usage warning for the twelfth time because the threshold is set too low, they will also ignore it the one time the disk is actually about to fill up and take down your database.
The Hidden Costs No One Budgets For
Engineer Burnout and Turnover
On-call work takes a measurable toll on health and cognition. A systematic review published in Sleep Medicine Reviews found that on-call workers had significantly greater difficulty falling asleep and staying asleep, even when calls did not come --- the mere anticipation of being paged disrupted rest. A 2024 study in Postgraduate Medicine found that on-call burnout in physicians was directly associated with working memory impairment, depressive symptoms, and sleep disturbance. The same cognitive load applies to software engineers staring at dashboards at 2 AM.
The financial impact of burnout-driven attrition is severe. Replacing a software engineer costs between 30 and 70 percent of their annual salary when you factor in recruitment fees, interviewing time, onboarding, and the productivity ramp-up period. For a mid-level engineer earning $150,000, that is $45,000 to $105,000 per departure.
One widely cited case study describes an 8-person engineering team that lost 3 engineers in 6 months directly due to on-call burnout. At even the low end of replacement costs, that is $135,000 in turnover expense --- plus months of reduced capacity while new hires get up to speed.
For a 20-person team with 15% annual attrition driven partly by on-call dissatisfaction, you are looking at $135,000 to $315,000 per year in replacement costs alone. And that ignores the institutional knowledge walking out the door.
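The arithmetic above is easy to reproduce. A minimal sketch using the article's own figures (30 to 70 percent of a $150,000 salary per departure, with attrition rounded to whole engineers as an assumption):

```python
# Replacement cost estimate, using the figures cited above.
# Assumption: attrition rounds to a whole number of departures.

def replacement_cost_range(salary: float) -> tuple[float, float]:
    """Cost to replace one engineer: 30-70% of annual salary."""
    return (0.30 * salary, 0.70 * salary)

def annual_turnover_cost(team_size: int, attrition_rate: float, salary: float) -> tuple[float, float]:
    """Low and high annual turnover cost for a team."""
    departures = round(team_size * attrition_rate)
    low, high = replacement_cost_range(salary)
    return departures * low, departures * high

low, high = annual_turnover_cost(team_size=20, attrition_rate=0.15, salary=150_000)
print(f"${low:,.0f} to ${high:,.0f} per year")  # $135,000 to $315,000 per year
```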
Slower MTTR and Missed Incidents
Alert fatigue directly inflates both mean time to detect (MTTD) and mean time to resolve (MTTR). When engineers are desensitized to pages, acknowledgment times creep up. When they are sleep-deprived from false alarms, their troubleshooting is slower and less effective.
Research on cognitive functioning in burnout shows impairments in attention, memory, and executive functioning --- exactly the skills you need for incident response. A burned-out engineer is not just unhappy; they are measurably less capable of diagnosing and fixing production issues quickly.
The cost of extended downtime is substantial. A 2025 ITIC and Calyptix joint study found that SMBs lose $25,000 or more per hour of downtime, while mid-sized organizations average $300,000 per hour. Even if alert fatigue adds just 15 minutes to your average incident response time across a year of incidents, the cumulative revenue impact adds up fast.
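To see how quickly that compounds, here is a quick calculation using the $25,000-per-hour SMB figure cited above; the incident count is a hypothetical input, not a study number:

```python
def added_downtime_cost(extra_minutes: float, incidents_per_year: int, cost_per_hour: float) -> float:
    """Annual cost of extra resolution time tacked onto every incident."""
    return (extra_minutes / 60) * incidents_per_year * cost_per_hour

# 15 extra minutes on each of 24 incidents a year, at the SMB rate:
cost = added_downtime_cost(extra_minutes=15, incidents_per_year=24, cost_per_hour=25_000)
print(f"${cost:,.0f}")  # $150,000
```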
The Compounding Problem
These costs do not exist in isolation. They feed each other:
- Noisy alerts cause sleep disruption and burnout
- Burnout causes turnover, leaving fewer people on the rotation
- Fewer people means more frequent on-call shifts for those who remain
- More frequent shifts accelerate burnout in the remaining team
- Simultaneously, fatigued engineers miss real incidents, causing outages
- Outages create more pressure and more hastily configured alerts
- More alerts create more noise, and the cycle continues
For small teams, this feedback loop can spiral quickly. A 10-person team losing 2 engineers to burnout suddenly has a 25% tighter on-call rotation, which accelerates burnout in the remaining 8.
How to Measure Your Own Alert Fatigue
Before you can fix the problem, you need to quantify it. Here are the metrics that matter:
Signal-to-Noise Ratio
Track what percentage of your alerts result in a meaningful human action (not just acknowledgment, but actual investigation or remediation). If less than 50% of your alerts require action, you have a noise problem. Best-in-class teams aim for 70% or higher.
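If your alerting platform can export alert records, this metric is a one-liner to compute. A sketch with hypothetical records (the `actioned` field is illustrative; use whatever your tooling records for "led to real work"):

```python
# Hypothetical alert export: each record notes whether the alert led to
# actual investigation or remediation, not just an acknowledgment.
alerts = [
    {"id": 1, "actioned": True},
    {"id": 2, "actioned": False},
    {"id": 3, "actioned": False},
    {"id": 4, "actioned": True},
    {"id": 5, "actioned": False},
]

def actionable_ratio(alerts: list[dict]) -> float:
    """Fraction of alerts that required meaningful human action."""
    if not alerts:
        return 0.0
    return sum(a["actioned"] for a in alerts) / len(alerts)

print(f"{actionable_ratio(alerts):.0%} actionable")  # 40% actionable: below the 50% line
```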
Alert Acknowledgment Time Trends
If your median time-to-acknowledge is creeping up month over month, that is a leading indicator of fatigue. Engineers are not getting slower --- they are getting desensitized.
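Detecting that creep only takes a few lines once you have per-month acknowledgment times. A sketch with made-up numbers (minutes to acknowledge, grouped by month):

```python
from statistics import median

def monthly_medians(ack_times_by_month: dict[str, list[float]]) -> list[float]:
    """Median time-to-acknowledge per month, in the order given."""
    return [median(times) for times in ack_times_by_month.values()]

def is_creeping_up(medians: list[float]) -> bool:
    """True if each month's median exceeds the previous month's."""
    return all(later > earlier for earlier, later in zip(medians, medians[1:]))

history = {
    "Jan": [2, 3, 4, 5],   # median 3.5 min
    "Feb": [3, 4, 5, 6],   # median 4.5 min
    "Mar": [5, 6, 7, 9],   # median 6.5 min
}
print(is_creeping_up(monthly_medians(history)))  # True: a fatigue warning sign
```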
Alerts Per On-Call Shift
Count the total pages per rotation. Industry guidance suggests that more than one wake-up page per on-call shift is too many. If your engineers are getting paged multiple times per night, you are burning them out.
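Counting wake-up pages from timestamped alert data might look like this; the 22:00-07:00 night window and the timestamps are assumptions, and counting by calendar date is a simplification (one night straddles two dates):

```python
from collections import Counter
from datetime import datetime

# Hypothetical page timestamps for one rotation.
pages = [
    datetime(2025, 3, 1, 23, 15),
    datetime(2025, 3, 2, 2, 40),
    datetime(2025, 3, 2, 3, 5),
    datetime(2025, 3, 3, 14, 0),   # daytime page, not a wake-up
]

def night_pages_per_date(pages: list[datetime]) -> Counter:
    """Count wake-up pages per calendar date (22:00-07:00 window)."""
    nights = Counter()
    for p in pages:
        if p.hour >= 22 or p.hour < 7:
            nights[p.date()] += 1
    return nights

counts = night_pages_per_date(pages)
print(max(counts.values()))  # 2: worst night, already over the one-page guideline
```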
Post-Incident Alert Review Rate
After each incident, does your team review which alerts fired, which were useful, and which were noise? If you are not doing this, your alerting will only get worse over time.
| Health Indicator | Healthy | Warning | Critical |
|---|---|---|---|
| % of alerts requiring action | > 70% | 30--70% | < 30% |
| Pages per on-call night | 0--1 | 2--3 | 4+ |
| Acknowledgment time trend | Stable/decreasing | Slowly increasing | Rapidly increasing |
| On-call attrition mentions in exit interviews | Rare | Occasional | Frequent |
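The two quantitative rows of the table translate directly into a triage check. A sketch that returns the worst applicable status (thresholds taken straight from the table):

```python
def alert_health(action_pct: float, pages_per_night: float) -> str:
    """Classify alerting health from the table's two quantitative indicators,
    returning the worst status that applies."""
    if action_pct < 30 or pages_per_night >= 4:
        return "critical"
    if action_pct <= 70 or pages_per_night >= 2:
        return "warning"
    return "healthy"

print(alert_health(action_pct=75, pages_per_night=1))  # healthy
print(alert_health(action_pct=45, pages_per_night=1))  # warning
print(alert_health(action_pct=20, pages_per_night=5))  # critical
```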
Fixing the Problem
Reducing alert fatigue is not about buying another tool and hoping it helps. It requires a systematic approach.
Consolidate Your Monitoring Stack
One of the biggest sources of duplicate and redundant alerts is running multiple disconnected monitoring tools. When your uptime monitor, your incident management platform, your on-call scheduler, and your status page are all separate products, the same underlying issue can trigger alerts from multiple systems simultaneously. An engineer gets paged by the monitoring tool, emailed by the incident tracker, and notified by the status page system --- all for the same event.
Consolidating onto a single platform that handles monitoring, incident management, on-call scheduling, and status pages eliminates an entire category of duplicate noise. This is one of the core design principles behind Alert24 --- not because consolidation is trendy, but because fragmented tooling is one of the root causes of alert fatigue in small and mid-size teams.
Build Proper Escalation Policies
Not every alert should wake someone up. Define clear severity tiers:
- Critical: Customer-facing outage, data loss risk. Pages immediately, escalates if not acknowledged in 5 minutes.
- Warning: Degraded performance, approaching thresholds. Notifies via Slack or email during business hours. Escalates to page only if unacknowledged for 30 minutes.
- Informational: Logged for review. Never pages anyone.
The key is that escalation policies should match the actual impact, not the theoretical worst case. A disk at 80% utilization is not an emergency if it has been growing at 1% per month.
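The disk example is worth making concrete. Under a simple linear-growth assumption, the runway calculation is:

```python
def months_until_full(current_pct: float, growth_pct_per_month: float) -> float:
    """Linear projection of when a disk hits 100% utilization."""
    if growth_pct_per_month <= 0:
        return float("inf")
    return (100 - current_pct) / growth_pct_per_month

print(months_until_full(current_pct=80, growth_pct_per_month=1))  # 20.0 months of runway
```

Twenty months of runway is a ticket for next sprint, not a 3 AM page.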
Implement Quiet Hours with Critical Bypass
Engineers need protected sleep. Configure quiet hours that suppress non-critical alerts during off-hours, while still allowing genuinely critical pages to break through. Alert24's quiet hours feature does exactly this --- letting teams set windows where only alerts meeting a critical threshold will page the on-call engineer, while everything else queues for morning review.
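The core of any quiet-hours filter is a few lines of logic. A generic sketch (not any vendor's implementation; the 22:00-07:00 window is an assumption):

```python
from datetime import time

QUIET_START, QUIET_END = time(22, 0), time(7, 0)

def suppress_overnight(severity: str, now: time) -> bool:
    """True if the alert should queue for morning review instead of notifying.
    During quiet hours, only critical alerts break through."""
    in_quiet_hours = now >= QUIET_START or now < QUIET_END
    return in_quiet_hours and severity != "critical"

print(suppress_overnight("warning", time(2, 30)))   # True: queued for morning
print(suppress_overnight("critical", time(2, 30)))  # False: breaks through
```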
Run Post-Incident Alert Reviews
After every significant incident, review your alerting:
- Which alerts fired? Were they useful?
- Did the right person get paged?
- Were there alerts that should have fired but did not?
- Were there duplicate or redundant notifications?
This practice is the single most effective way to improve signal-to-noise ratio over time. Alert24's post-mortem workflows include alert timeline review as a standard step, making it easy to identify and eliminate noisy alerts after each incident.
Set Alert Budgets
Some organizations set explicit per-team alert budgets: a maximum number of pages per on-call shift. When the budget is exceeded, the team is required to spend time the following sprint tuning their alerting. This creates organizational accountability for alert quality.
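Enforcing a budget is mostly bookkeeping. A sketch with a hypothetical budget of two pages per shift and made-up shift data:

```python
PAGE_BUDGET_PER_SHIFT = 2  # hypothetical team budget

# Pages received in each on-call shift this sprint (hypothetical data).
shifts = [3, 1, 0, 5, 2]

breaches = [n for n in shifts if n > PAGE_BUDGET_PER_SHIFT]
print(f"{len(breaches)} of {len(shifts)} shifts over budget")  # 2 of 5 shifts over budget
```

Any breach becomes a standing agenda item: the team spends tuning time in the next sprint until the budget holds.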
The Path Forward
Alert fatigue is not an inevitable cost of running software. It is a symptom of monitoring systems that were configured once and never refined, of tooling sprawl that creates duplicate notifications, and of escalation policies that treat every anomaly as an emergency.
The teams that solve this problem share a few traits: they measure their alert quality, they consolidate their tooling to reduce duplicate noise, they build escalation policies that match actual severity, and they treat alert tuning as ongoing maintenance rather than a one-time setup.
The engineering time your team spends investigating false positives, the sleep your on-call engineers lose to unnecessary pages, the institutional knowledge that leaves when burned-out engineers quit --- these are real costs with real dollar figures attached. For a 20-person team, the total annual cost of unmanaged alert fatigue can easily reach $200,000 to $500,000 when you combine wasted engineering hours, turnover costs, and extended incident resolution times.
The fix starts with acknowledging that alert fatigue is a systems problem, not a people problem. Your engineers are not lazy for ignoring alerts. Your alerting is broken for sending them noise. Fix the system, and the people will respond.
