Strategy | 7 min read | 04.07.2026 | Max Fey

Nobody reads your alerts anymore. Here's why that's dangerous.

A real outage sat unread for four days in a Slack channel full of success messages. Why too many alerts are worse than too few, and how to build monitoring so the one message that matters actually lands.

The channel was working perfectly. That was the problem.

A client of mine, an online retailer with about fifty staff, had a Slack channel just for their automations. Every run posted into it: started, order processed, stock sync complete. A few hundred messages a day, all green, all cheerful. It looked like control. It felt like control.

Then the stock sync broke. An item kept selling after it had sold out, and the error message landed right on schedule, tucked between two hundred success notifications. And there it sat. Four days, until a customer complained that his paid order had been cancelled. Only then did someone scroll back through the channel and find the alert that had been sitting there since Tuesday, waiting for an audience that had stopped showing up months ago.

That's alert fatigue, and I find some version of it in nearly every setup I inherit. The monitoring wasn't broken. It fired exactly when it should have. Nobody was reading it, because nothing in that channel had demanded a response in weeks, and the human brain is very good at learning to ignore things that never matter.

Blaming the human misses the point

The first instinct is always to blame someone. Why didn't anyone look? Wrong question. Nobody watches a channel that's ninety-nine percent noise, because after two weeks their brain has correctly concluded that watching it isn't worth the effort. That's not laziness. It's a perfectly rational response to a broken signal.

Alert everything and you effectively alert nothing. Attention is a finite budget, and monitoring that spends it all on "everything's fine" has nothing left when something isn't. The retailer's channel wasn't too quiet. It was too loud. And the one signal that mattered drowned in the noise of the ones that didn't.

A log and an alert are not the same thing

The root of this is a confusion baked in at build time. A log and an alert are two different objects, and most setups treat them identically.

A log is for later. It records what happened so you can reconstruct a failure after the fact. A log is allowed to be verbose and can capture every step, because it waits patiently until someone needs it. Nobody reads a log in real time.

An alert is for now. It interrupts a human and demands an action. An alert doesn't say "this happened," it says "do something." And that's the test every notification has to pass: what is the human supposed to do when this arrives? If the honest answer is "nothing" or "just be aware," it isn't an alert. It's a log in the wrong place, clogging the channel where the real alerts are supposed to live.

The retailer had been dumping two hundred logs a day into an alert channel. No wonder the one genuine alert never stood a chance.
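
The test from this section fits in a few lines of code. What follows is a minimal sketch, not a finished implementation: the `Event` type, its `required_action` field, and the `destination` function are hypothetical names made up for illustration. The idea is that whoever builds a notification has to write down what the human should do, and anything without an answer goes to the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    message: str
    # What the human is supposed to do when this arrives (hypothetical field).
    # None or blank means the honest answer is "nothing" or "just be aware".
    required_action: Optional[str] = None


def destination(event: Event) -> str:
    """Return "alert" only when a human has something to do, otherwise "log"."""
    if event.required_action and event.required_action.strip():
        return "alert"
    return "log"
```

An "order processed" event with no action attached lands in the log. A failed stock sync with the action "restart the sync and check oversold items" lands in the alert channel.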

Three rules we give every monitoring setup

We build notifications on three principles. They sound obvious until you see how rarely they're followed.

One: only something that needs action fires an alert. Success gets logged, not announced. An order clearing cleanly is the normal state, and the normal state has no business in a channel that asks for attention. If you genuinely need confirmation that a nightly job ran at all, build a heartbeat that posts exactly once each morning: everything ran last night. One message, not three hundred.

Two: separate severities and route them to different places. A critical failure that touches money, customers, or legal deadlines does not belong in the same channel as a harmless warning. Critical is allowed to be loud and should take a different path: an SMS, a phone call, or its own channel that only ever pings when something's actually on fire. If that channel gets one message a month, everyone looks the instant it does. That's exactly the state you're after.

Three: aggregate instead of screaming one by one. If an integration goes down and four hundred records fail in a row, you want one message, not four hundred. "Stock sync has been failing since 14:20, 400 records affected so far" tells you everything that matters. Four hundred individual errors tell you the same thing while burying every other alert that arrives in the same hour.
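
The three rules can be sketched together as follows. This is an illustration under stated assumptions, not the setup we built for the client: the destination names in `ROUTES`, the `Failure` type, and both functions are hypothetical, and the heartbeat wording for a missed job is our own addition.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Severity(IntEnum):
    WARNING = 1   # harmless, can wait
    CRITICAL = 2  # touches money, customers, or legal deadlines


# Hypothetical destinations; substitute whatever paths your setup has.
ROUTES = {
    Severity.WARNING: "warnings-channel",
    Severity.CRITICAL: "phone-push",
}


@dataclass
class Failure:
    process: str        # e.g. "Stock sync"
    severity: Severity
    at: datetime        # when this record failed


def aggregate(failures):
    """Collapse many failures into one (destination, message) pair per process."""
    by_process = defaultdict(list)
    for failure in failures:
        by_process[failure.process].append(failure)

    alerts = []
    for process, items in by_process.items():
        first_seen = min(item.at for item in items)
        worst = max(item.severity for item in items)  # route by the worst case
        message = (
            f"{process} has been failing since {first_seen:%H:%M}, "
            f"{len(items)} records affected so far"
        )
        alerts.append((ROUTES[worst], message))
    return alerts


def morning_heartbeat(nightly_jobs):
    """Build the single morning message from a {job name: ran ok} mapping."""
    failed = sorted(name for name, ok in nightly_jobs.items() if not ok)
    if failed:
        return "Did not run last night: " + ", ".join(failed)
    return f"Everything ran last night ({len(nightly_jobs)} jobs)"
```

Fed four hundred critical `Failure` records for "Stock sync" that start at 14:20, `aggregate` returns a single pair: the destination `phone-push` and the message "Stock sync has been failing since 14:20, 400 records affected so far". Successes never pass through this code at all. They go to the log, and the heartbeat is the only routine message anyone sees.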

The alert that shuts itself up

That third rule has a corollary people miss. A single failure must not flood a channel for hours. If the same error recurs every five minutes because a workflow retries on a schedule and keeps hitting the same dead endpoint, report it once and then go quiet until it's fixed or a set interval has passed.

Deduplication is the jargon, and without it you get the opposite of what you want. One outage drowns the channel in a hundred identical messages, and while everyone's annoyed and scrolling past, a second, unrelated failure slips in and vanishes into the stream of the first. Good alerting flags a problem once, clearly, then holds its tongue instead of tapping on the same open wound every sixty seconds.
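
A minimal sketch of that behavior, assuming each error can be reduced to a stable key such as "stock-sync:timeout". The `Deduplicator` class and its method names are hypothetical, and a real setup would persist the state somewhere instead of keeping it in memory.

```python
from datetime import datetime, timedelta


class Deduplicator:
    """Report each distinct error once, then stay quiet for a set interval."""

    def __init__(self, repeat_after: timedelta):
        self.repeat_after = repeat_after
        self._last_sent = {}  # error key -> time it was last reported

    def should_send(self, key: str, now: datetime) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.repeat_after:
            return False  # same open problem, already reported
        self._last_sent[key] = now
        return True

    def resolve(self, key: str) -> None:
        """Call when the problem is fixed, so a recurrence alerts again."""
        self._last_sent.pop(key, None)
```

With `repeat_after` set to one hour, a retry that fails every five minutes produces one message at the start and one reminder per hour, not twelve. A second, unrelated failure has a different key and gets through immediately.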

What a usable alert actually contains

An alert that just says "error in workflow 3" is almost as useless as none at all. It doesn't trigger an action, it triggers a search. Someone has to log in, find the scenario, open the run, and reconstruct what was meant. That's ten minutes gone, on every single message.

A usable alert answers four questions before the human has to ask. What happened, in plain words. Where, meaning which process and which system. Since when and how often. And a direct link to the affected run, so the jump from alarm to cause is one click and not a scavenger hunt. "Invoice export to the accounting system failing since 09:15, 12 invoices affected, last error: timeout, view here." Somebody can read that on their phone at seven in the morning and know immediately whether they can finish their coffee first.
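
One way to enforce the four questions is to make them required fields, so an alert that can't answer them never gets sent. This is a sketch under assumptions: the `Alert` type, its field names, and the example URL in the test data are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Alert:
    process: str     # what is failing, in plain words
    system: str      # where: the system on the other end
    since: str       # since when, e.g. "09:15"
    affected: str    # how often or how many, e.g. "12 invoices"
    last_error: str  # most recent error, kept short
    run_url: str     # direct link to the affected run


def render(alert: Alert) -> str:
    """Build the one-line alert; refuse to send if any answer is missing."""
    missing = [f.name for f in fields(alert) if not str(getattr(alert, f.name)).strip()]
    if missing:
        raise ValueError("alert is missing: " + ", ".join(missing))
    return (
        f"{alert.process} to {alert.system} failing since {alert.since}, "
        f"{alert.affected} affected, last error: {alert.last_error}, "
        f"view here: {alert.run_url}"
    )
```

The design choice is the `ValueError`. An alert without a link or a start time fails at build time, when the person who knows the workflow is still around to fill it in, not at seven in the morning on someone's phone.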

The simplest test for whether your monitoring is sick

There's an early warning sign, and it's easy to check. See whether anyone has muted the alert channel. When colleagues start swiping the notifications away, that's not a discipline problem, that's the diagnosis. They've quietly decided the channel delivers more noise than value, and they're right.

A muted alert channel is the most dangerous state there is, more dangerous than having no monitoring, because it fakes safety. The messages keep flowing in, everything looks watched, the little green checkmark in your head stays ticked. Except nobody's looking. You've got monitoring that records but no longer warns. The retailer was exactly there, minus the official mute. His people hadn't muted the channel in Slack, they'd muted it in their heads, which comes to the same thing.

Say less, so the message counts again

When we rebuilt the retailer's setup, the daily message count dropped from several hundred to about two. On most days that means zero alerts plus the morning heartbeat. Critical failures now go as a push notification to the operations lead's phone; everything else collects in a log nobody opens unless they're actually looking for something.

The point wasn't that less went wrong. Exactly as much went wrong as before. The difference is that a message in the alert channel means something again. When the phone buzzes, the operations lead looks, because in the last two weeks it hasn't buzzed once without cause. That trust in the signal is the actual goal of monitoring, and you can only earn it by being stingy with what you send.

So the question in front of every notification you're about to build isn't "might someone want to know this?" Almost anything might be something someone wants to know, and that's the trap. The question is: does someone need to do something the moment this arrives? If not, it goes in the log. An alert channel you can safely ignore for hours isn't a good alert channel. It's a channel nobody believes anymore.

If you suspect your own alerts stopped getting read a while ago, or you're just not sure a real outage would reach anyone at all, we're happy to look at it with you. Our free automation check goes through how your automations are monitored and whether the one message that matters would actually land when it counts.

#Alert-Fatigue #Monitoring-Automation #Alarm-Fatigue #Error-Handling #Operations