False Alarms – Your Ops Nemesis

Ryan Pinkham
  April 26, 2016

If you work in operations, or manage a monitoring or operations center, you understand the frustration that comes with false alarms.

IT analytics firm BigPanda recently conducted a 2016 State of Monitoring survey and found that 79% of IT professionals see reducing alert noise as their top challenge. Poorly managed NOCs and unnecessary alerts cost organizations hundreds of thousands in IT costs every year.

How can you reduce the number of false alarms for your organization? Let’s take a look.

Too many tools, too much data, too many alerts

The evolution of IT systems and processes has led to the case of Franken-monitoring.

IT teams monitor everything from customer-facing applications to the server-level backend. They sit at the center of the data from every monitored endpoint. They are charged with making sense of that data, correlating it to understand the big picture, and updating various stakeholders when needed.

On top of this, IT professionals are also responsible for responding to the alerts they get from all these monitoring endpoints. Now imagine a situation where only a fraction of the alerts sent to IT experts are the real deal. In the same survey, BigPanda found that almost half of IT professionals receive over 50 alerts per day and almost a quarter receive more than 100.

When there is too much noise, it takes time and effort to identify the actual issue and take the necessary actions to fix it, and that slows recovery and remediation. Of those who receive 100+ alerts per day, only 17% are able to investigate and remediate the majority of alerts within 24 hours. Meanwhile, the business bears the consequences of the downtime: lost revenue, a surge of calls to the call center, social media backlash, and overall unpleasantness.

Who is at fault? Is it the IT expert, who is trying his or her best to solve the issue? Or the tool that sent the false alerts?

In tools we trust

Monitoring and alerting are two sides of the same coin.

Organizations use APM and monitoring tools to test the reliability of their applications, but who tests the reliability of the alerts sent by these tools? Even the most advanced monitoring tool is of little use to the business unless it alerts the right people at the right time when there is an issue or performance degradation. Organizations increasingly rely on operational intelligence tools to separate the signal from the noise and engage subject matter experts when needed. However, separating the signal from the noise is still left to human intuition.

In my data center engineering days, I saw Ops guys use something as simple as an email keyword filter to sift through the alert floods. Look at this detailed depiction of a day in the life of an IT person: his or her day starts with verifying the validity of alerts. Every alert, true or false, requires the time and attention of the IT team. If you have a setup where an alert automatically creates a support ticket, false alerts can distort your support metrics and waste even more time.

It’s about time the monitoring tools check for false positives and notify us only when our expertise is actually needed.

How we avoid false alarms

We at SmartBear take false alarms very seriously. That’s why we have been channeling R&D efforts for our AlertSite monitoring solution to improve on our already awesome alerting. We believe that application performance issues must be found before they impact the end users. We also believe that engaging the right people at the right time with the right information can save IT organizations both time and money.

Our synthetic monitoring tool proactively watches over applications and APIs from our global locations for availability, performance, and functional correctness. To avoid false alarms, we repeat a failed check before sending an error alert. We also give you the option to validate the error from the same monitoring location and from another location of your choice. If the second check does not confirm the error, we report a warning state instead of an error. This helps us avoid the infamous alert floods and helps maintain the trust of our users.
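
To make the idea concrete, here is a minimal sketch of confirm-before-alerting logic in Python. It is an illustration only, not AlertSite's actual implementation: the names `Status`, `Check`, and `evaluate`, the location strings, and the rule that every verification run must also fail are all assumptions made for the example.

```python
from enum import Enum
from typing import Callable, Sequence


class Status(Enum):
    OK = "ok"
    WARNING = "warning"  # first run failed, but the re-check did not confirm it
    ERROR = "error"      # failure confirmed by every verification run


# Hypothetical type: a check takes a location name and returns True on success.
Check = Callable[[str], bool]


def evaluate(check: Check, primary: str, verify_from: Sequence[str]) -> Status:
    """Run a check once, then confirm any failure before reporting an error."""
    if check(primary):
        return Status.OK

    # The first run failed. Re-run the check from each verification location
    # (for example, the same location again plus one other location).
    # Assumption: with no verification locations configured, the first
    # failure is reported as an error straight away.
    confirmed = all(not check(location) for location in verify_from)
    return Status.ERROR if confirmed else Status.WARNING


if __name__ == "__main__":
    attempts = {"count": 0}

    def fails_once(location: str) -> bool:
        # Simulates a transient glitch: only the very first call fails.
        attempts["count"] += 1
        return attempts["count"] > 1

    print(evaluate(fails_once, "new-york", ["new-york", "london"]))
```

In this sketch a one-off glitch, like the simulated check above, comes back as a warning rather than an error, so only failures that survive the re-check would page a person.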

When you get an alert from AlertSite, you know that your expertise is actually needed and it’s not a false alarm. All that time you save by not dealing with benign alarms and fire drills can be spent doing things that impact the bigger picture – data synthesis, IT innovation, and process improvement.

We empower IT and operations teams worldwide with tools that help them stay on top of their application health. Eliminating false alarms is just one of the ways those tools help.

Want to learn more about how AlertSite can help you avoid your IT nightmares? Visit SmartBear’s Performance Monitoring Resource Center.
