While having a reliable monitoring solution for your application is important, being able to parametrize and configure thresholds and alerting is even more critical.
No matter what kind of market your business is in, your web applications have seasonal patterns. For example, the load of a conventional airline ticketing system fluctuates heavily with the hour of the day, the day of the week, the month, and even particular days of the year. Think about the difference in travel between New Year’s Eve and the day before Thanksgiving.
Your application may experience normal fluctuations that are dynamic in nature, preventing you from defining appropriate thresholds with a classical static-threshold-based solution. Additionally, some fluctuations can be minimal or hidden in the vast amount of data your monitoring solution collects. This information could be a good pseudo-predictive indicator that something unusual has been triggered, but if you’re unable to see it, you’re surely not using it.
However, despite this criticality, defining suitable thresholds is not an easy task, and is sometimes even impossible. The proliferation of machine learning techniques, driven by serious economic interests such as banking fraud prevention, has produced a large collection of techniques for detecting unusual values in any time series data.
From an operations perspective, these techniques alone do not offer significant benefits, because seasonality plays a critical role as well. There are monitoring solutions that can be configured to apply anomaly detection algorithms after you specify a seasonality pattern. However, this approach has some limitations:
- You are required to know the seasonality of all your applications.
- Your application’s seasonality may change over time, forcing you to periodically check that your “expected” seasonality is still valid.
When doing time series analysis, it is common to decompose a dataset into three components:
- Seasonal Component - includes the expected recurrent behavior of the time series
- Trend Component - includes the observed trend over time
- Remainder Component - includes specific usage not part of the trend or the periodic behavior
These components can be used for a variety of purposes, including predictive alerting (triggering a message every time the remainder component exceeds a fixed threshold) and capacity planning based on the trend component.
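As a rough illustration of those two uses, here is a minimal Python sketch. It assumes the remainder and trend components are already available as plain lists; the function names, the threshold, the capacity figure, and the sample values are all hypothetical and are not taken from AlertSite or from the airline dataset.

```python
def remainder_alerts(remainder, threshold):
    """Return the sample indices where |remainder| exceeds a fixed threshold."""
    return [t for t, r in enumerate(remainder)
            if r is not None and abs(r) > threshold]


def samples_until_capacity(trend, capacity):
    """Linearly extrapolate the average slope of the trend component.

    Returns how many more samples it would take for the trend to reach
    `capacity`, or None if the trend is flat or falling.
    """
    values = [v for v in trend if v is not None]
    if len(values) < 2:
        raise ValueError("need at least two trend values")
    slope = (values[-1] - values[0]) / (len(values) - 1)
    if slope <= 0:
        return None
    return max(0.0, (capacity - values[-1]) / slope)


# Illustrative numbers only.
example_remainder = [0.4, -1.2, 0.8, -9.5, 1.1, 0.3]
example_trend = [100, 102, 104, 106, 108, 110]

alerts = remainder_alerts(example_remainder, threshold=5)
runway = samples_until_capacity(example_trend, capacity=150)
print(alerts, runway)
```

A fixed threshold on the remainder is the simplest possible rule, and a straight-line extrapolation is the simplest possible capacity forecast. Both are only meant to show which component feeds which decision.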
To illustrate all these concepts, let’s use the dataset of monthly totals of international airline passengers from 1949 to 1960:
As indicated earlier, a particular time series can be considered the aggregation of three different series that include the seasonality, the trend and of course the remainder components.
It is obvious that at any given point in time the actual data is equal to the sum of its three decomposed components:
data(t) = seasonal(t) + trend(t) + remainder(t)
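To make the identity concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is one common textbook approach (a centred moving average for the trend, then per-position averages for the seasonal part) and is not necessarily the method used to produce the charts in this post. The series is synthetic, with an artificial dip injected at index 77, and the period of 12 is supplied by hand.

```python
import math


def decompose_additive(series, period):
    """Classical additive decomposition: returns (seasonal, trend, remainder).

    Trend and remainder are None near the edges, where the centred
    moving average is undefined.
    """
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: give the two end points half weight each.
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # centre the seasonal part around zero
    seasonal = [means[t % period] - offset for t in range(n)]

    remainder = [None if trend[t] is None
                 else series[t] - seasonal[t] - trend[t]
                 for t in range(n)]
    return seasonal, trend, remainder


# Synthetic monthly series: level + linear trend + yearly cycle.
series = [100 + 2 * t + 40 * math.sin(2 * math.pi * t / 12) for t in range(144)]
series[77] -= 30  # artificial dip, standing in for an unusual month

seasonal, trend, remainder = decompose_additive(series, period=12)
valid = [t for t in range(len(series)) if remainder[t] is not None]
lowest = min(valid, key=lambda t: remainder[t])
print(lowest, remainder[lowest])
```

By construction, the three returned series add back up to the original data wherever the trend is defined, and the injected dip shows up as the most negative value of the remainder.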
Depending on the kind of analysis to be performed, one of these components may be more useful than the others. For example, the “remainder” component could have been used by the airline companies to detect lower-than-usual passenger volume:
The blue arrows are pointing to the anomalies in the remainder data that were hidden from the “data” time series chart at the top.
Time series decomposition has been available for a long time; however, to perform this analysis you need to manually infer the periodicity. Using our airline example, you would have to first identify the seasonality and then manually enter it into the algorithm.
Requiring this knowledge “a priori” has prevented most monitoring solutions from including this analysis in a completely unassisted way. At SmartBear we see the immense potential of time series analysis and are working to incorporate this in an automated way into AlertSite.
Rather than going into an exhaustive description of the process, here’s a simple glimpse of how this works.
The problem we want to address is how to automatically discover the seasonality of any of the data gathered by our customers’ monitors. Fourier analysis allows us to transform any “periodic” dataset into an equivalent frequency representation. This is like what you see on an equalizer when playing your favorite music.
The main idea behind Fourier Analysis is to represent a function as a finite sum of essential frequencies, called harmonics. If the energy of all different harmonics is depicted, we get what is called a periodogram. Here’s a periodogram for our airline travel example:
There is a clear outstanding frequency, flagged with the blue arrow. It stands out from the rest because, despite not being the absolute biggest, it is the biggest among its neighbors. It is the third one when ordered by amplitude and corresponds to a frequency of 0.083333333 cycles per sample. This means our data has a strong seasonality every 1/0.083333333 = 12 samples. Since our data has one sample per month, it is clear that our airline data has a strong 12-month seasonality pattern.
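The peak-picking idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not AlertSite's implementation: it uses a naive discrete Fourier transform, a synthetic series with a linear trend and a 12-sample cycle, and a hypothetical rule that selects the strongest frequency bin that is larger than both of its neighbors.

```python
import cmath
import math


def periodogram(series):
    """Power at each DFT bin k = 0 .. N//2 (frequency k/N cycles per sample)."""
    n = len(series)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        powers.append(abs(coeff) ** 2 / n)
    return powers


def dominant_period(series):
    """Period (in samples) of the strongest bin that beats both neighbours."""
    n = len(series)
    powers = periodogram(series)
    peaks = [k for k in range(1, len(powers) - 1)
             if powers[k] > powers[k - 1] and powers[k] > powers[k + 1]]
    if not peaks:
        return None
    best = max(peaks, key=lambda k: powers[k])
    return n / best


# Synthetic monthly series: level + linear trend + 12-sample cycle.
series = [100 + 2 * t + 40 * math.sin(2 * math.pi * t / 12) for t in range(144)]
period = dominant_period(series)
print(period)
```

As in the airline periodogram, the seasonal bin is not the largest overall here: the trend puts even more energy into the lowest frequencies, which is why the sketch looks for a local peak instead of the global maximum. Real monitoring data is noisier, so a production version would need a more robust peak criterion.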
In summary, once seasonality is discovered, the time series can be decomposed, and the variety of available functions is enormous, especially when analyzing the “remainder” component.
This is part one of my three-part series. Stay tuned for the next installment, where I’ll discuss how statistical techniques and machine learning can be applied to this “remainder” component to flag abnormalities automatically, for example for predictive alerting.
- Abraham Nevado, Director of Root Cause Development