Anomaly detection is a tool that data scientists value highly. It aims to find data samples that do not conform to the regular distribution of the dataset to which they belong. Finding anomalous samples, also known as distribution outliers, provides valuable insight, because such samples often correlate with defects or errors in the data collection process (e.g., faulty or misconfigured equipment). This blog post demonstrates how we leverage neural networks to build a time-based anomaly detector for mobile network testing use cases.
Traditional multivariate anomaly detection methods use machine learning to learn the data distribution from a large number of samples. If a new sample falls outside the boundaries of this learned distribution, it is considered an anomaly.
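To make the idea concrete, here is a deliberately simplified sketch. It is not one of the traditional methods themselves: it learns per-KPI bounds (mean plus or minus three standard deviations) instead of a full multivariate distribution, and the function names, the 3-sigma rule and the sample values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def fit_bounds(samples, n_sigmas=3.0):
    """Learn a (low, high) interval per KPI from rows of averaged KPI values."""
    columns = list(zip(*samples))  # one tuple of values per KPI
    bounds = []
    for column in columns:
        center, spread = mean(column), stdev(column)
        bounds.append((center - n_sigmas * spread, center + n_sigmas * spread))
    return bounds


def is_outlier(sample, bounds):
    """A sample is flagged if any of its KPIs leaves the learned interval."""
    return any(not (low <= value <= high)
               for value, (low, high) in zip(sample, bounds))


# Hypothetical training data: each row is one test, each column one averaged KPI.
training = [[10.0, 1.0], [11.0, 1.2], [9.0, 0.8], [10.5, 1.1], [9.5, 0.9]]
bounds = fit_bounds(training)
print(is_outlier([10.2, 1.0], bounds))  # sample inside the learned bounds
print(is_outlier([30.0, 1.0], bounds))  # first KPI far outside the bounds
```

Real multivariate methods also account for correlations between KPIs, which this per-KPI check ignores.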
In the mobile network testing (MNT) scope, most of the collected data consists of variable-length time series of key performance indicators (KPIs). Applying traditional anomaly detection methods to such data requires averaging those KPIs over time. However, this approach eliminates important information that is only visible in the sequence of time events and could be key to exposing anomalous behavior (e.g., signal glitches, uncommon trends). On that premise, we need an anomaly detection method that is capable of finding anomalies in the time domain.
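A toy example shows how averaging can hide a glitch. The two series below are invented for illustration, and `largest_step` is a hypothetical helper, not part of any product: both series have the same average, yet only one of them contains a short glitch.

```python
from statistics import mean


def largest_step(series):
    """Largest absolute change between two consecutive time steps."""
    return max(abs(later - earlier)
               for earlier, later in zip(series, series[1:]))


# Two made-up KPI traces with identical averages.
steady = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
glitchy = [10.0, 10.0, 10.0, 2.0, 18.0, 10.0, 10.0, 10.0]

print(mean(steady), mean(glitchy))                  # the averages are identical
print(largest_step(steady), largest_step(glitchy))  # the sequences are not
```

A detector that only sees the averaged KPI cannot tell these two tests apart, while anything that looks at the sequence itself can.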
Anomaly detection in the time domain
Neural networks were invented a long time ago, but they have only excelled in the last decade, once it became feasible to train larger networks, an approach known as deep learning. Today, most of deep learning’s success comes from supervised learning. Here, the neural network learns a function that maps some input data (e.g., image pixels, text, etc.) to its associated label (e.g., image class, sentiment, etc.). In unsupervised learning, on the other hand, the neural network is trained to learn information that is hidden in the data without labels.
If we have samples with labels indicating anomalies, we can implement anomaly detection using supervised learning. However, as anomalies are rare by definition, labeling is especially time-consuming, and the resulting high class imbalance makes a standard classification task rather ineffective. Consequently, unsupervised learning methods are more commonly used for anomaly detection, in particular the auto-encoder, a special case of the encoder-decoder architecture shown and described below:
The encoder-decoder architecture comprises an encoder block and a decoder block. The encoder block transforms the input data into a compressed representation, and the decoder block tries to reconstruct the input data from the compressed representation, usually called the latent space of the input data.
To minimize reconstruction error, we train the encoder-decoder architecture with vast amounts of data from our given domain. The encoded latent space must contain the necessary information to allow the decoder to produce something that resembles the input data as much as possible.
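As a rough formalization of this training goal, write the encoder as f with parameters θ and the decoder as g with parameters φ. These symbols, and the choice of mean squared error as the measure, are assumptions made here for illustration; the post itself only speaks of "reconstruction error". Each training sample x_i is mapped to its latent representation z_i and back to a reconstruction, and training searches for the parameters that make the average error over N samples as small as possible:

```latex
z_i = f_\theta(x_i), \qquad \hat{x}_i = g_\phi(z_i)

\min_{\theta,\,\phi} \; \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2
```

Because z_i is smaller than x_i, the network cannot simply copy its input and has to keep only the information that matters most for reconstruction.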
Time-based anomaly detection
Once the proper latent space is learned, we apply the model to a new sample and detect anomalies by measuring reconstruction error. If reconstruction error is high, the model has learned little about the kind of sample we are trying to reconstruct, and we declare it an anomaly. If reconstruction error stays within a given range, the sample is considered normal.
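The decision rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: the error measure is mean squared error, the "given range" is taken as a quantile of the errors seen on normal training data, and all function names and numbers are hypothetical.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between a sample and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("sample and reconstruction must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def fit_threshold(training_errors, quantile=0.95):
    """Pick the error value below which roughly `quantile` of training errors fall."""
    ordered = sorted(training_errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_anomaly(error, threshold):
    """A sample is anomalous when its reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up errors measured on normal training samples.
threshold = fit_threshold([0.02, 0.05, 0.03, 0.04, 0.06, 0.01, 0.03, 0.05])

# A new sample and a (hypothetical) poor reconstruction of it.
error = reconstruction_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(error, threshold, is_anomaly(error, threshold))
```

The quantile is a tuning knob: a lower value flags more samples, a higher value flags fewer.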
“We have adapted our auto-encoder neural network architecture to detect anomalies in the time domain.”
To apply this method to the MNT use case, we need to adapt our auto-encoder neural network architecture to the type of data we are dealing with, namely variable-length time series. Recurrent Neural Networks (RNNs) based on Long Short-Term Memory (LSTM) cells are designed to handle generic data sequences, and thus can be used to implement both the encoder and decoder blocks we need.
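One possible shape of such a network is sketched below in PyTorch. This is an assumption-laden illustration, not the architecture used in the product: the class name, the latent size, the single-layer LSTMs and the trick of repeating the latent vector at every time step are all choices made here for brevity.

```python
import torch
from torch import nn


class LSTMAutoEncoder(nn.Module):
    """Minimal LSTM auto-encoder for series of shape (batch, time, n_kpis)."""

    def __init__(self, n_kpis: int, latent_size: int = 16):
        super().__init__()
        self.encoder = nn.LSTM(n_kpis, latent_size, batch_first=True)
        self.decoder = nn.LSTM(latent_size, latent_size, batch_first=True)
        self.output = nn.Linear(latent_size, n_kpis)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The final hidden state summarizes the whole series: the latent vector.
        _, (hidden, _) = self.encoder(x)
        latent = hidden[-1]                      # (batch, latent_size)
        # Feed the latent vector to the decoder once per time step.
        n_steps = x.size(1)
        repeated = latent.unsqueeze(1).repeat(1, n_steps, 1)
        decoded, _ = self.decoder(repeated)      # (batch, time, latent_size)
        return self.output(decoded)              # (batch, time, n_kpis)


# The same (untrained) model handles series of different lengths.
model = LSTMAutoEncoder(n_kpis=3)
for n_steps in (5, 12):
    series = torch.randn(1, n_steps, 3)          # random stand-in for KPI data
    reconstruction = model(series)
    error = torch.mean((series - reconstruction) ** 2).item()
    print(n_steps, tuple(reconstruction.shape), error)
```

Processing one test at a time, as done here, sidesteps the question of padding. Batching series of different lengths would need padding and masking on top of this sketch.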
Compared to traditional anomaly detection, our method offers many advantages, including:
- Anomaly detection in the time domain, otherwise masked by KPI averages.
- The pinpointing of the KPIs that contribute the most to the anomaly degree. This information is key to subsequent root cause analysis of the anomalous test sample.
- The identification of the timeframe that contributes the most to the anomaly degree at the KPI level or the test sample level.
The animation above shows the advantages of this anomaly detection method. Once we have detected an anomalous test, we can sort the KPIs based on their reconstruction error. By doing this, we can identify the KPIs that contribute the most to the anomalous test. Likewise, we can further analyze each KPI and point to the time steps contributing the most to the anomalous test.
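The sorting step can be sketched as follows. The KPI names, the values and the helper functions are hypothetical, and the sketch assumes that all KPIs have been scaled to comparable ranges so that their errors can be compared directly.

```python
def squared_errors(original, reconstructed):
    """Per-time-step squared error for one KPI."""
    return [(a - b) ** 2 for a, b in zip(original, reconstructed)]


def rank_kpis(original, reconstructed):
    """Return (kpi_name, mean_error) pairs, highest error first."""
    scores = {}
    for name, series in original.items():
        errors = squared_errors(series, reconstructed[name])
        scores[name] = sum(errors) / len(errors)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def worst_time_steps(original, reconstructed, top_n=3):
    """Indices of the time steps with the highest error for one KPI."""
    errors = squared_errors(original, reconstructed)
    order = sorted(range(len(errors)), key=lambda step: errors[step], reverse=True)
    return order[:top_n]


# Made-up anomalous test: original KPI traces and their reconstructions.
original = {"throughput": [10.0, 10.0, 10.0, 10.0], "sinr": [5.0, 5.0, 5.0, 5.0]}
reconstructed = {"throughput": [10.0, 10.0, 4.0, 10.0], "sinr": [5.0, 5.0, 5.0, 6.0]}

ranking = rank_kpis(original, reconstructed)
top_kpi = ranking[0][0]
print(ranking)
print(top_kpi, worst_time_steps(original[top_kpi], reconstructed[top_kpi], top_n=1))
```

The ranking points to the KPI to investigate first, and the time-step indices narrow the search to a specific part of the test.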
Detecting anomalies in data transfers based on variable-length time series benefits mobile network operators by detecting deviations and identifying problematic areas instantaneously. The visualization of the feature in SmartAnalytics from Rohde & Schwarz enables users to quickly see which phases of a test deviate from the model. The overall effect of time-based anomaly detection is a more efficient methodology for drive tests and optimization.
If you want to learn more, please send an email to firstname.lastname@example.org.