Logging is vital to the success of any IT project. With solid logging practice, you can troubleshoot errors, find patterns, calculate statistics, and communicate information easily. Given the size and complexity of modern systems, each of these tasks depends on one or more kinds of analysis.
One of these important analysis activities is anomaly detection. What is anomaly detection, and where does it fit in all of this? That’s what this post is about. I’ll first present a succinct definition of what anomaly detection in log file analysis is. I’ll then explain the definition in detail, before discussing why it’s important for your business and introducing how it works.
Anomaly detection in log file analysis is the practice of automatically analyzing log files to uncover abnormal entries and behavior.
There’s quite a bit of information squeezed into those 20 words. Let’s break it down.
Whether you arrived here from the XpoLog Blog or from your favorite search engine, you’re probably aware of what logging is. But, as a quick recap, logs provide a history of actions performed by systems and the results of those actions. Given the complexity of many systems and the fact that they’re always on, 24/7, logs can rapidly become difficult to manage. This is especially true when logs from multiple systems are unified. As a result, processing logs manually isn’t feasible. This is where the “automatically analyzing” part of the definition comes in. Automated analysis, such as XPLG’s patented approach, uses advanced technologies such as machine learning to process log contents.
As for the “abnormal entries and behavior” part of the definition: there’s no point investing in advanced technologies to process log contents unless you’re looking for something in particular. However, that something, even when you know roughly what it is, can be hard to define, and searching for it by hand is a long and tedious task. If you think about most of the logs you’ve seen, one feature probably stands out: they’re repetitive. Most entries simply say that an event occurred. That’s it. What we want anomaly detection to do is report when things aren’t following that normal pattern. This means the automated analysis needs to examine both individual lines and groups of entries to determine whether they’re expected. That helps us find concerns proactively, before they become problems, and troubleshoot errors when they do arise.
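To make the “normal pattern” idea concrete, here is a minimal Python sketch of the simplest line-level check: mask the variable parts of each line (assumed here to be numbers and hex IDs) so that repeated events collapse into one template, then flag lines whose template is rare. The `template` and `rare_lines` helpers and the 1% cutoff are hypothetical choices for illustration, not XPLG’s method.

```python
import re
from collections import Counter

# Assumption: numbers and hex IDs are the "variable" parts of a line.
_VARIABLE = re.compile(r"0x[0-9a-fA-F]+|\d+")


def template(line: str) -> str:
    """Reduce a raw log line to its constant part, e.g. 'User <*> logged in'."""
    return _VARIABLE.sub("<*>", line.strip())


def rare_lines(lines: list[str], max_share: float = 0.01) -> list[str]:
    """Return the lines whose template makes up at most max_share of all lines."""
    templates = [template(line) for line in lines]
    counts = Counter(templates)
    total = len(templates)
    return [line for line, t in zip(lines, templates) if counts[t] / total <= max_share]


logs = [f"User {i} logged in" for i in range(200)] + ["Disk /dev/sda1 remounted read-only"]
print(rare_lines(logs))  # ['Disk /dev/sda1 remounted read-only']
```

With 200 routine logins and one disk warning, only the disk line comes back. A frequency threshold like this catches odd individual lines, but it can’t see problems that only show up across groups of entries, which is why real detectors also look at sequences and counts.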
Why Anomaly Detection Is Important
Imagine you walk into work one day to find that a system you manage has been running slowly. Your team updated a few features in the last release, but that was over a week ago. There’s no reason why anything should be different now. Maybe it’s an integration that’s causing problems. Or maybe the server has a hardware issue. Whatever the case, you’re going to have to take a look at the logs.
Now you have a choice. Which of the following do you choose?
- Go through the raw log file with Ctrl+F and some regex. Maybe you’ll modify that script you tried to write last time. It didn’t quite work, but you think it sped the process up.
- Or run a log analyzer to identify entries and behaviors that don’t look like they fit (anomaly detection).
Keep in mind that the log has been recording, on average, one message every two seconds for the past two weeks. That’s over 600,000 entries you’ll need to search (14 days × 86,400 seconds per day ÷ 2 seconds per message = 604,800).
The first method will take hours, if not days, of effort. You don’t even know what you’re looking for, or where to look first. The problem might not even be in the error and warning messages. It might be hidden in success messages that fire too quickly or out of order, and matching individual lines, with regex or anything else, won’t reveal that.
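Here is one way “too quickly or out of order” might be checked. This minimal sketch assumes each entry has already been parsed into a timestamp, a request ID, and an event name; the event names `start` and `success`, the `Event` class, and the 50 ms minimum duration are all hypothetical. Every flagged entry is an ordinary success message, so it would pass any per-line pattern match.

```python
from dataclasses import dataclass


@dataclass
class Event:
    ts: float          # seconds since some reference point
    request_id: str
    name: str          # hypothetical event names: "start" or "success"


def sequence_anomalies(events, min_duration=0.05):
    """Flag 'success' events that have no earlier 'start' or that finish implausibly fast."""
    started = {}   # request_id -> timestamp of its 'start' event
    flagged = []
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.name == "start":
            started[ev.request_id] = ev.ts
        elif ev.name == "success":
            begin = started.pop(ev.request_id, None)
            if begin is None:
                flagged.append((ev, "success without start"))
            elif ev.ts - begin < min_duration:
                flagged.append((ev, "completed too quickly"))
    return flagged


events = [
    Event(0.0, "a", "start"), Event(1.2, "a", "success"),   # normal
    Event(2.0, "b", "success"),                              # out of order
    Event(3.0, "c", "start"), Event(3.01, "c", "success"),   # too fast
]
for ev, reason in sequence_anomalies(events):
    print(ev.request_id, reason)
# b success without start
# c completed too quickly
```

The check works on relationships between entries (ordering and elapsed time per request), which is exactly the information a line-by-line search throws away.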
The second method may not find the problem immediately. But it’s going to give you a subset to work with—things that you can investigate further without having to dive into those 600,000+ entries manually.
In short, manually inspecting log entries isn’t feasible in modern systems. Anomaly detection in log file analysis is important because it forms part of the automated log analysis arsenal. It saves hours of effort and increases the likelihood of finding the root cause of a problem. That, in turn, leads to higher uptime, fewer errors, and better system design, all of which are likely vital to your business.
How Anomaly Detection Works
If you came to this page via a web search, you may have seen the GitHub repos and research articles that present various anomaly detection algorithms. These generally consist of three components: entry parsing, feature extraction, and anomaly detection.
Here’s a bit more detail about each:
- Entry parsing moves through the logs to process the entries into a consistent structure, determining relationships between entries and creating a map of the structured events.
- Feature extraction then looks at the map of structured events to turn the gathered information into a series of attributes about entries and groups of entries, known as feature vectors.
- Anomaly detection algorithms are then applied to the feature vectors to determine whether an entry or group of entries is abnormal.
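The three stages can be sketched end to end in a few dozen lines of Python. This toy version is an illustration under stated assumptions, not how XPLG or any particular research tool does it: it assumes each line starts with a session ID, treats digits as the only variable tokens, uses per-session template counts as the feature vectors, and flags a session whose counts are far (in L1 distance) from the per-template median. Real log parsers infer templates far more robustly.

```python
import re
from collections import Counter, defaultdict
from statistics import median

# Assumed line format: "<session-id> <free-text message>".
LINE = re.compile(r"^(?P<session>\S+)\s+(?P<message>.*)$")
VARIABLE = re.compile(r"\d+")


def parse(lines):
    """Entry parsing: yield (session, template) pairs, masking numbers as <*>."""
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            yield m["session"], VARIABLE.sub("<*>", m["message"])


def extract_features(events):
    """Feature extraction: one event-count vector per session over a shared template list."""
    per_session = defaultdict(Counter)
    for session, tmpl in events:
        per_session[session][tmpl] += 1
    vocab = sorted({t for counts in per_session.values() for t in counts})
    vectors = {s: [counts[t] for t in vocab] for s, counts in per_session.items()}
    return vocab, vectors


def detect(vectors, threshold=2):
    """Detection: flag sessions whose L1 distance from the per-template median exceeds threshold."""
    baseline = [median(column) for column in zip(*vectors.values())]
    return sorted(
        s for s, v in vectors.items()
        if sum(abs(a - b) for a, b in zip(v, baseline)) > threshold
    )


lines = []
for s in ["s1", "s2", "s3", "s4"]:
    lines += [f"{s} open file 3", f"{s} read 512 bytes", f"{s} close file 3"]
lines += ["s5 open file 9"] + ["s5 error reading file 9"] * 3

vocab, vectors = extract_features(parse(lines))
print(detect(vectors))  # ['s5']
```

The median serves as the baseline because, unlike the mean, a single bad session can’t drag it toward itself.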
Each component is, of course, much more complex once you dig into it. If you’re familiar with machine learning, there’s a range of supervised and unsupervised models that follow this pattern. For those interested, commonly used supervised methods include logistic regression and decision trees, while unsupervised methods include clustering and principal component analysis (PCA).
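For the supervised side, here is a hedged, dependency-free sketch of logistic regression trained by batch gradient descent on event-count vectors (one count per log template per session). It assumes you already have sessions labeled normal (0) or anomalous (1); the feature layout, learning rate, and epoch count are illustrative assumptions. In practice you’d likely use a library such as scikit-learn, and unsupervised methods like clustering or PCA would replace the labeled training step.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one sample is (prediction - label) * features.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return True if the vector is classified as anomalous."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > threshold


# Hypothetical features per session: counts of
# [close file, error reading file, open file, read bytes].
X = [[1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 2], [1, 0, 1, 1],
     [0, 3, 1, 0], [0, 2, 1, 1], [1, 4, 1, 0]]
y = [0, 0, 0, 0, 1, 1, 1]  # 1 = labeled anomalous

w, b = train_logistic_regression(X, y)
print(predict(w, b, [0, 5, 1, 0]), predict(w, b, [1, 0, 1, 1]))  # True False
```

The catch with supervised methods is visible in the setup: someone has to label anomalous sessions first, which is rarely cheap for logs.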
One Drawback: The Learning Curve
As with everything in this world, the bad must balance the good. With that in mind, the main drawback of anomaly detection is the learning curve. Without a detailed understanding of the different algorithms, it’s hard to know which method best suits your logs. Moreover, once you’ve made a decision, you need to train and optimize the model for your specific scenario. This process then needs to be repeated for each new implementation. It’s a nontrivial, time-consuming exercise that can easily be done incorrectly or suboptimally.
Before you get discouraged, though, this is where the real strength of XPLG lies. By combining your logging with various forms of automated in-depth analysis, including anomaly detection, XPLG is uniquely suited to flatten the learning curve. The tool wraps a patented analysis approach in a user-friendly GUI, which simplifies the process so you don’t have to implement and re-implement an ill-fitting solution.
When and Where Can Anomaly Detection Be Used?
You can use anomaly detection anywhere you have a log file. It’s best suited to large, complex systems and to organizations with a unified logging practice. Suitable logs range from access logs to runtime, development, and security logs. Anomaly detection can help troubleshoot why processes are failing, identify security concerns, and sanity-check your software.
Importantly, anomaly detection isn’t just a reactive technology. It’s proactive. That means you don’t have to wait for an error to occur before you look into the logs. The process can run on any log at any time, including regularly in the background. By opening up proactive analysis, you can find and solve concerns before they become problems.
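Because detection can run regularly in the background, it often works on a stream of per-interval counts rather than a finished file. The sketch below, a hypothetical `RollingBaseline` class, keeps the most recent interval counts (for example, errors per minute) and flags a new count that sits more than `k` standard deviations above their mean. The history length, `k`, and the floor of 1.0 on the standard deviation are assumptions chosen for illustration; scheduling and log collection are left out.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flag an interval count that sits far above the recent history of counts."""

    def __init__(self, history: int = 30, k: float = 3.0, min_history: int = 5):
        self.recent = deque(maxlen=history)  # oldest counts drop off automatically
        self.k = k
        self.min_history = min_history

    def observe(self, count: int) -> bool:
        """Score count against the retained history, then add it to that history."""
        anomalous = False
        if len(self.recent) >= self.min_history:
            mu = mean(self.recent)
            # Floor the spread at 1.0 so a perfectly flat history doesn't flag tiny blips.
            sigma = max(pstdev(self.recent), 1.0)
            anomalous = count > mu + self.k * sigma
        self.recent.append(count)
        return anomalous


detector = RollingBaseline()
errors_per_minute = [10, 12, 11, 9, 10, 11, 10, 45, 11]
flags = [detector.observe(c) for c in errors_per_minute]
print(flags)  # [False, False, False, False, False, False, False, True, False]
```

Only the spike to 45 is flagged. Note that the spike then enters the history and widens the baseline for a while; a production detector might exclude flagged values from the baseline.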
We now know what anomaly detection in log file analysis is. We also know the high-level process that anomaly detection algorithms follow and that a range of algorithms is available to perform the detection. I’ve also discussed why it’s important and how it can not only save time but also enable troubleshooting that would otherwise be impossible.
Where to Go From Here
If you haven’t already, download XpoLog7 and request a demo. Try running anomaly detection on some of your logs. Who knows, you might find something interesting that would otherwise have gone unnoticed.