Virtualization Technology News and Information
The State of AIOps: Understanding the Difference Between Tools


By Trent Fitz

How do AIOps tools leverage artificial intelligence?

The short answer is - they don't. No commercial AIOps solutions available today are truly using artificial intelligence. This may sound controversial, but it really isn't.

When industry analyst firm Gartner coined the term AIOps only a few years ago, it stood for "algorithmic IT operations." But because AI was in the acronym, confusion ensued, and, eventually, Gartner changed their usage of the phrase to be "artificial intelligence for IT operations." So, originally, the technology was not about artificial intelligence at all, and none of the solutions incorporated artificial intelligence. And that remains true today.

Many AIOps solutions leverage machine learning (ML), but ML should not be used interchangeably with AI (there are plenty of blogs written about the differences). Some AIOps solutions don't even leverage ML - they're simply doing analytics on large data sets. It's easy for techies to get sucked into which ML algorithms a vendor's solution is using and for which use cases. But the reality is that customer judgments should just be based on outcomes (e.g., Is this solution solving the problems for a particular environment?).

What's the difference between first-generation and second-generation AIOps tools?

There is one dramatic difference: the types of data being collected and analyzed.

At a high level, AIOps tools do two things - they collect data and they analyze data - in the interest of accelerating problem resolution in IT operations. First-generation AIOps tools do this by performing real-time analysis on massive quantities of event data and inferring probable root causes based on data analyzed from previous issues. These first-generation tools are not service-centric and have no concept of topology. They rely solely on processed event data and suffer from blind spots that come from having no visibility into other data types, e.g., metrics, dependency data, streaming data, logs and other types of machine data.

Second-generation AIOps tools are beginning to emerge, and the key difference is that these solutions collect more than just events. They collect some combination of events, metrics, logs, streaming data, dependency data and more. This means eliminating the No. 1 problem AIOps tools have experienced thus far, i.e., limited visibility and context due to the lack of cardinality in the data they're analyzing. It enables vendors to inform ML algorithms with explicit topology, which makes a vast difference in detecting and isolating issues with certainty. This provides unprecedented context and unprecedented acceleration of problem resolution.

Does AIOps replace monitoring?

Not at all. This really goes back to the ability to collect many different types of data and the ability to collect high-cardinality data.

Stand-alone (first-generation) AIOps tools were created to address a specific problem: too many monitoring tools. Most enterprises of reasonable size have (or did have) this issue. For various completely understandable reasons, medium and large enterprises typically have in excess of 30 monitoring tools. So, the premise of AIOps was to treat the symptoms of that problem rather than the problem itself. The idea was to overlay a technology that could ingest events from all the monitoring tools, correlate them, and produce some inferred insights.

While the idea makes sense on the surface, there are numerous challenges with this approach. One big one is that this approach relies completely on pattern matching. For this type of tool to precisely identify the root cause of an IT issue, it must have seen that exact issue with the exact same fingerprint (the "pattern") some undetermined number of times. In today's complex, dynamic environments, these issues rarely have the same fingerprint. So, what early adopters of stand-alone AIOps tools have learned is that they must endure countless disruptions/outages before the tools begin to provide real value.
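
The limitation described above can be illustrated with a minimal Python sketch. It is not any vendor's algorithm: the names `fingerprint`, `PatternMatcher` and `min_sightings`, and the event fields `source` and `type`, are assumptions made purely for illustration. The matcher only names a root cause once it has seen the exact same fingerprint a set number of times.

```python
from collections import Counter


def fingerprint(events):
    """Reduce a burst of events to an order-independent signature (hypothetical scheme)."""
    return frozenset((e["source"], e["type"]) for e in events)


class PatternMatcher:
    """Toy event correlator that relies purely on previously seen fingerprints."""

    def __init__(self, min_sightings=3):
        self.min_sightings = min_sightings  # assumed threshold, not from the article
        self.seen = Counter()               # fingerprint -> number of past incidents
        self.causes = {}                    # fingerprint -> root cause recorded by operators

    def learn(self, events, root_cause):
        """Record a resolved incident and the root cause operators assigned to it."""
        fp = fingerprint(events)
        self.seen[fp] += 1
        self.causes[fp] = root_cause

    def diagnose(self, events):
        """Return a root cause only if this exact fingerprint was seen often enough."""
        fp = fingerprint(events)
        if self.seen[fp] >= self.min_sightings:
            return self.causes[fp]
        return None
```

In this sketch, an incident that differs by even one event produces a different fingerprint, so `diagnose` returns `None` until that new variant has itself recurred `min_sightings` times.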

Second-generation AIOps tools change the game on this. In these tools, unified monitoring can inform ML algorithms with topology. In other words, the tool tells the algorithms exactly how various systems are connected and depend on each other to deliver an application or IT service, rather than leaving the algorithms to infer it. This dramatically changes a tool's capacity for precisely pinpointing the root cause of an IT issue.

How do AIOps solutions handle high-cardinality data?

In short, stand-alone (first-generation) AIOps solutions do not have high-cardinality data. Cardinality generally refers to the number of series in a time series database, so high-cardinality data is data with a very large number of distinct series. A time series is a labeled set of values paired with time stamps. Examples include metrics such as memory utilization, network port latency or available disk space. More modern solutions also incorporate tags, which provide the richest set of environment-specific data.
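
To make the definition concrete, here is a small Python sketch that counts distinct series, treating a series as a metric name plus its full set of labels (tags). The sample layout and the names `series_key` and `cardinality` are assumptions for illustration and do not reflect any particular time series database.

```python
def series_key(name, labels):
    """Identify a series by its metric name plus its complete, sorted label set."""
    return (name, tuple(sorted(labels.items())))


def cardinality(samples):
    """Count distinct series among samples; repeated time stamps add no new series."""
    return len({series_key(s["name"], s["labels"]) for s in samples})


# Hypothetical samples: two hosts report memory, one also reports disk space.
samples = [
    {"name": "memory_utilization", "labels": {"host": "web01"}, "ts": 1, "value": 0.61},
    {"name": "memory_utilization", "labels": {"host": "web01"}, "ts": 2, "value": 0.64},
    {"name": "memory_utilization", "labels": {"host": "web02"}, "ts": 1, "value": 0.40},
    {"name": "available_disk_space", "labels": {"host": "web01"}, "ts": 1, "value": 120.0},
]

print(cardinality(samples))  # 3
```

Every additional label value (a new host, container or function instance) creates a new series, which is why short-lived components drive cardinality up so quickly.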

As stated previously, stand-alone AIOps solutions simply collect events. Events are not time series data, so data cardinality does not even apply to these tools. Second-generation AIOps tools have merged the capabilities of high-cardinality monitoring with intelligent analytics, which greatly improves the ability to isolate IT issues in modern environments.

What makes high-cardinality data different from the data often observed in classic event-driven dashboards is the incredibly large number of dimensions and associated metadata stored for every metric, log, event, etc. For instance, modern apps can comprise millions of containers and serverless functions strewn across multiple clouds, and each of these application components may exist for days or for less than a second. Stitching all of this information together while trying to find outliers is orders of magnitude more difficult than trying to isolate a rogue Java thread on a typical application server.

Enabling complete visibility in complex, modern environments at cloud scale is a nontrivial technical feat, and it requires handling substantial streams of high-cardinality data to provide precise insights.

What is the future of AIOps?

Most industry analysts agree that AIOps will be a key element of IT technology stacks for the foreseeable future. Many industry analysts also agree that the pioneers of AIOps (the first-generation, or stand-alone, AIOps tools) will have increasingly diminished value as event collection must be augmented with higher-cardinality data to make the solutions viable in complex, modern IT environments. Second-generation AIOps vendors like Zenoss are delivering a new level of intelligent analytics capabilities for all data types, including metrics, dependency data, events and streaming data, providing unprecedented context and unprecedented acceleration of problem resolution.

To learn more about the state of AIOps, download this Forrester report: Take the Mystery Out of AIOps.


About the Author

Trent Fitz 

Trent Fitz is a veteran technology professional with over 20 years of experience in the high-tech industry. Trent is a proven leader of product strategy and business development in cloud computing, virtualization, converged infrastructure and data security.

Published Tuesday, October 01, 2019 7:33 AM by David Marshall