
By Trent Fitz
How do AIOps tools leverage artificial intelligence?
The short answer is: they don't. No commercial AIOps solution available today truly uses artificial intelligence. This may sound controversial, but it really isn't.
When industry analyst firm Gartner coined the term AIOps only a few years ago, it stood for "algorithmic IT operations." But because AI was in the acronym, confusion ensued, and Gartner eventually changed the phrase to mean "artificial intelligence for IT operations." So, originally, the technology was not about artificial intelligence at all, and none of the solutions incorporated it. That remains true today.
Many AIOps solutions leverage machine learning (ML), but ML should not be used interchangeably with AI (plenty of blogs have been written about the differences). Some AIOps solutions don't even leverage ML - they're simply doing analytics on large data sets. It's easy for techies to get sucked into debating which ML algorithms a vendor's solution uses and for which use cases. But in reality, customer judgments should be based on outcomes (e.g., does this solution solve the problems in a particular environment?).
What's the difference between first-generation and second-generation AIOps tools?
There is one dramatic difference: the types of
data being collected and analyzed.
At a high level, AIOps tools do two things - they collect data
and they analyze data - in the interest of accelerating problem resolution in
IT operations. First-generation AIOps tools do this by performing real-time
analysis on mass quantities of event data and inferring probable root causes
based on data analyzed from previous issues. Traditional AIOps tools are not
service-centric and have no concept of topology. They rely solely on processed
event data and suffer from blind spots that come from having no visibility into
other data types, e.g., metrics, dependency data, streaming data, logs and
other types of machine data.
Second-generation AIOps tools are beginning to
emerge, and the key difference is that these solutions collect more than just
events. They collect some combination of events, metrics, logs, streaming data,
dependency data and more. This means eliminating the No. 1 problem AIOps tools
have experienced thus far, i.e., limited visibility and context due to the lack
of cardinality in the data they're analyzing. It enables vendors to inform ML
algorithms with explicit topology, which makes a vast difference in detecting
and isolating issues with certainty. This provides unprecedented context and
unprecedented acceleration of problem resolution.
Does AIOps replace monitoring?
Not at all. This really goes back to the
ability to collect many different types of data and the ability to collect high-cardinality
data.
Stand-alone (first-generation) AIOps tools were created to address a specific problem: too many monitoring tools. Most enterprises of reasonable size have (or did have) this issue. For various completely understandable reasons, medium and large enterprises typically have in excess of 30 monitoring tools. The premise of AIOps was not to deal with that problem itself but with its symptoms. The idea was to overlay a technology that could ingest events from all the monitoring tools, correlate them, and spit out some inferred insights.
While the idea makes sense on the surface,
there are numerous challenges with this approach. One big one is that this
approach relies completely on pattern matching. For this type of tool to
precisely identify the root cause of an IT issue, it must have seen that exact
issue with the exact same fingerprint (the "pattern") some undetermined number
of times. In today's complex, dynamic environments, these issues rarely have
the same fingerprint. So, what early adopters of stand-alone AIOps tools have
learned is that they must endure countless disruptions/outages before the tools
begin to provide real value.
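The limitation described above can be sketched in a few lines. This is a deliberately simplified illustration, not how any real product is implemented: the event signatures, root causes, and exact-match lookup are all invented to show why an unseen fingerprint yields no answer.

```python
# Toy sketch of fingerprint-based root-cause inference, as described
# for first-generation, event-only AIOps tools. All names and data
# are hypothetical; real tools use statistical correlation, not dicts.

# Fingerprints "learned" from past incidents: the exact set of event
# signatures observed, mapped to the root cause that was confirmed.
known_fingerprints = {
    frozenset({"db-conn-timeout", "app-5xx-spike", "queue-backlog"}):
        "db primary failover",
    frozenset({"disk-full", "log-write-error"}):
        "disk exhaustion on log volume",
}

def infer_root_cause(events):
    """Return a root cause only if this exact event pattern was seen before."""
    return known_fingerprints.get(frozenset(events))

# An exact repeat of a known incident matches...
print(infer_root_cause({"disk-full", "log-write-error"}))
# → disk exhaustion on log volume

# ...but one extra event changes the fingerprint, and the tool has no
# answer until it has "seen" this variant some number of times, too.
print(infer_root_cause({"disk-full", "log-write-error", "app-5xx-spike"}))
# → None
```

In a dynamic environment, nearly every incident is the second case: close to a known pattern, but never identical to it.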
Second-generation AIOps tools change the game on this. In these tools, unified monitoring can inform ML algorithms with topology. In other words, the monitoring layer tells the algorithms exactly how various systems are connected to and dependent upon each other to deliver an application or IT service - not leaving it to the algorithms to infer. This dramatically changes a tool's capacity for precisely pinpointing the root cause of an IT issue.
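To see why explicit topology changes the problem, here is a minimal sketch. The dependency graph, component names, and alert set are all hypothetical; the point is that with topology, isolation becomes a graph question rather than a pattern-matching question.

```python
# Hypothetical sketch: with an explicit dependency graph, root-cause
# isolation can be a simple traversal instead of pattern matching.

# depends_on[x] = components x needs in order to deliver its service.
depends_on = {
    "web-app":     ["api-service"],
    "api-service": ["database", "cache"],
    "database":    [],
    "cache":       [],
}

def probable_root_causes(alerting):
    """An alerting component whose own dependencies are all healthy is a
    likely root cause; alerting components upstream of it are symptoms."""
    return [
        component for component in alerting
        if all(dep not in alerting for dep in depends_on.get(component, []))
    ]

# web-app and api-service alert only because the database is down; the
# topology lets us discard them as symptoms on the first occurrence:
print(probable_root_causes({"web-app", "api-service", "database"}))
# → ['database']
```

Notice that this works the very first time this failure occurs - no history of identical incidents is required, because the dependency structure itself carries the context.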
How do AIOps solutions handle high-cardinality data?
In short, stand-alone (first-generation) AIOps
solutions do not have high-cardinality data. High cardinality generally refers
to the number of series in a time series database. A time series is a labeled
set of values paired with time stamps. This can be metrics like memory
utilization, network port latency or available disk space. More modern
solutions also incorporate tags, which provide the richest set of
environment-specific data.
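The definitions above can be made concrete with a small sketch of how tags drive cardinality. The metric name, label names, and value counts are invented for illustration: in a time series database, each unique combination of metric name and label values is its own series, so series counts multiply.

```python
# Illustrative sketch of cardinality in a time series database.
# Every unique (metric, label-values) combination is a distinct series;
# the label names and counts below are hypothetical.
from itertools import product

metric = "network_port_latency"
labels = {
    "datacenter": ["us-east", "us-west", "eu-1"],     # 3 values
    "host":       [f"host-{i}" for i in range(100)],  # 100 hosts
    "port":       [str(p) for p in range(48)],        # 48 ports each
}

# One series per unique combination of label values:
series = {(metric,) + combo for combo in product(*labels.values())}

print(len(series))  # 3 * 100 * 48 = 14400 series for a single metric
```

Multiply that by hundreds of metrics and by short-lived components that each mint fresh label values, and the series count explodes - which is precisely the data an event-only tool never sees.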
As stated previously, stand-alone AIOps solutions simply collect events. Events are not time series data, so data cardinality isn't even applicable to these tools. Second-generation AIOps tools have merged the capabilities of high-cardinality monitoring with intelligent analytics, which exponentially improves the ability to isolate IT issues in modern environments.
What makes high-cardinality data different from data often observed in classic event-driven dashboards is the incredibly large number of dimensions and associated metadata stored for every metric, log, event, etc. For instance, modern apps are composed of millions of containers and serverless functions strewn across multiple clouds, and each of these application components may exist for days or for less than a second. Stitching all of this information together while trying to find outliers is orders of magnitude more difficult than trying to isolate a rogue Java thread on a typical application server.
Enabling complete visibility in complex,
modern environments at cloud scale is a nontrivial technical feat, and it
requires handling substantial streams of high-cardinality data to provide
precise insights.
What is the future of AIOps?
Most industry analysts agree that AIOps will
be a key element of IT technology stacks for the foreseeable future. Many
industry analysts also agree that the pioneers of AIOps (the first-generation,
or stand-alone, AIOps tools) will have increasingly diminished value as event collection
must be augmented with higher-cardinality data to make the solutions viable in
complex, modern IT environments. Second-generation AIOps vendors like Zenoss are
delivering a new level of intelligent analytics capabilities for all
data types, including metrics, dependency data, events and streaming data,
providing unprecedented context and unprecedented acceleration of problem
resolution.
To learn more about the state of AIOps,
download this Forrester report: Take the Mystery Out of AIOps.
##
About the Author
Trent Fitz is a veteran technology professional with over 20 years of experience in the high-tech industry. Trent is a proven leader of product strategy and business development in cloud computing, virtualization, converged infrastructure and data security.