Industry executives and experts share their predictions for 2023. Read them in this 15th annual VMblog.com series exclusive.
The Rise of "Application Observability"
By Tom Wilkie, VP of Technology at Grafana Labs
We're all familiar with the term APM:
Application Performance Monitoring appeared in the 1990s when the first
solutions were offered by companies like Precise, Wily, Mercury Interactive,
and Quest. Then around 2008, modern architectures ushered in the need for the
next phase of APM that came from organizations like Dynatrace, AppDynamics, and
New Relic. In recent years, the term APM has been owned by a series of vendors
and skewed so much that many people don't even really understand what it means.
The fact is that the space has continued to
evolve with the needs of our rapidly changing tech industry. We have moved
beyond just APM and need to adopt a new vocabulary to accurately describe our
environments. Thus, in 2023, say hello to "Application
Observability." I can't take credit for the invention of this term, but I
certainly support its rise in popularity to better describe this evolving area
of observability.
Observability has traditionally been thought
of as three pillars (metrics, logs and traces; arguably four, with profiles),
but this approach is focused on the technology, not on users or the challenges
they're trying to solve. Over the past few years, this approach has been
thoroughly debunked.
We've started to talk about how people use these tools and the classes of
problems they solve with them. An example here is Infrastructure Observability,
where Ops folks, SREs and DevOps practitioners use these tools to better
understand the behavior of their physical, virtual or software infrastructure:
the computers, memory and disks, but also the databases, schedulers and queues.
Application observability is what happens when
you consider observability through the lens of an application developer. Simply
put, it's the way of using these tools that helps you understand the behavior
of your application. Now this is, for all intents and purposes, still APM, but
"application observability" is a term that more accurately describes
what we really mean.
Observability came into our lexicon in recent
years, and admittedly, it's seen a lot of hype. In a way, observability has
been an evolution of how we think about monitoring. Every vendor, whether they
are coming at it from monitoring, APM, logs, traces, etc., is helping to build
a better way of monitoring modern software. Cindy Sridharan, who literally wrote the book on observability, once joked
that the observability nomenclature came about because developers didn't like
to do monitoring. While there may be some truth in that statement, it's
indisputable that the way we develop, monitor, and deploy our software and
internet infrastructure has completely changed.
This change is driven by four critical factors:
- Complexity: A decade ago, software was a lot
simpler. We built monoliths, and failure modes were known. But today we build
distributed systems and microservices, and there are so many interactions
between the components of our stack that it's hard to even understand the
numerous ways our applications can fail.
- Volume: The volume of the data collected
around our applications has exploded in recent years. Ten years ago, we might
have had an application deployed on a few servers. Those servers evolved into a
few dozen or a few hundred virtual machines, then a few thousand containers and
microservices. The complexity of software has made the volume of data go
through the roof.
- Variability: Servers used to sit statically in
a rack in the data center. You'd order new servers; they'd take weeks to be
shipped, deployed, racked and stacked, and installed; and then those servers
would sit around for years. But with the advent of things like containers,
Kubernetes, and now serverless, our infrastructure is truly elastic, and we're
doing dynamic load balancing and auto scaling. Infrastructure comes and goes
constantly, making it more variable than ever before.
- Velocity: It used to be that we collected data
every 10 minutes, but today we're increasingly moving toward real time. Now
most organizations want to collect data multiple times per minute: every 10
seconds or even more frequently.
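
To get a feel for how volume and velocity compound, here is a rough
back-of-the-envelope sketch. The specific numbers (5 servers, 3,000 containers,
100 time series per target) are illustrative assumptions, not measurements;
only the 10-minute and 10-second intervals come from the text above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(targets: int, series_per_target: int, interval_seconds: int) -> int:
    """Samples collected per day if every series on every target
    is sampled once per interval."""
    return targets * series_per_target * (SECONDS_PER_DAY // interval_seconds)


# Hypothetical "then": a few servers, sampled every 10 minutes.
then = samples_per_day(targets=5, series_per_target=100, interval_seconds=600)

# Hypothetical "now": a few thousand containers, sampled every 10 seconds.
now = samples_per_day(targets=3000, series_per_target=100, interval_seconds=10)

print(then)         # 72000
print(now)          # 2592000000
print(now // then)  # 36000
```

Under these assumed numbers, 600 times more targets and 60 times more frequent
collection multiply into 36,000 times more data per day, before accounting for
any growth in the number of series each target exposes.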
Together, these four factors forced a paradigm
shift in how we think about monitoring. The fact is that most failure modes are
no longer understood in advance. If we solely rely on checking the known
knowns, our monitoring will quickly fall short. Monitoring has become a data
analytics problem. The reality is that the advancements of our complex systems
have created the need to see into the realm of unknown unknowns - and this is
where observability shines.
Industry analysts have also noted the
connection between APM and observability. Last year, for the first time, Gartner
added the word "observability" to the title of its APM Magic Quadrant and
expanded the definition of the space to include observability.
We're creatures of habit, and I expect the
term APM will stay in our vocabulary for quite some time. But we consider
"application observability" the better phrase to reflect many of the
modern environments and use cases users encounter today.
Currently, application observability does not
yet cover everything that has been bundled into APM platforms over the years.
When one platform tries to do it
all, it often falls short in one way or another, and the APM platforms haven't
been immune to this. Compare that to something wildly popular like open source
technology, which has a brilliant underpinning but - let's just say it - a
slightly crappy experience, right?
There are also parallels within infrastructure
observability. This is where tools like Prometheus and Grafana are more
commonly used to help users understand the behavior of their infrastructure. I
like to think about it like this: "Infrastructure Observability" mostly manifests
in the use of logs and metrics, whereas "Application Observability" manifests
in the use of distributed tracing and continuous profiling. Different
techniques and technologies are needed for different jobs.
It's important to note that, despite this
being my 2023 prediction, this is nothing new. People have been using metrics,
logs, traces, and profiles to understand their application behaviors for ages.
But having a common language and set of terms to describe what we're doing
helps us share knowledge and learning, and hopefully newer terms with more focused
meaning will help avoid misunderstanding. And not using terms which have been
gerrymandered by vendors is always a win.
And so, my prediction for 2023 is not just the
rise of the term "application observability" - but also steps toward improving
the user experience in the open source world.
##
ABOUT THE AUTHOR
Tom Wilkie is VP of Technology at Grafana Labs, a member of the Prometheus team,
and one of the original authors of the Cortex and Loki projects. He serves as a
member of the CNCF Governing Board. In his spare time, he builds 3D printers
and makes craft beer.