By Chris Farrell, Observability Strategist at Instana
One of my favorite concepts made popular by the "Big Bang
Theory" is Schrodinger's Cat. I have an S-Cat PC sticker AND T-Shirt, each with
a different joke about Mr. Schrodinger's feline. One is "Wanted: Dead and
Alive," which gets a laugh from people that know what it is, and some extremely
serious double-takes and frowns from those that think I'm advocating catricide.
But as fun as the T-shirt is, it's the sticker that I want
to focus on today. It's the start (and end) of a great Schrodinger's Cat joke:
Schrodinger's Cat walks into a bar...
... and doesn't.
So what does this hilarious little physics and philosophy joke
have to do with Cloud-Native Applications? I'm glad you asked - or at least
read this far. I think of Mr. S whenever someone wants to start a debate with
me about "Observability," especially comparing
"O" (as I like to call it) to "Monitoring" - or Monitoring's Irish twin, "Visibility."
It's the biggest non-debate debate in IT today. Now before you decide to love
me or hate me (read the full article, then you can hate me), let me first say
that I am an advocate for Observability - and all the hidden meanings when someone says their strategy for delivering high performance is the Big O!
No, what I'm talking about is the constant debate that
Observability advocates seem to want to have about monitoring and visibility,
especially as it pertains to something near and dear to my heart - managing application performance and Application Performance Management (APM).
Here's my answer to the all-important "Observability vs.
Monitoring" debate.
If someone asks me "should I strive for visibility or
observability?" I say "Yes!"
When somebody asks "do I need monitoring or observability?"
I say "Yes!"
And if anyone ever asks "should I use proprietary or open
source observability?" I say "Absolutely!"
Maybe one day, I can make a "Who's on First" script to go
with these answers. In the meantime, let me explain why I don't think the
questions are the right questions to ask, while I also explain my "answers."
Let's back up to the early days of EAI applications, J2EE
and the rise of Web Application Servers. It's also the birth of APM - just
about 20 years ago. Even back then, the debate raged: "visibility" or "observability." It just wasn't widely
referred to as observability at the time. Also, there was only one
observability API, and while it was a community standard (ARM), it wasn't open
source. More sources of data appeared over time - even the J2EE servers got into
the business, delivering object-level timing and load metrics via another API.
So why were the APM solutions so successful that they spawned multiple
generations of ever-growing successful companies? In a word - Complexity.
And now you're thinking "Complexity in a 3-tiered monolith?
You must be joking."
Of course, I'm not joking. Yes, the Business Logic code within
a monolithic application doesn't have the obvious complex interactions of
microservice, multi-cloud and Cloud-Native applications. But these were
applications running mission critical processes and delivering transactional
requests to end users. Of course code execution could be as complex as you
wanted it, but in the world of Observability, we're concerned with interactive
complexity. For these applications, the complexity appeared at the layer behind
the app server, between and including the back-end legacy systems that the Java
application relied on for executing user requests.
There are two areas where complexity is introduced into a
monolithic J2EE application. The first is where the application uses App Server
(and JVM) resources, from simple I/O to graphic services. The other complex
area is the interface to back-end systems, whether directly called with APIs or
handled through specialty App Server (Java) connectors. These two areas are
beyond the reach of those APIs - making them invisible to the "Observability"
solutions of the day. While "easier" problems could be solved with simple response-time measurements, the more difficult problems - the kind that landed banks on the front page of The Wall Street Journal for failed online banking applications - required visibility into those complex areas, with an understanding of how and why different back-end systems were called. Thus, even on monoliths, the concept of CONTEXT is critical to understanding how applications execute their requests.
Let's jump ahead - skipping past Service-Oriented
Architecture and getting to the heart of today's debate - how best to monitor
the performance of Cloud-Native applications (microservices, containers,
orchestration). For Cloud-Native applications, the architecture itself spans numerous service layers and a polyglot of languages. Nobody would dispute that the layout, architecture and usage of microservice applications are inherently complex. Just the communication from one service to another can be a problem.
Analogous to the J2EE APIs (both native to the app servers
and coded in by developers), there are multiple methods for getting basic performance
data (and some NOT so basic data) about microservices. Many of the technologies
operating in cloud-native applications have performance APIs built into the
platform or infrastructure - from databases to message queues, there's a set of basic data available to anyone with access to the API.
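As a rough illustration of that "free" platform data, here's a minimal sketch of pulling queue statistics from a message broker's built-in API - assuming a RabbitMQ broker with the management plugin enabled. The host, port and credentials shown are just the stock defaults, used purely for illustration.

# A minimal sketch of reading built-in platform metrics, assuming a RabbitMQ
# broker with the management plugin enabled (host and credentials are the
# stock defaults, shown only for illustration).
# Requires: pip install requests
import requests

resp = requests.get(
    "http://localhost:15672/api/queues",  # management API's queue listing
    auth=("guest", "guest"),
    timeout=5,
)
resp.raise_for_status()

for queue in resp.json():
    # 'messages' is the current queue depth; a growing 'messages_unacknowledged'
    # count often points at slow or stuck consumers.
    print(queue["name"], queue.get("messages", 0), queue.get("messages_unacknowledged", 0))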
That's in addition to the new set of monitoring AND
TRACING APIs available to developers to insert observability into their
application code - Prometheus is an example of metric instrumentation, while
Jaeger, Zipkin and OpenTracing can help obtain distributed trace data. From
Observability tools to next-generation APM and log analysis solutions, this
data is a part of the ability to see performance. But as we learned twenty-odd
years ago, it's not JUST about being able to take measurements. To optimize
performance - AND SOLVE PERFORMANCE ISSUES - we have to be able to break down
the complexity and understand relationships in that part of the system.
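To ground the first half of that - what the instrumentation itself looks like - here's a minimal sketch using the Prometheus Python client for metrics and the OpenTracing API for a span. The service, metric and operation names are invented for the example, and the span only goes anywhere useful once a concrete tracer (Jaeger, Zipkin, etc.) is registered as the global tracer.

# A minimal sketch of developer-side instrumentation (all names are invented).
# Requires: pip install prometheus_client opentracing
import time

import opentracing
from prometheus_client import Counter, Histogram, start_http_server

# Metric instrumentation (Prometheus): count and time checkout requests.
REQUESTS = Counter("checkout_requests_total", "Total checkout requests handled")
LATENCY = Histogram("checkout_request_seconds", "Checkout request latency in seconds")


def handle_checkout(order_id):
    # Trace instrumentation (OpenTracing): one span per request, tagged with context.
    # This is a no-op unless a concrete tracer (e.g. Jaeger's) is registered globally.
    tracer = opentracing.global_tracer()
    with tracer.start_active_span("checkout") as scope:
        scope.span.set_tag("order.id", order_id)
        REQUESTS.inc()
        with LATENCY.time():
            time.sleep(0.05)  # stand-in for the real work


if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for a Prometheus server to scrape
    handle_checkout("demo-123")

The point of the second half of the paragraph still stands: the counter, the histogram and the span above are just measurements until something ties them back to the request path they belong to.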
The real question to ask for proper Cloud-Native application performance monitoring is: where is the application complexity that can derail performance, create outages and hide behind "there's a problem, but all lights are green" situations? Remember, in monoliths, that complexity lies in the code-to-back-end layer. But in Cloud-Native environments, the answer isn't just different - it's on a different scale. And that answer is - the complexity is EVERYWHERE.
Loosely connected services can take any path for transactions. Developers are optimizing the services they own by choosing specific technologies, so the need to monitor, support and debug multiple databases, messaging systems and even programming languages has become permanent.
So with complexity everywhere, how can we break down any
walls caused by said complexity to understand where our bottlenecks are and
what's causing any application and/or service problems?
The answer is the same as with monoliths - you have to take
measurements and gather data with an understanding of the context of the actual user calls and use cases. For Cloud-Native, that means being able to do a few key things:
- Discover changes (new services, deleted services and service updates) in real time
- Understand all the upstream and downstream relationships (inter-dependencies) of each service
- Correlate all the information at hand (performance metrics, individual traces, profiles if you have them) and include data from all sources (open source APIs, monitoring agents, traces, profiles, etc.)
Only with all the data in hand - and in the analysis engine
(whatever that engine is) - can a Dev+Ops team begin to understand how their
applications are doing, and how to optimize performance, resource usage and
service levels.
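To make the correlation point a little more concrete, here's a toy sketch - not any particular product's analysis engine - of joining trace spans with per-service metrics so that dependencies and a likely bottleneck fall out of the combined data. All of the services, numbers and field names are invented.

# A toy sketch (not any particular vendor's engine) of correlating trace spans
# with per-service metrics to surface dependencies and a likely bottleneck.
from collections import defaultdict

# Spans roughly as a tracing backend might export them (fields are illustrative).
spans = [
    {"trace": "t1", "service": "web",      "parent": None,       "duration_ms": 480},
    {"trace": "t1", "service": "checkout", "parent": "web",      "duration_ms": 450},
    {"trace": "t1", "service": "payments", "parent": "checkout", "duration_ms": 430},
]

# Metrics roughly as a scraper might report them, keyed by the same service names.
metrics = {"web": {"cpu": 0.21}, "checkout": {"cpu": 0.35}, "payments": {"cpu": 0.93}}

# 1. Derive upstream/downstream relationships from the span parent links.
downstream = defaultdict(set)
for span in spans:
    if span["parent"]:
        downstream[span["parent"]].add(span["service"])

# 2. Correlate: join each service's trace timing with its resource metrics.
for span in spans:
    svc = span["service"]
    print(svc, "-> calls", sorted(downstream[svc]) or "nothing",
          "| span", span["duration_ms"], "ms | cpu", metrics[svc]["cpu"])

# With the two views joined, 'payments' stands out: it accounts for most of the
# request time and has the highest CPU, even though each individual light is green.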
To learn more about containerized infrastructure and cloud native technologies, consider joining us at KubeCon + CloudNativeCon NA Virtual, November 17-20.
About the Author
Chris Farrell, Observability and APM Strategist
Chris Farrell is a Technical Director and Observability
Strategist at Instana. He has over 25 years of experience in technology, from
development to sales and marketing. As Wily Technology's first Product
Management Director, Chris helped launch the APM industry about twenty years
ago. Since then, Chris has led Marketing or Product strategy for 5 other APM /
Systems Management ventures.
Chris's diverse experience runs the technology gamut,
from manufacturing engineering and development for IBM ThinkPad to managing
global sales and marketing teams.
Chris lives in Raleigh, North Carolina with his wife and
two Siamese cats. He enjoys both watching and playing basketball in his spare
time - USUALLY. He has a BSEE and MBA from Duke University.