SLOconf is the only event dedicated
to the practice and application of Service Level Objectives (SLOs). Taking
place May 15-18, SLOconf 2023 is a virtual event now in its third year. The
agenda will include more than 70 speakers with presentations laser-focused on
all aspects of SLOs.
In
this exclusive pre-show Q&A, we're speaking with Alayshia Knighten,
Ecosystems and Partnerships Engineer at Honeycomb, a leading observability
platform used by high-performing engineering teams to investigate the behavior
of cloud applications.
VMblog:
To kick things off, give VMblog readers a quick overview of the company.
Alayshia Knighten: Honeycomb provides
observability for high-performing engineering teams so they can quickly
understand what their code does in the hands of real users in unpredictable and
highly complex cloud environments. As an observability platform, Honeycomb is
vastly different from traditional Application Performance Monitoring (APM)
tools because it has the ability to sift through billions of rows of complete
telemetry data grouped by any arbitrary number of dimensions. This enables
faster debugging, higher uptime, better-performing services, more time for
innovation, and ultimately, happier developers and end users.
VMblog:
What made you sponsor SLOconf 2023? Is this a must-sponsor event for your
company?
Knighten: Honeycomb SLOs make it
possible to trigger alerts on issues that matter most to your business so you
can quickly debug them. They answer important questions like, "How much monthly
downtime is tolerable? What performance impact is acceptable before users are
negatively impacted? Should we focus on new features or tech debt?" Our actionable
SLOs help teams define, measure, validate, and adjust engineering priorities
collaboratively. SLO error budgets give teams the leeway needed to prioritize
or de-prioritize production issues.
Honeycomb's focus
on understanding individual customer experiences is especially highlighted in
our approach to SLOs. Most tools use time series data to measure availability
but are limited to aggregating all customers into one measure: for the second
that just occurred, was the system "good" or was it "bad?" There may be
hundreds or thousands of customer experiences buried within those aggregate
time series measures that you just can't see or respond to.
Honeycomb's
event-based approach means that every individual service request is evaluated
against the service-level indicator (SLI) criteria you define. If even one
request failed, while thousands of other simultaneous requests succeeded,
you'll know about it.
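As an editorial illustration of the event-based idea described above, here is a minimal Python sketch. It is not Honeycomb's implementation or API: the `RequestEvent` fields, the `sli_ok` criteria (no server error, under 500 ms), and the 99.9% target are all assumptions chosen for the example. It shows each request being judged individually, so one failing request stays visible among many successes.

```python
from dataclasses import dataclass


@dataclass
class RequestEvent:
    # Hypothetical event shape; real telemetry would carry many more fields.
    customer: str
    status_code: int
    duration_ms: float


def sli_ok(event: RequestEvent, max_duration_ms: float = 500.0) -> bool:
    """Assumed SLI: the request did not hit a server error and was fast enough."""
    return event.status_code < 500 and event.duration_ms <= max_duration_ms


def evaluate(events: list[RequestEvent], target: float = 0.999) -> dict:
    """Judge every event against the SLI and summarize against an assumed target."""
    total = len(events)
    failed = [e for e in events if not sli_ok(e)]
    compliance = 1 - len(failed) / total if total else 1.0
    # Error budget: how many events may fail before the target is missed.
    budget = (1 - target) * total
    return {
        "compliance": compliance,
        "failed": failed,
        "budget_remaining": budget - len(failed),
    }


# 999 healthy requests and one failing request from a different customer.
events = [RequestEvent("acme", 200, 120.0) for _ in range(999)]
events.append(RequestEvent("globex", 503, 40.0))

result = evaluate(events)
for e in result["failed"]:
    print(f"failed request: customer={e.customer} status={e.status_code}")
```

Because the failing events are kept rather than averaged into a per-second "good" or "bad" flag, the affected customer can be identified directly from the result.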
VMblog:
What is your message to attendees of the show?
Knighten: Being on call doesn't have
to take over your personal life. You shouldn't need to sleep next to your pager
or be woken up by constant alerts. And avoiding these negative experiences is
so simple: implement SLOs that matter to your business. The freedom to explore
and receive alerts on things you actually care about is important.
It's also
essential to ensure you align SLOs to overall business objectives so that all
stakeholders understand what the engineering priorities are (and why!) and
engineering's impact on overall business goals.
VMblog:
What market needs or problems is your company solving for these
attendees?
Knighten: As organizations face
turbulent economic headwinds, engineering teams are expected to do more with
less. This unprecedented pressure to innovate and release new features faster
is compounded by stringent end-user expectations and increasingly complex tech
stacks. As a result, modern engineering teams can no longer rely on "good
enough" legacy APM tools built for predictable, monolithic systems that aren't
architected for today's complex and unpredictable distributed cloud
environments. Honeycomb is the only observability platform to entirely sidestep
the data correlation problem across logs, metrics, and traces, by uniquely
architecting its datastore to be datatype agnostic.
VMblog:
What sets you apart from the competition?
Knighten: Today, how code is written
often differs from reality. End users have varied environments and dynamic
software use cases, creating unpredictable bugs and anomalies. Honeycomb's
ability to quickly analyze high-cardinality data is crucial to discovering
novel problems. Honeycomb gives engineering teams the power to detect patterns
in seconds across billions of data points representing how users are
experiencing their code in real-time. It never aggregates or discards data.
We were the first
observability platform to launch fully executing Natural Language Querying
using generative AI for our new capability, Query Assistant. Query Assistant
enables developers at all levels to ask questions in plain English instead of a
query language. Generative AI then builds a relevant, modifiable query,
eliminating the prerequisite for advanced knowledge of query-based languages
like SQL.
Our
CTO Charity Majors often says that the best developer tools are the ones that
get out of your way and become invisible. Observability shouldn't require
engineers to master complicated tools or languages that force you to constantly
switch context and piece together clues to get answers. The only thing
observability tools should encourage you to focus on is your own curiosity
about what's happening in your system. This is where Honeycomb truly stands out
from the pack.
VMblog:
What are the trends your company is seeing that we should be aware of in 2023
and beyond?
Knighten: Developers at legacy
organizations are seeing the benefits that a modern approach to observability
offers them beyond the logs and metrics that they've had to use previously.
We're particularly excited about eBPF and the opportunities it brings to
support out-of-the-box auto-instrumentation while still providing the rich
context and flexibility users expect of observability tools.
OpenTelemetry is
rapidly gaining traction as the preferred way to instrument data for
observability. The interest from the developer community in creating an open
standard for telemetry has been on the rise for quite some time, and in 2023,
we see the possibility that OpenTelemetry will surpass Kubernetes as the
fastest and most important developing CNCF project. Vendors who continue to
push their own bespoke and proprietary instrumentation libraries and agents as
the default way to use their products will soon find themselves on the wrong
side of what consumers are demanding. In 2023, using OpenTelemetry to
instrument your applications for observability, regardless of the tools you're
using, will become the de facto standard.
VMblog:
Does your company have a speaking track at the event? If so, can you tell us
about the session so people can get them on their schedules?
Knighten: I'll be speaking about SLI
negotiation tactics for engineers. As engineers, we have our own Survival Level
Indicators (SLIs) that measure and define what happens
when the rockstar engineer who performs essential tasks A and B hasn't taken a
vacation in nine months. Over time, not meeting SLIs can take its toll on
engineers. How do we provide ourselves the opportunity for grace and the
ability to say "does this fit me as a person"?
In this session,
I will review different strategies to identify human burnout versus company
personal objectives. I will also talk about how we can improve ourselves and
survive in the high-risk climate in tech. The talk will give engineers and
managers the courage to care for themselves and their teams. Sometimes, it's
hard to tell whether we, or our friends, are okay. In this discussion, we
will review how to identify "Houston, we have a problem" moments, ways to
address our problems, and overall strategies for strengthening who we are.
My colleague
Jessica Kerr is also presenting how our use of SLOs has evolved over time here
at Honeycomb.
SLOs are part of
our product, so we've cared about them for a long time, and we put a lot of
conscious effort into how we use them (especially Fred Hebert, Staff SRE, who
is co-author and possibly co-presenter). As we change our internal best
practices, we regularly re-evaluate our SLOs, trading off alert fatigue against
customer experience. We also know how our customers use SLOs, so we know that
other companies could benefit from the kind of thought Honeycomb's SRE team
puts into this.