E-commerce is changing the way consumers shop,
giving them millions of options to purchase goods online. This creates intense
competition for sellers, who must invest in solutions to deliver exceptional
customer experiences. Recently, Nobl9 conducted a digital customer experience
survey to understand how consumers interact with the applications they use. For
e-commerce organizations, implementing Service Level Objectives (SLOs) for
reliability management stands out as an innovative technique to drive
customer loyalty by providing dependable shopping experiences. I sat down with
Dan Ruby, VP of Marketing at Nobl9, to understand the goals of the survey and
what the results tell us.
VMblog:
Let's start with the premise of the survey. Why did you feel a customer
experience survey was important for Nobl9, and what kinds of information did
you hope to reveal?
Dan Ruby: As a provider of SLO-based reliability
solutions, the Nobl9 team is always thinking about customer experience - for
both our own customers, and our customers' customers who expect high-quality
service. Even our website describes Nobl9's offering as, "Customer-Facing
Reliability, Powered by Service-Level Objectives." Effectively, in this survey
we wanted to know how reliable the average person finds the sites and apps they
interact with on a daily basis, and how that reliability, or lack thereof,
affects their views of the app, willingness to engage with the company, and
overall purchasing behavior.
We wanted to know, for example, how many
issues with a single application a user would tolerate before switching to an
alternative. We were also curious what customers tend to do when they
experience a glitch in an app - would they immediately delete it, simply
try again later, or maybe leave a negative review? Not only is this great data
for Nobl9 to have, it also reveals how crucial it is for all application
providers to focus on delivering seamless experiences, because our survey
results show that customers really do care how reliable their tools and
services are. The hope is that this survey can help us communicate better with
our customers about their users' behavior to show them why SLOs remain critical
to business outcomes.
VMblog:
Can you explain the link between reliability and customer experience?
Ruby: Every application, every company, every
product owner should be thinking of reliability as customer experience. You can
pack all the flash and features into a digital experience you want, but at the
end of the day, the experience part is driven as much by what you can do with
an app as by whether you can actually do it. Everyone has had bad experiences
with an app - slow to load, crashes a lot, loses data, rejects their perfectly
correct password, and so on. You downloaded the app because the sizzle of all
the things it can do piqued your interest, and then you deleted the app because
trying to do all of those things was actually quite difficult.
I think organizations understand this
perfectly well - per CIO's 2024 State of the CIO report, 22% of CEOs view
"Improve the customer experience" as one of the top 3 IT priorities, along with
improving IT and business collaboration. But improving the customer experience
requires understanding it, particularly from a reliability perspective. Keeping
things siloed and putting a requirement of nines of uptime doesn't give you the
insights you need to really make for a great experience.
In terms of customer satisfaction, key metrics
like a company's Net Promoter Score (NPS) are tied to reliability. So, too, are
more bottom-line business outcomes. Whether the cause is latency, availability,
or the availability of a single microservice, any noticeable decrease in the
reliability a customer experiences is simply another reason for them not to
complete whatever action they wanted to complete. This leads to higher rates of
cart abandonment and user churn, which in turn drive down metrics like customer
lifetime value.
One really tangible example of this is the
impact of page load times or latency on conversion rates. We know that even a
one-second delay loading a page can lead to 7% fewer conversions, and that
means revenue takes a hit. For a site generating $100,000 per day, this 7%
reduction translates to roughly $2.5 million in lost sales annually.
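The arithmetic behind that figure can be checked with a short sketch. It uses only the two numbers quoted above, and it assumes (this is not stated in the interview) that revenue scales linearly with conversions and that the site sells every day of a 365-day year. The function name is illustrative.

```python
# Figures quoted in the interview.
DAILY_REVENUE = 100_000      # dollars per day
CONVERSION_DROP = 0.07       # 7% fewer conversions from a one-second delay

# Assumption: sales happen 365 days a year.
DAYS_PER_YEAR = 365


def annual_revenue_loss(daily_revenue: float,
                        conversion_drop: float,
                        days: int = DAYS_PER_YEAR) -> float:
    """Estimate yearly revenue lost to a fractional drop in conversions.

    Assumes revenue is directly proportional to conversions.
    """
    return daily_revenue * conversion_drop * days


loss = annual_revenue_loss(DAILY_REVENUE, CONVERSION_DROP)
print(f"Estimated annual loss: ${loss:,.0f}")
```

Under those assumptions the estimate comes to about $2.56 million a year, in line with the rounded figure quoted above.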
With SLOs, you can even go beyond this.
Running SLOs on ratio metrics like conversion rate, cart abandonment rate, etc.
can help you identify when something is causing your business outcomes to
suffer. Digging into your product's SLOs when this happens can help you
identify what your SLIs should be - you may find that you're being too
stringent and wasting money on targets that don't meaningfully affect
customer experience. You should also be able to identify the break point
at which a metric does have that impact.
VMblog:
What were some of the significant results of your survey?
Ruby: We conducted this survey in April 2024 and
received over 300 responses, giving us confidence that this data is
statistically meaningful. The first key point is that customers are
experiencing quite a few issues with their applications. In fact, only 12% of
people did not have an issue in their daily use of apps in the past year. We
found that almost 60% of respondents experienced slow load times over the past
year, and worse, 60% also had an app crash completely. At the same time, over
40% experienced an app forcibly logging them out. These three issues are also
the ones customers found most frustrating, with app crashes in the top spot,
slow load times second, and forced log-outs in third place.
More interesting is how these customers
respond. Nobl9's survey found that when experiencing an issue, most people
would unsurprisingly close the app and try again later. This is not necessarily
a good thing for a company seeking to retain users - but the good news is that
almost 40% of people would either try the app from a different device, or check
the status page to see if the application as a whole was down. But one thing
that struck me is how few users actually took an action that the developers or
product owners could see - 10% reached out to the company, and 11% checked
social media to see if the app was down and potentially complained publicly.
Assuming the company is monitoring all social mentions, that leaves 79% of
frustrating, user-impacting app issues that are just never seen internally. How
can a dev or IT team be expected to fix issues that are invisible to
traditional monitoring and aren't brought to their attention by the users?
At a higher level, survey data shows that 40%
of users are unlikely or very unlikely to continue working with a company whose
applications do not work properly, so ensuring consistent performance is
critical. The result I think teams will be most interested in is how many
issues their customers will tolerate before giving up on the product
altogether. Our survey found that over 70% of users would abandon an app
completely after just 1-5 issues - 6% after just one issue.
This suggests that businesses have very slim
margins for error if they want customers to continue using their products. But
context here is important; our survey revealed that major outages are not the
main cause of concern for bottom lines. Actually, 53% of respondents would feel
less frustrated about experiencing reliability issues if they knew the
application had a major outage. In that case, at least they know more or less
why their experience was subpar, and that it's not just them feeling singled out
by issues.
This tells us that ongoing, smaller
reliability incidents are the bigger source of customer churn. These hiccups in
performance can drive customers away, often without the business's knowledge;
another key result from the survey is that customers are very hesitant to give
feedback. All respondents indicated they were unlikely to leave reviews on apps
whether they liked them or not.
VMblog:
As a provider of SLO solutions at Nobl9, what does this data tell you about
reliability practices?
Ruby: The main takeaway for us is that businesses
need to move beyond traditional reliability strategies; modernizing your
approach with SLO-based reliability is a necessity, not a nice-to-have. IT
departments have of course recognized the link between reliability and user
retention for a long time, turning to traditional reliability practices like
improving Mean Time To Recovery (MTTR) as a primary KPI, prioritizing the
number of nines of uptime their applications have, and reducing the number of
catastrophic outages. But MTTR is reactive, not proactive, and nines of uptime
has its own issues - most of these day-to-day reliability issues don't
necessarily constitute an outage. You can have all the nines of uptime, but if
micro-outages are affecting your customers' experience, they don't matter. I
doubt many people would hear "Hey, I know you're unhappy with the performance
of our app, but look, we have five nines of availability!" and suddenly change
their outlook on the app's performance.
Unlike traditional techniques, SLOs provide
you with a system to monitor errors, not outages. You can strategically dial in
your acceptable error budget and get alerts when your Service Level Indicators
(SLIs) are burning that error budget more quickly than expected.
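To make the error budget idea concrete, here is a minimal sketch of a burn-rate check. It is not Nobl9's implementation or API. The `SLO` class, the function names, the 99.9% target, and the alert threshold of 2.0 are all hypothetical, chosen only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical SLO definition: a name and a target ratio of good events."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good

    @property
    def error_budget(self) -> float:
        # The fraction of events that are allowed to be bad.
        return 1.0 - self.target


def burn_rate(slo: SLO, good_events: int, total_events: int) -> float:
    """Observed error rate divided by the allowed error rate.

    A value of 1.0 means the budget is being spent exactly on schedule;
    values above 1.0 mean it will run out before the SLO window ends.
    """
    if total_events == 0:
        return 0.0
    error_rate = (total_events - good_events) / total_events
    return error_rate / slo.error_budget


def should_alert(slo: SLO, good_events: int, total_events: int,
                 threshold: float = 2.0) -> bool:
    """Alert when the budget burns faster than the (assumed) threshold."""
    return burn_rate(slo, good_events, total_events) > threshold


checkout = SLO(name="checkout-availability", target=0.999)

# Hypothetical window: 10,000 requests, 50 of them failed.
rate = burn_rate(checkout, good_events=9_950, total_events=10_000)
print(f"{checkout.name}: burn rate {rate:.1f}x, "
      f"alert={should_alert(checkout, 9_950, 10_000)}")
```

In this made-up window the service never went down, yet the budget is burning at about five times the sustainable pace. That is the kind of signal an up/down uptime metric would miss.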
This has a couple of benefits. For one, your
definition of "reliability" goes from a binary up/down metric to one that is
actually indicative of the day-in day-out customer experience. And two, by
setting your SLIs strategically based on the customer impact of errors, you can
dial in your IT spend by focusing on KPIs that impact customer outcomes.
In an ecommerce system, for example, the app
may be running, but the checkout process is slow to load and customers are
abandoning carts at a high rate. With an SLO tracking the checkout process's
latency, app and product owners can very quickly recognize a key issue that
hurts the customer experience. On the other hand, maybe your app's login
authentication service is throwing errors. But you know that the service is set
up to retry authentication automatically, taking a matter of milliseconds. This
means that a single error doesn't actually meaningfully impact the customer
experience, so you can set your SLO to allow for a bigger error budget here,
saving you some IT spend by focusing on the customer's perspective.
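The two cases above can be sketched side by side. Everything in this example is an assumption made for illustration: the latency samples, the 2,000 ms threshold, the login counts, and both targets. None of these values come from the survey or from Nobl9. The point is only that the customer-facing checkout path gets a tight target while the auth service, whose errors are masked by automatic retries, gets a looser one.

```python
from typing import Sequence


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests that completed within the latency threshold."""
    if not latencies_ms:
        return 1.0
    fast = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return fast / len(latencies_ms)


def success_sli(successes: int, attempts: int) -> float:
    """Fraction of attempts that succeeded."""
    return successes / attempts if attempts else 1.0


def meets_target(sli: float, target: float) -> bool:
    return sli >= target


# Hypothetical targets: tight for checkout, looser for auth because
# automatic retries hide most single errors from the customer.
CHECKOUT_TARGET = 0.99
AUTH_TARGET = 0.95

# Made-up checkout page load times in milliseconds.
checkout_latencies = [320, 410, 2900, 380, 3500, 450, 390, 3100, 400, 360]
checkout_sli = latency_sli(checkout_latencies, threshold_ms=2000)

# Made-up auth counts: 960 of 1,000 first attempts succeeded.
auth_sli = success_sli(successes=960, attempts=1000)

print("checkout ok:", meets_target(checkout_sli, CHECKOUT_TARGET))
print("auth ok:", meets_target(auth_sli, AUTH_TARGET))
```

With these invented numbers the app is "up" in both cases, but only the checkout SLO is breached, which is where the team's attention and spend should go first.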
Organizations realize already that the day-in
day-out customer experience is key to making their users happy and retaining
them as customers. And there are a lot of tools out there that pull data around
customer experience. SLOs are the last step of modernizing a reliability
strategy - in the case of Nobl9, getting them set up becomes almost trivial.
Connect the existing data sources, run some analysis on historical data to help
inform your SLI/SLO parameters, group infrastructure and services into a project,
and start making reliability holistic and customer-centric. SLOs don't need to
be new instrumentation; the data is already there, but without viewing it
through the SLO lens, it's just fragmented, siloed raw data that is far from
actionable.
VMblog:
It seems that every organization could benefit from stronger reliability
management. Tell us why this is particularly important for e-commerce companies.
Ruby: Ecommerce companies are facing immense
competitive pressures that many smaller industries are not.
There were almost 27 million global ecommerce
sites as of 2023 - nearly 14 million in the United States alone. Amazon
and other massive sellers stand out, but even extremely specialized sellers are
up against stiff competition given just how many options are out there. It
really has never been more important for ecommerce companies to focus on site
reliability in order to gain new customers and keep existing ones.
There aren't a ton of levers ecommerce
companies can pull to differentiate themselves. You can compete on price,
within the constraints of manufacturers' pricing guidelines. You can try to
differentiate on shipping speed and costs, or product availability and variety.
But to a certain extent, these are often kind of top-of-funnel elements.
Someone may launch your app or go to your site because a Google product search
shows your listing and price. Once they get there, reliability either becomes a
blocker to them completing a purchase, or they never notice it because
everything goes smoothly, and they buy from you.
An SLO approach is particularly powerful for
ecommerce companies because their applications are made up of a large
collection of services that must all work properly for the overarching app to
work. Even a simple app hosted on a public cloud platform might include
Kubernetes clusters to automate scaling; external services such as CAPTCHA for
logins, a payment gateway, and a CDN to host images and videos; and internal
microservices like an authentication server, a shopping cart, and a search
feature.
Traditional reliability practices tend to silo
these elements, with little insight into their mutual impact. One endpoint
monitoring tool might look at servers, while another tool pulls infrastructure
data, another monitors containers, and so on. Making strategic reliability
decisions with this toolset means that every part of the application is
necessarily held to the same or similar standards. Reducing outages is
important for ecommerce companies, but traditional reliability fails to account
for the nuanced performance of an app on a day-to-day basis.
Consider that failures in microservices
supporting an ecommerce site's various functions, like the all-important
shopping cart, are what generate incomplete transactions and dissatisfied
consumers. Long load times are especially problematic for ecommerce sites that
depend on ushering consumers from search through checkout without a hitch. Just
a few seconds of delay may cause users to abandon shopping carts; the average
ecommerce site loses half of its visitors if pages take longer than 3 seconds
to load. Frequent app crashes also frustrate users, as our survey shows, and
they become less engaged or look elsewhere if an app repeatedly goes down.
VMblog:
Is there anything else our readers should know? How will this survey inform
Nobl9's strategy moving forward?
Ruby: This survey, and all of our customer
data-gathering, is directly shaping the future of the Nobl9 platform. We just
released SLO Details 2.0, which is a significant upgrade to our primary user
interface. All of the new features were driven by user feedback,
like the new Overview section that lets users focus on the most important
metrics for their Primary Objective. Other new updates are coming soon that
will help customer-facing organizations feed even more types of metrics into
their SLOs, from disparate sources, so insights from anything touching the
customer can be leveraged from within Nobl9.
We will continue to center customer feedback
in all decision-making and will likely conduct more surveys like this in the
future to keep current data on hand as user needs evolve. I would encourage
everyone to join our frequent webinars and workshops, which are great resources
to improve your reliability management. And lastly, because customer feedback
is important even with SLOs, we have a running "Suggestion Box" for folks to
drop comments, concerns, or anything they'd like to share.
## About Dan Ruby
Daniel Ruby is the VP of Marketing at Nobl9.
Ruby is a dynamic marketing executive with a focus on B2B marketing, and has
significant experience building teams and driving successful, data-driven
programs for a range of startups and mid-sized organizations. As the Director
of Online Marketing for Localytics, Ruby was the first marketing hire and
scaled his team to a full-fledged marketing department with domain specialists
focused on mobile apps. Ruby also has a background in journalism and spent
several years guest lecturing marketing courses at Bentley University, bringing
this dynamic skill set to his current role at Nobl9. Ruby holds a BA in
Broadcast Journalism from University of Missouri-Columbia and an MBA in
International Business from Brandeis University.