By Jason Haworth, CPO at Apica
What's the hardest thing about monitoring your critical business workflows? The fact that, quite often, key parts of a digital user journey aren't actually "yours." They rely on a broad range of third-party data and services, all tightly interwoven with your systems via Application Programming Interfaces (APIs).
APIs form the glue that binds application journeys together, enabling complex transactions that automatically query external data sets. These capabilities have become so important that it's hard to imagine modern workflows without them. But there's a cost: APIs leave those workflows more dependent on systems outside your control. If an API becomes slow or unavailable, user experiences degrade, supply chains are disrupted, and critical business-to-business transactions grind to a halt.
These risks make it essential to monitor API health like any other business-critical digital service. Unfortunately, that's easier said than done. Most current approaches focus on your API front-end or the health of the network it uses. These are important metrics, but plenty of scenarios can unfold where your own environment looks fine, even as transactions slow or fail altogether for users.
Fortunately, new techniques offer a much clearer picture. The secret: make sure you're monitoring APIs from the outside-in, measuring performance from your users' point of view.
## Getting a Partial Picture
The biggest benefit APIs bring to modern workflows, as well as their biggest risk, is the ability to query external data sets and turn the results into useful information or action. The benefit comes from automating even highly complex transactions. The risk: because of that automation, even small API issues can cascade into big problems. Like highway drivers slowing to gawk at an accident and causing miles-long traffic jams, a 20% lag in API response can create much longer delays in the systems depending on it.
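To see why a modest lag can snowball, consider a simple M/M/1 queueing approximation. The arrival rate and service times below are hypothetical illustrations, not measurements: when an API runs near capacity, a 20% increase in service time can multiply the response time its dependents see.

```python
# Rough M/M/1 illustration (hypothetical numbers): a modest increase in API
# service time can multiply the response time seen by dependent systems once
# the API is operating near its capacity.

def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time (waiting + service) of an M/M/1 queue, in seconds."""
    service_rate = 1.0 / service_time
    assert arrival_rate < service_rate, "queue is unstable at this load"
    return 1.0 / (service_rate - arrival_rate)

arrivals = 8.0  # requests per second hitting the API

baseline = mm1_response_time(arrivals, service_time=0.100)  # 100 ms per request
degraded = mm1_response_time(arrivals, service_time=0.120)  # 20% slower service

print(f"baseline response time: {baseline:.2f}s")  # ~0.50s
print(f"degraded response time: {degraded:.2f}s")  # ~3.00s, a 6x increase
```

At this (assumed) load, a 20% slowdown in the API itself turns into a sixfold slowdown for everything queued behind it.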
To avoid problems like this, you should monitor your APIs much like you'd monitor a website, asking the same kinds of questions: How many queries per second can my API accommodate? How many logins? How are my APIs performing in my most important markets? Am I meeting my service-level agreements (SLAs) with customers and partners?
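As a minimal sketch of what answering those questions looks like in practice, the snippet below runs a single synthetic check against a hypothetical endpoint and compares the measured latency to an assumed 500 ms SLA target. The URL and threshold are placeholders, not part of any specific product.

```python
# Minimal synthetic API check (endpoint and SLA threshold are hypothetical).
import time

import requests

API_URL = "https://api.example.com/v1/inventory"  # hypothetical endpoint
SLA_SECONDS = 0.5                                 # hypothetical SLA target

def check_api() -> dict:
    """Issue one synthetic request and report latency against the SLA."""
    start = time.monotonic()
    response = requests.get(API_URL, timeout=10)
    elapsed = time.monotonic() - start
    return {
        "status_code": response.status_code,
        "latency_s": round(elapsed, 3),
        "sla_met": response.ok and elapsed <= SLA_SECONDS,
    }

if __name__ == "__main__":
    print(check_api())  # e.g. {'status_code': 200, 'latency_s': 0.142, 'sla_met': True}
```

Run from distributed agents in your most important markets, checks like this answer the per-region and per-SLA questions above.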
Unfortunately, while most digital businesses use sophisticated monitoring techniques to understand their users' web experience, they typically don't extend those approaches to APIs. That is, for web services, they'll deploy distributed agents to generate synthetic transactions and measure performance from customers' point of view. But for APIs, they mostly measure from the inside-out, and miss key parts of the picture. Parts like:
- Performance for long-pull objects: Most API front-ends cache query responses, enabling them to pull commonly requested objects very quickly. But it's entirely possible to have important customers and SLAs relying on less frequent transactions, with objects that only get pulled once or twice a quarter. If you're mostly measuring cached objects, do you really know the performance of your API, or just the cache?
- Dynamic data-handling: Even common transactions can get quite complex. For example, querying an inventory list, finding a product at a specific location, and completing a purchase requires APIs to access and act on dynamic data. There's no way to assess that performance without simulating the full user journey (see the sketch after this list).
- Multi-API interactions: In modern workflows, it's common for APIs to interact with several other APIs. If you're not capturing aggregate performance across all of them, you can't understand real user experience.
- Performance under load: APIs that seem fine in a vacuum can suddenly suffer big problems under load. For example, during socket buildup and teardown, an overstressed API can quickly burn through infrastructure resources, racking up significant cloud hosting costs. If you're not load-testing with synthetic traffic, you likely won't catch those problems until well after the fact.
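The sketch below illustrates one way to probe several of these blind spots at once, against a hypothetical commerce API (all endpoints, parameters, and field names are invented for illustration). It simulates a multi-step journey, times each step and the aggregate, and appends a unique cache-busting token so one lookup is served cold rather than from the front-end cache; note the cache-busting trick assumes the cache keys on the full URL.

```python
# Multi-step synthetic journey against a hypothetical API: per-step and
# aggregate timing, with a cache-bypass token for "long-pull" measurement.
import time
import uuid

import requests

BASE = "https://api.example.com/v1"  # hypothetical API

def timed_get(session: requests.Session, url: str, **params):
    """GET a URL, returning (parsed JSON body, elapsed seconds)."""
    start = time.monotonic()
    response = session.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json(), time.monotonic() - start

def run_journey() -> dict:
    timings = {}
    with requests.Session() as session:
        # Step 1: query the inventory list (dynamic data).
        inventory, timings["inventory"] = timed_get(session, f"{BASE}/inventory")

        # Step 2: look up one product at a specific location. The unique
        # nocache token is meant to bypass the front-end cache (assuming the
        # cache keys on the full URL), so we measure the API, not the cache.
        sku = inventory["items"][0]["sku"]  # hypothetical response shape
        _, timings["product"] = timed_get(
            session,
            f"{BASE}/products/{sku}",
            location="store-42",
            nocache=uuid.uuid4().hex,
        )

        # Step 3: complete a purchase using the data returned above.
        start = time.monotonic()
        order = session.post(f"{BASE}/orders", json={"sku": sku, "qty": 1}, timeout=10)
        order.raise_for_status()
        timings["purchase"] = time.monotonic() - start

    # Aggregate performance across every step, not just one API in isolation.
    timings["total"] = sum(timings.values())
    return timings

print(run_journey())
```

Because the purchase step depends on data from the inventory step, the check exercises dynamic data-handling and multi-API aggregation in one pass.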
## A Smarter Approach to API Monitoring
When it comes to business-critical workflows, it's not enough to monitor the health of internal systems. You need to understand how APIs perform for real-world users and transactions. The only way to do that is to measure from your users' perspective, continually simulating and stress-testing the full user journey. Ideally, your API monitoring strategy should combine:
- Synthetic monitoring to generate transactions that simulate real users
- Load-testing to iteratively stress-test APIs from multiple locations, with multiple query types
- Scripting, so you can capture API calls and perform complex series of if/then operations based on what those queries return (a sketch follows this list)
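As a rough illustration of the scripting piece, the sketch below (all endpoints and field names are hypothetical) captures an API response and branches on what it returns, exercising a different downstream call depending on the data.

```python
# Scripted, conditional API check: capture a call, branch on its result.
# Endpoints, SKUs, and field names are hypothetical placeholders.
import requests

BASE = "https://api.example.com/v1"  # hypothetical API

def scripted_check() -> str:
    stock = requests.get(f"{BASE}/inventory/sku-123", timeout=10)

    if stock.status_code != 200:
        return f"ALERT: inventory endpoint returned {stock.status_code}"

    body = stock.json()
    if body.get("quantity", 0) > 0:
        # In-stock path: verify the purchase flow is healthy too.
        order = requests.post(
            f"{BASE}/orders", json={"sku": "sku-123", "qty": 1}, timeout=10
        )
        return "OK" if order.ok else f"ALERT: order failed ({order.status_code})"

    # Out-of-stock path: verify the back-in-stock notification endpoint instead.
    notify = requests.post(f"{BASE}/notify", json={"sku": "sku-123"}, timeout=10)
    return "OK (out-of-stock path)" if notify.ok else "ALERT: notify failed"

print(scripted_check())
```

The point of the branching is coverage: whichever state the data is in, the script still exercises (and times) a realistic downstream path.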
With these capabilities in place, your API monitoring strategy should test each element of the user journey, as well as the end-to-end workflow. By capturing a more holistic view of query processes across the transaction, you can better understand real-world API performance and more quickly diagnose problems if something goes wrong. Make sure you cover the following core areas:
- Infrastructure: You likely already capture granular details about infrastructure performance. By correlating these metrics with synthetic monitoring and load-testing of queries to external data, you can measure API health from multiple angles. You can gauge performance across cloud and Software-as-a-Service (SaaS) components, as well as transitory network elements you don't own, such as transport provider infrastructure and Content Delivery Networks (CDNs). You can then accurately baseline API health and even calculate actual costs per transaction.
- Front-end: API front-end monitoring should check the same core elements as any web service front-end, including the health of Domain Name System (DNS) records and Secure Sockets Layer (SSL)/Transport Layer Security (TLS) certificates. Make sure you're simulating long-pull object queries and measuring API cache performance. And perform login checks to verify accessibility, especially for journeys involving multi-factor authentication.
- Middle-tier and database: Many businesses have adopted storage optimization strategies like automatically moving colder data to Glacier cloud storage. This can significantly reduce storage costs. But if you're not synthetically testing performance across storage tiers, you don't actually know whether the data services that depend on those APIs are meeting SLAs.
- End-to-end user journey: Work with your application team to understand the full user journey, so you can make sure you're instrumenting properly. You should be thoroughly testing your most important journeys, including continually simulating (and then undoing) full transactions, as sketched after this list.
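The snippet below sketches that "simulate, then undo" pattern against a hypothetical orders API (endpoints, payload, and response fields are placeholders): it places a synthetic order, verifies the order is visible, and then deletes it so the check leaves no residue in production data.

```python
# "Simulate, then undo" end-to-end check against a hypothetical orders API.
import requests

BASE = "https://api.example.com/v1"  # hypothetical API

def end_to_end_check() -> bool:
    order_id = None
    try:
        # Simulate the full transaction a real user would perform.
        created = requests.post(
            f"{BASE}/orders",
            json={"sku": "sku-123", "qty": 1, "synthetic": True},  # hypothetical payload
            timeout=10,
        )
        created.raise_for_status()
        order_id = created.json()["order_id"]  # hypothetical response field

        # Verify the new order is actually visible downstream.
        fetched = requests.get(f"{BASE}/orders/{order_id}", timeout=10)
        return fetched.ok
    except requests.RequestException:
        return False
    finally:
        # Undo the synthetic transaction whether or not the check passed.
        if order_id is not None:
            requests.delete(f"{BASE}/orders/{order_id}", timeout=10)

print("journey healthy:", end_to_end_check())
```

The cleanup lives in a finally block so the undo step runs even when an earlier step fails, which matters if the check executes continually against production systems.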
APIs have become far too important to rely on monitoring that only shows part of the picture. By incorporating synthetic monitoring and load-testing into your portfolio, you can capture a 360-degree view. And you can understand API performance from the only perspective that matters: your users'.
## ABOUT THE AUTHOR
Jason leads the product development and Solutions Engineering teams at Apica. As a seasoned technology leader, he is responsible for product improvement, innovation, and the lifecycle of Apica's solutions. He has more than 20 years of experience in performance technologies, leading technical management, software development, security, and network and systems administration and engineering functions. Through his work with cloud operators, telcos, and the Fortune 50, he has spent most of his career driving innovation in areas such as IoT, 5G, and cloud transformation.