By Cheuk Lam, Enlin Xu, and David Blinn of Turbonomic
Why Setting CPU Limits Can Slow Response-Time
Today, the majority of enterprise organizations running
mission-critical applications on Kubernetes are doing so in multitenant environments.
These multitenant environments rely on limits to regulate tenant
workloads or to support chargebacks. Some developers also set CPU limits to
benchmark their applications.
CPU throttling is the unintended consequence of this design.
Take a look at this example...
Figure 1: CPU with 25% utilization
In the above figure, the CPU utilization of a container is
only 25%, which makes it a natural candidate to resize down.
Figure 2: Huge spike in response time after resize to ~50% CPU utilization
But after we resize the container down (container CPU utilization
is now 50%, still not high), the response time quadruples!
So what's going on here? CPU throttling occurs when you
configure a CPU limit on a container, which can inadvertently slow your
application's response time. Even if you have more than enough resources on the
underlying node, your container workload will still be throttled because its
limit was not configured properly. The high response times are directly
correlated with periods of high CPU throttling, and this is exactly how
Kubernetes was designed to work.
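To bring some color to this, imagine you set a CPU limit of
200m (200 millicores, or 0.2 CPU), and that limit is translated to a cgroup
quota in the underlying Linux system. The container is only able to use 20ms of
CPU time in each enforcement period, because the default period is 100ms. If
your task needs more than 20ms of CPU time, it is throttled until the next
period begins. A task that needs 80ms of CPU time, for example, runs for 20ms in
each of four consecutive periods and finishes after about 320ms, 4x longer than
it would take unthrottled.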
Your application's performance will suffer due to the
increase in response time caused by throttling.
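The arithmetic behind that slowdown can be sketched in a few lines of Python. This is a simplified model, not how the Linux scheduler is implemented: it assumes a single-threaded, CPU-bound task that starts at the beginning of an enforcement period and is the only thing running in the container. The function names are hypothetical and exist only for illustration.

```python
def cfs_quota_ms(limit_millicores: int, period_ms: int = 100) -> float:
    """CPU time (ms) the container may consume in each enforcement period."""
    return limit_millicores * period_ms / 1000


def wall_clock_ms(cpu_needed_ms: float, limit_millicores: int,
                  period_ms: int = 100) -> float:
    """Wall-clock time (ms) for a single-threaded CPU-bound task under the quota.

    Simplifying assumptions: the task starts exactly at a period boundary
    and nothing else in the container competes for the quota.
    """
    quota = cfs_quota_ms(limit_millicores, period_ms)
    elapsed = 0.0
    remaining = cpu_needed_ms
    # While more work remains than one quota allows, the task burns the
    # whole quota and is then throttled until the next period starts.
    while remaining > quota:
        remaining -= quota
        elapsed += period_ms
    # The final slice fits inside the quota and runs unthrottled.
    return elapsed + remaining


if __name__ == "__main__":
    print(cfs_quota_ms(200))        # 20.0 ms of CPU per 100 ms period
    print(wall_clock_ms(80, 200))   # 320.0 ms, 4x the 80 ms of CPU needed
    print(wall_clock_ms(20, 200))   # 20.0 ms, fits in one quota, no throttling
```

Under these assumptions the slowdown depends on how much CPU time the task needs: anything that fits in one 20ms quota is unaffected, while longer tasks wait out most of every period.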
How Do You Avoid CPU Throttling in Kubernetes?
CPU throttling is a key application performance metric due
to the direct correlation between response-time and CPU throttling. This is
great news for you, as you can get this metric directly from Kubernetes and
OpenShift.
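The article does not say where the metric comes from, so the following is a sketch under stated assumptions. On Linux, the cgroup CPU controller reports throttling counters in a `cpu.stat` file (`nr_periods` and `nr_throttled`), and cAdvisor typically exposes the same counters as `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total`. The helper name and the sample values below are hypothetical, and the location of `cpu.stat` varies by node setup, so the function takes the file contents as text.

```python
def throttle_stats(cpu_stat_text: str) -> dict:
    """Parse cgroup cpu.stat text and report the share of throttled periods."""
    fields = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        # Each line of cpu.stat is expected to be "<name> <integer value>".
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    periods = fields.get("nr_periods", 0)
    throttled = fields.get("nr_throttled", 0)
    # Avoid dividing by zero when no limit is set or no period has elapsed.
    ratio = throttled / periods if periods else 0.0
    return {
        "nr_periods": periods,
        "nr_throttled": throttled,
        "throttled_ratio": ratio,
    }


if __name__ == "__main__":
    # Hypothetical sample contents, not measurements from a real cluster.
    sample = "nr_periods 1000\nnr_throttled 250\nthrottled_time 5000000000\n"
    print(throttle_stats(sample)["throttled_ratio"])  # 0.25
```

A ratio of 0.25 would mean the container hit its quota in a quarter of all enforcement periods, regardless of what its average CPU utilization looks like.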
To ensure that your application response times remain low
and CPU doesn't get throttled, you first need to understand that you can't
detect CPU throttling just by looking at CPU utilization. You need to
take all the analytics that go into application performance into account.
Turbonomic has built that analytics platform.
When determining container rightsizing actions, Turbonomic is
able to analyze four dimensions:
- CPU Limits
- CPU Requests
- Memory Limits
- Memory Requests
Turbonomic is able to determine the CPU limits that will
mitigate the risk of throttling and allow your applications to perform
unencumbered. It does this by adding CPU throttling as a
dimension for the platform to analyze, so that it can manage the tradeoffs that
appear and keep application response times low. Check
out this video to see it in action.
On top of this, Turbonomic generates actions to move
your pods and scale your clusters because, as we all know, it's a full-stack
challenge.
Customers have the ability to see the KPIs and ask, "Which
one of my services is being throttled?" It also allows them to understand the
history of CPU throttling for each service, and remember that throttling is
directly correlated with application response time! As one customer said, "This
CPU Throttling has been plaguing us. What Turbo provides will save time and
performance."
The benefit of Turbonomic is our ability to quickly identify
and solve a consequence of a platform strategy rather than have the customer
redesign their multitenant platform strategy. Not only can Turbonomic monitor
CPU throttling metrics, but the platform can also automatically rightsize your
CPU limit and bring the throttling down to a manageable level.
Learn More About CPU Throttling!
If you are interested in learning more
about the adverse impact of CPU throttling and how the Kubernetes community
views it, check out these articles:
This one by Dave Chiluk is one of our
favorites: Unthrottled:
Fixing CPU Limits in the Cloud. Not only does he offer
a nice illustration of throttling, but he also describes an interesting Linux
kernel bug related to throttling and how he fixed it. Elsewhere, Dimitri
Stiliadis from Palo Alto Networks wrote a program to illustrate the impact of
CPU limits on application latency. There has been so much debate in the
Kubernetes community around whether it is good or bad to use CPU limits
that even
Tim Hockin offered his guidelines.
To hear more about cloud native topics, join the Cloud Native Computing Foundation and cloud native community at KubeCon + CloudNativeCon North America 2021, October 11-15, 2021.
ABOUT THE AUTHORS
Enlin Xu, Director, Advanced Engineering at Turbonomic
Enlin Xu is a proud graduate
of Columbia University and has been a software engineer at Turbonomic since
2011. He is now the Director of Advanced Engineering and leads the application
of Turbonomic's analytics platform to cloud native technologies.
Cheuk Lam, Software Architect, Advanced Engineering at Turbonomic
Cheuk Lam is a software
engineer at Turbonomic. He studies cloud native technologies and develops
solutions to continuously optimize workload scaling and placement in multicloud
environments.
David Blinn, Software Architect, Advanced Engineering at Turbonomic
David Blinn is a software
engineer at Turbonomic. He works on solving application performance and system
scaling challenges with a focus on containerized environments.