By Omer Hamerman, Principal DevOps Engineer, Zesty.co
Can you imagine software development in 2022 without Kubernetes? We can't!
Containers and Kubernetes are truly industry game-changers. Nevertheless, there
are still some significant drawbacks to using these technologies.
One of the main frustrations with using containers is that the data stored within them is ephemeral, and as a result, it cannot be retrieved at a later time. This presents problems for non-trivial applications (such as databases) when they run on Kubernetes or containers.
For example, when a container crashes, it is restarted in a clean state, and all of its data is lost. Kubernetes' volume abstraction helps mitigate this common challenge.
In addition, persisting the data within a pod requires users to mount volumes into the pod, backed by cloud storage and exposed through plugins provided by Kubernetes.
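As a minimal sketch of what that mounting looks like (the volume ID is a hypothetical placeholder for a pre-provisioned EBS volume), a pod can reference a cloud-backed volume directly through the in-tree plugin:

```yaml
# Minimal sketch: a pod mounting a pre-provisioned EBS volume through the
# in-tree awsElasticBlockStore plugin. The volumeID is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: ebs-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data   # writes here land on the EBS volume
  volumes:
    - name: data
      awsElasticBlockStore:
        volumeID: vol-0123456789abcdef0   # hypothetical, pre-created volume
        fsType: ext4
```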
While these and other solutions exist to ensure persistence, they do not provide enough elasticity to withstand fluctuations in data volume.
This article provides surefire solutions for data persistence while ensuring elasticity in containerized environments, with the aim of achieving greater consistency and improved performance for cloud storage.
Persistent Volumes for K8s
We cannot talk about persistent storage in containerized environments without first addressing Persistent Volumes.
Persistent Volumes are abstraction layers that connect cloud storage platforms with Kubernetes clusters without being subject to the pod's lifecycle limitations. Because they are unaffected by pods' tendency to move between nodes, these volumes are significantly more flexible, enabling the user to specify attributes such as size and performance needs. Most importantly for our purposes, the data within them can exist for longer periods of time than ephemeral container storage allows.
Why use PVs?
Because Kubernetes pod lifecycles are so unpredictable, PVs are one of the most widely used solutions for enabling data persistence. This is critical for tasks such as operating databases and any other situation in which data preservation is essential.
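For example, a database is typically run as a StatefulSet whose volumeClaimTemplates give each replica its own PV. Here is a hedged sketch, not a production setup; the names, image, and the referenced postgres-secret are all illustrative:

```yaml
# Each StatefulSet replica gets its own PVC from volumeClaimTemplates,
# so the database's data survives pod restarts and rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:14
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret   # assumed to exist in the namespace
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```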
Types of PVs
Kubernetes offers a large variety of Persistent Volume types to fit users' specific needs. For cloud storage, it offers the following volume types:
AWS EBS: awsElasticBlockStore
Azure File: azureFile
Azure Disk: azureDisk
GCP: gcePersistentDisk
VMware vSphere Virtual Volumes: vsphereVolume
For all of these volume types, a StorageClass object must be specified in order to implement the PV.
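For instance, on AWS a StorageClass for the EBS CSI driver might look like the following sketch (the class name is arbitrary; gp3 is one of the driver's supported volume types):

```yaml
# A StorageClass that provisions gp3 EBS volumes on demand.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3              # arbitrary name, referenced by PVCs
provisioner: ebs.csi.aws.com
parameters:
  type: gp3                  # EBS volume type
volumeBindingMode: WaitForFirstConsumer  # create the volume in the pod's AZ
allowVolumeExpansion: true   # permit growing PVCs later
```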
The Drawbacks of PVs for Persistent Storage
So if PVs are detached from Kubernetes pods, enabling greater flexibility for performance and data persistence, all our problems are solved, right?
Well, not exactly. Unfortunately, PVs are not 100% reliable in all scenarios requiring persistent storage, especially when it comes to scaling and elasticity.
PV expansion relies on the Container Storage Interface (CSI), and AWS imposes certain limitations on it for EBS volumes. First and foremost, a volume can only be modified once every six hours, which means you cannot scale on demand with only one volume; to do so, you'd need to launch another volume and attach it to extend your filesystem. Worse still, once you extend a PV, you cannot shrink it back down. As a result, if there is a temporary fluctuation in demand, you will continue to pay for that peak capacity even once your needs return to their normal state.
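To make the expansion process concrete: assuming a StorageClass with allowVolumeExpansion: true (like the ebs-gp3 sketch above), growing a PVC just means raising its storage request and re-applying it; there is no corresponding way to lower it. The claim name and sizes here are illustrative:

```yaml
# Re-applying a claim with a larger request grows the underlying EBS volume
# and filesystem in place. AWS then blocks further modifications to that
# volume for roughly six hours, and the request can never be reduced.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim            # hypothetical existing claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3
  resources:
    requests:
      storage: 200Gi          # raised from, say, 100Gi; cannot go back down
```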
Most concerningly, PVs are prone to fill up whenever there's a high level of data ingestion. ETL (Extract, Transform, Load) pipelines, which blend data from multiple sources, are just one example of a use case that causes peaks in ingested data. Machine learning pipelines and databases can also cause data spikes, which may fill up PVs and crash applications as a result.
Volume Autoscalers: The Next Generation Solution
In scenarios where spikes in capacity are common, PVs have a tendency to fill up, which can cause applications to crash when they run out of disk space. To combat this, we must employ new solutions that provide enough flexibility and performance to satisfy even the most demanding use cases, particularly those involving sudden influxes of data.
That's where storage autoscalers come in.
A storage autoscaler can expand and shrink filesystems, scaling up with a sudden surge of data and scaling back down once that data is no longer needed in block storage.
For Kubernetes use cases, such a technology can be deployed as a DaemonSet, which ensures that every node in the cluster runs an "auto-scalable filesystem" agent interacting with both the cluster and the node it runs on, as sketched below.
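A rough sketch of that deployment shape follows; the image name is purely hypothetical, and a real autoscaler agent would ship its own manifests and permissions:

```yaml
# One autoscaler agent per node, with host access so it can manage
# the node's volumes and filesystem.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: storage-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: storage-autoscaler
  template:
    metadata:
      labels:
        app: storage-autoscaler
    spec:
      containers:
        - name: agent
          image: example.com/storage-autoscaler:latest  # hypothetical image
          securityContext:
            privileged: true  # needed to resize filesystems on the host
          volumeMounts:
            - name: host-root
              mountPath: /host
      volumes:
        - name: host-root
          hostPath:
            path: /
```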
How can this work in the real world?
Say you're using the storage autoscaler in a data analytics cluster. Every few hours a new process fires, causing the number of pods in a particular workload to grow. In this scenario, there is always enough CPU and memory to handle the ingested data (assuming the cluster is configured to grow with demand), but all these machines are using the one volume that was provisioned in the original configuration.
After a while, a significant percentage of the jobs start failing due to an "out of disk" error. Upon further investigation, you find your customers' data is not homogeneous: while some customers' data analysis needed only 100GB of disk space, others required as much as 750GB.
While it's possible to set the cluster's disk size to 750GB, this volume size offers little guarantee of job stability, as the maximum disk size tends to gradually increase as new customers arrive over time.
A storage autoscaler can solve this problem by dynamically managing disk sizes on the fly. As a result, you can easily prevent out-of-disk errors, stand by your SLA, and significantly cut your EBS costs. With a storage autoscaler, you can start with as little as 15GB of disk space, and the autoscaler will adjust automatically as your capacity needs grow. In other words, you get exactly the storage you need, when you need it.
For Kubernetes environments, this elasticity is truly a game-changer: it prevents PVs from filling up during traffic spikes, enabling data to stay persistent by removing any limitation on capacity. And the elasticity goes both ways, so volumes are "shrunk" back down once demand decreases.
Concluding Thoughts
Docker and Kubernetes provided game-changing solutions for modern DevOps environments. Likewise, a storage autoscaler that dynamically adjusts disk volumes based on application needs can provide the next generation of flexibility, persistence, elasticity, and performance needed to take Kubernetes to the next level.
Kubernetes has enabled us to redefine many of the limitations we once had while building applications. And now we have the opportunity to redefine Kubernetes storage management so our applications can finally scale freely with zero limitations. Be sure to watch this space!
*To learn more about containerized infrastructure and cloud native technologies, consider joining us at KubeCon + CloudNativeCon Europe 2022, May 16-20.*
ABOUT THE AUTHOR
Omer Hamerman, Principal DevOps Engineer
Omer Hamerman is a Principal DevOps Engineer at Zesty.co. He loves learning about new technologies and better approaches to DevOps. In his spare time, he writes about his experiences on Medium and https://omerxx.com/.