Virtualization Technology News and Information
Stateful Workloads Managed by Kubernetes

State persistence is one of the major considerations when designing the architecture of an application. Stateless applications process each request independently, keeping no record of previous requests or transactions. Stateful applications store data about previous transactions and return to this data for future requests.

Over time, containers have become a popular element of deployment for hosting applications as a collection of multiple, autonomous services. Stateful workloads present a challenge for containerization since the containers enforce resource isolation and use ephemeral storage, thereby leading to data loss every time a container restarts.

To help with this, Kubernetes uses volume abstractions to provide persistent storage for ephemeral containers. Through these abstractions, Kubernetes attaches PODs to physical storage devices, so state and data remain available even after a container restarts.

Managing Stateful Workloads on Kubernetes

Kubernetes is best known as a feature-rich platform for orchestrating containerized workloads, and it is also gaining popularity for operating stateful workloads. This is largely due to its Container Storage Interface (CSI), which enables storage vendors to create effective data persistence solutions. The following section highlights how Kubernetes allows persistent storage for stateful workloads, and the requirements for running stateful applications in Kubernetes.

Kubernetes and Persistent Storage

Kubernetes introduced Volumes to make portions of physical storage available to containers running in PODs. A Volume is a directory that is accessible to the containers in a POD so they can read and write data onto it. A PersistentVolume (PV), on the other hand, is a piece of storage in the cluster, provisioned either statically by an administrator or dynamically, whose lifecycle is independent of any POD that uses it. PVs have specific sizes, access modes, and other identifying characteristics that determine which claims they can satisfy. A PersistentVolumeClaim (PVC) is a Kubernetes object through which users request storage of a given size and access mode; Kubernetes binds the claim to a matching PV, and PODs consume the storage by referencing the claim. The StorageClass resource enables the dynamic provisioning of PVs by describing the provisioner and parameters used to create a volume when no existing PV matches a claim.
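The relationship between claims and volumes can be illustrated with a small Python model. This is a simplified sketch, not the actual Kubernetes binding logic: the class names, the `can_bind` and `pick_volume` helpers, and the "smallest fitting volume" rule are assumptions made for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class PersistentVolume:
    name: str
    capacity_bytes: int
    access_modes: frozenset
    storage_class: str


@dataclass(frozen=True)
class PersistentVolumeClaim:
    name: str
    request_bytes: int
    access_modes: frozenset
    storage_class: str


def can_bind(pv, pvc):
    """Return True if the PV could satisfy the claim in this simplified model."""
    return (
        pv.storage_class == pvc.storage_class
        and pv.capacity_bytes >= pvc.request_bytes
        and pvc.access_modes <= pv.access_modes  # requested modes must be offered
    )


def pick_volume(volumes, pvc):
    """Pick the smallest PV that satisfies the claim, or None if nothing fits."""
    candidates = [pv for pv in volumes if can_bind(pv, pvc)]
    return min(candidates, key=lambda pv: pv.capacity_bytes, default=None)


# Hypothetical, statically provisioned volumes.
volumes = [
    PersistentVolume("pv-small", 5 * GIB, frozenset({"ReadWriteOnce"}), "standard"),
    PersistentVolume("pv-large", 50 * GIB, frozenset({"ReadWriteOnce"}), "standard"),
    PersistentVolume("pv-shared", 20 * GIB, frozenset({"ReadWriteMany"}), "nfs"),
]
claim = PersistentVolumeClaim("data", 10 * GIB, frozenset({"ReadWriteOnce"}), "standard")

chosen = pick_volume(volumes, claim)
print(chosen.name if chosen else "no match; dynamic provisioning would be needed")
```

In this model a claim that no existing volume can satisfy returns `None`, which corresponds to the case where a StorageClass would have to provision a new volume dynamically.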

The Container Storage Interface (CSI)

As the need to persist data for stateful, containerized workloads increased, the Container Storage Interface (CSI) was developed to standardize storage orchestration for microservices. The CSI allows vendors of storage solutions to expose their products to applications running in PODs. Once a CSI plugin has been deployed to a Kubernetes cluster, it is available for use with any volume abstraction. The interface provides a simple way to orchestrate storage for containers created using any runtime, and extends the Kubernetes volume layer.
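As a rough illustration of how a CSI plugin surfaces in the volume layer, the Python sketch below builds a StorageClass that names a CSI driver as its provisioner, plus a claim that references that class. The driver name `csi.example.com`, the `fsType` parameter, and the object names are hypothetical; real values depend on the storage vendor's plugin. The manifests are printed as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def storage_class(name, csi_driver, parameters=None):
    """Build a StorageClass manifest that delegates provisioning to a CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": csi_driver,  # the name the CSI plugin registers under
        "parameters": parameters or {},  # driver-specific, opaque to Kubernetes
        "reclaimPolicy": "Delete",
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def volume_claim(name, storage_class_name, size):
    """Build a PersistentVolumeClaim that requests storage from the given class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class_name,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical driver name and parameters, for illustration only.
fast = storage_class("fast-csi", "csi.example.com", {"fsType": "ext4"})
claim = volume_claim("db-data", "fast-csi", "10Gi")

print(json.dumps(fast, indent=2))
print(json.dumps(claim, indent=2))
```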

Considerations for Running Stateful Applications in Kubernetes

Kubernetes was initially built to run stateless containerized applications as it only had support for ephemeral volumes. With its growing popularity in microservices and cloud-native container orchestration, the need to support stateful workloads in its ecosystem arose. Running stateful applications on Kubernetes, however, adds work and complexity because state must be persisted outside the ephemeral containers, typically in databases. To effectively run stateful applications in Kubernetes, an organization's infrastructure must factor in the following considerations:

Support for Multiple Storage Volumes

Kubernetes enables High Availability for containerized stateful applications through replication. Kubernetes PVs can provide shared storage between different containers, allowing different services in an application to use the same filesystem when the underlying storage supports it. Administrators can provide this storage through different options including managed cloud solutions (AWS EBS, Azure Blob Storage, etc.), open-source platforms (Ceph, OpenEBS, etc.), or generic file systems (like NFS). Administrators should provision multiple volumes to enable backup and replication of persistent storage devices for stateful applications.

PV Availability Across POD Restarts and Termination

Persistent storage devices should be hosted on Storage Area Networks (SANs) with identities that survive restarts. This is because PODs are ephemeral and lose their data once they terminate or restart. Without available persistent storage, stateful workloads lose their data, affecting user experience and application functionality. As a result, the SAN's IP address and configuration should be preserved across application restarts, ideally with little user intervention.

Considerations for Statefulness in Kubernetes

Below are some factors to consider when evaluating a platform for running stateful workloads on Kubernetes:

●      The Approach for Running Databases

When running stateful applications in Kubernetes, organizations can choose one of three options:

  1. Cloud-based Services - Organizations can opt for a cloud-based database from a third-party managed service provider. With these services, the administrators only need to focus on running Kubernetes applications and leave state persistence management to the service provider.
  2. External Database - In this case, operators run the database in an environment that supports stateful applications but is hosted outside the Kubernetes cluster. This approach, however, adds operational overhead since administrators are tasked with managing high availability, volumes, scaling, and other administrative activities, making it a less popular option for large-scale databases.
  3. StatefulSets - A StatefulSet is a Kubernetes API object used to manage sets of similar PODs running stateful applications by giving each POD unique, persistent storage and a stable network identity.

●      Disaster Recovery (DR) and High Availability (HA)

The platform should include secondary storage sites that allow for instant Disaster Recovery through automation. Administrators should always test and update Disaster Recovery plans to ensure the strategies are workable and enable seamless backup and HA management.

●      Service Level Agreement (SLA)

Storage service providers and clients should agree to set the performance of the storage platform for the application's workloads. Some key metrics that guide a Service Level Agreement for stateful storage include:

●      Uptime Percentage

●      Back-Off Requirements

●      Error Rate

●      Financial Credit
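To make the uptime metric concrete, the arithmetic behind an uptime percentage can be sketched in Python. The function name and the 30-day billing period are assumptions for illustration; an actual SLA defines its own measurement window and exclusions.

```python
def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Return the downtime budget, in minutes, implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Compare a few common uptime targets over an assumed 30-day period.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, moving from 99.9% to 99.99% shrinks the monthly downtime budget by a factor of ten, which is why each additional "nine" tends to carry a noticeably higher cost.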

StatefulSets in Kubernetes

A StatefulSet is a Kubernetes API object that helps in the deployment and management of PODs based on identical container specifications. StatefulSets assign a unique and consistent ID to each POD, so that its storage and network identity stay attached to it regardless of the node on which it is scheduled. This allows applications to maintain connections with their assigned PODs and PersistentVolumes even when the application is scaled up and down.

Features of a StatefulSet

A StatefulSet standardizes the ordering and identification of PODs, helping to manage the scaling and deployment of PODs. It is a controller that simplifies the management of stateful Kubernetes workloads, and has the following features:

●      Each POD in a StatefulSet has a stable, unique ID

●      A StatefulSet is a group of PODs with stable hostnames and persistent IDs

●      Kubernetes maintains the identity of each POD in a StatefulSet even when the POD is rescheduled onto another node

●      The unique ID given to each POD is a combination of the StatefulSet's name and the POD's ordinal

●      The PODs in a StatefulSet are created sequentially, resulting in unique and ordered IDs

●      State information is stored on the Persistent Disk attached to the StatefulSet
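The naming scheme described in the list above can be sketched in a few lines of Python. The StatefulSet name `web` and the headless service name `nginx` are hypothetical; the DNS form assumes the default `cluster.local` cluster domain.

```python
def pod_identities(statefulset, service, replicas,
                   namespace="default", cluster_domain="cluster.local"):
    """Return (pod name, stable DNS name) pairs for each ordinal in a StatefulSet."""
    identities = []
    for ordinal in range(replicas):
        # The POD name combines the StatefulSet name and the ordinal.
        pod_name = f"{statefulset}-{ordinal}"
        # The stable hostname is resolved through the governing headless service.
        dns_name = f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod_name, dns_name))
    return identities


for pod_name, dns_name in pod_identities("web", "nginx", replicas=3):
    print(pod_name, "->", dns_name)
```

Because the ordinal is part of the name, a replacement for `web-1` is again called `web-1`, so peers and clients can keep addressing it by the same hostname.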

Components of a StatefulSet

A StatefulSet is a Kubernetes API object whose configurations are outlined in a YAML file. Key components of a StatefulSet's configuration file include:

POD Selector

This field specifies the labels used to identify the PODs that belong to the StatefulSet. The .spec.selector field of the StatefulSet must match the labels in .spec.template.metadata.labels of its POD template. A missing or non-matching POD selector results in a validation error when the StatefulSet is created.
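The selector rule can be checked before a manifest is submitted. The sketch below is a simplified pre-flight check written for illustration, not the API server's validation code: the function name is an assumption, and only `matchLabels` is handled (`matchExpressions` is ignored).

```python
def selector_matches_template(statefulset):
    """Check that every matchLabels entry appears in the POD template's labels."""
    spec = statefulset.get("spec", {})
    match_labels = spec.get("selector", {}).get("matchLabels", {})
    template_labels = (
        spec.get("template", {}).get("metadata", {}).get("labels", {})
    )
    if not match_labels:
        return False  # treat a missing selector as invalid in this sketch
    return all(template_labels.get(key) == value
               for key, value in match_labels.items())


# Hypothetical manifests, reduced to the fields the check needs.
valid = {
    "spec": {
        "selector": {"matchLabels": {"app": "db"}},
        "template": {"metadata": {"labels": {"app": "db", "tier": "data"}}},
    }
}
mismatched = {
    "spec": {
        "selector": {"matchLabels": {"app": "db"}},
        "template": {"metadata": {"labels": {"app": "cache"}}},
    }
}

print(selector_matches_template(valid))
print(selector_matches_template(mismatched))
```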

POD Identity

This specification gives each POD within the StatefulSet a unique identity consisting of an ordinal, stable storage, and a stable network ID.

POD Management Policies

The .spec.podManagementPolicy specification in the YAML template allows administrators to relax the ordering guarantees while retaining identity and uniqueness guarantees. StatefulSets can have two main POD management policies:

●      OrderedReady Pod Management

●      Parallel Pod Management
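The difference between the two policies can be modeled as the order in which PODs are launched or terminated. The Python sketch below is a simplified model under stated assumptions: the helper names are hypothetical, each inner list is a group of ordinals handled together, and readiness checks are not simulated.

```python
def _batches(ordinals, policy):
    """Group ordinals into sequential steps according to the management policy."""
    if policy == "OrderedReady":
        # One POD at a time; the next step starts only after the previous POD is ready.
        return [[ordinal] for ordinal in ordinals]
    if policy == "Parallel":
        # All PODs at once, without waiting on each other.
        return [list(ordinals)] if ordinals else []
    raise ValueError(f"unknown podManagementPolicy: {policy}")


def scale_up_batches(replicas, policy="OrderedReady"):
    """PODs are created from the lowest ordinal to the highest."""
    return _batches(list(range(replicas)), policy)


def scale_down_batches(replicas, policy="OrderedReady"):
    """PODs are terminated from the highest ordinal to the lowest."""
    return _batches(list(range(replicas - 1, -1, -1)), policy)


print(scale_up_batches(3, "OrderedReady"))
print(scale_down_batches(3, "OrderedReady"))
print(scale_up_batches(3, "Parallel"))
```

In both cases the PODs keep the same names and ordinals; only the ordering guarantee is relaxed under the Parallel policy.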

Update Strategies

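The .spec.updateStrategy field allows administrators to configure or disable automated rolling updates for the PODs a StatefulSet creates. Two update strategies are supported, OnDelete and RollingUpdate; partitions and forced rollback are behaviors related to the RollingUpdate strategy:

●      On Delete: PODs are not updated automatically; a POD picks up the new specification only after it is deleted manually

●      Rolling Updates: PODs are updated automatically, one at a time, from the highest ordinal to the lowest

●      Partitions: A setting of the Rolling Update strategy; only PODs with an ordinal greater than or equal to the partition value are updated

●      Forced Rollback: A manual recovery step for a rolling update that has entered a broken state; the template is reverted and the broken PODs are deleted by hand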
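The effect of a partition on a rolling update can be sketched as a small Python function. The function name is an assumption for illustration; the behavior modeled is that PODs are updated from the highest ordinal down, and PODs with an ordinal below the partition value keep the old specification.

```python
def rolling_update_order(replicas, partition=0):
    """Return the ordinals a partitioned rolling update touches, in update order."""
    if partition < 0:
        raise ValueError("partition must not be negative")
    # Highest ordinal first; ordinals below the partition are left untouched.
    return [ordinal for ordinal in range(replicas - 1, -1, -1)
            if ordinal >= partition]


print(rolling_update_order(5))               # full rollout
print(rolling_update_order(5, partition=3))  # staged rollout of the top ordinals
print(rolling_update_order(5, partition=5))  # nothing is updated
```

Lowering the partition value step by step is one way to stage a rollout, for example updating a single POD as a canary before updating the rest.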

Limitations of StatefulSets

StatefulSets have been a stable (GA) resource since Kubernetes 1.9, but some operations still require manual tasks and configurations to execute correctly. Some limitations of StatefulSets include:

●      To provision storage for a POD in a StatefulSet, the volume must be pre-provisioned by an administrator, or created by a PersistentVolume provisioner based on the StorageClass requested.

●      When a StatefulSet is deleted or scaled down, the Volumes associated with it are not deleted. This protects the data, but administrators must clean up leftover volumes themselves to avoid orphaned storage and stale user data.

●      Users/Administrators are responsible for creating the headless service that will assign and manage PODs' network IDs

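●      There is no guarantee that PODs are terminated in order when a StatefulSet is deleted; to achieve ordered, graceful termination, scale the StatefulSet down to 0 before deleting it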

●      Some Rolling Updates may enter into a broken state that needs to be repaired manually if the OrderedReady POD management policy is used.
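Two of these limitations lend themselves to a short Python sketch: the headless service that users create themselves, and the claims that remain after a StatefulSet is deleted. The service name, labels, port, and helper names are hypothetical; the claim names follow the pattern of volume claim template name, StatefulSet name, and ordinal.

```python
import json


def headless_service(name, selector, port):
    """Build a headless Service manifest (clusterIP set to None)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",  # headless: DNS resolves to the individual PODs
            "selector": selector,
            "ports": [{"name": "main", "port": port}],
        },
    }


def retained_claim_names(statefulset, claim_template, replicas):
    """List the PersistentVolumeClaims left behind when the StatefulSet is deleted."""
    return [f"{claim_template}-{statefulset}-{ordinal}"
            for ordinal in range(replicas)]


service = headless_service("db", {"app": "db"}, 5432)
print(json.dumps(service, indent=2))

# Claims an administrator would need to review and clean up manually.
for claim_name in retained_claim_names("db", "data", replicas=3):
    print(claim_name)
```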

When to Use a StatefulSet

StatefulSets are particularly useful when applications require:

●      Stable, unique IDs

●      Stable, Persistent Storage

●      Ordered scaling and deployment

●      Automatic, ordered rolling updates
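Putting these pieces together, a StatefulSet manifest might look like the output of the Python sketch below. It is a minimal illustration under stated assumptions rather than a production configuration: the image `registry.example.com/db:1.0`, the names, the mount path, and the sizes are all hypothetical. The result is printed as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def statefulset_manifest(name, service_name, image, replicas,
                         storage_class, size, mount_path="/var/lib/data"):
    """Build a StatefulSet manifest with one volume claim template per POD."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service_name,  # governing headless service
            "replicas": replicas,
            "selector": {"matchLabels": labels},  # must match the template labels
            "podManagementPolicy": "OrderedReady",
            "updateStrategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"partition": 0},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                    }],
                },
            },
            # Each POD gets its own claim, and therefore its own persistent volume.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


manifest = statefulset_manifest(
    "db", "db", "registry.example.com/db:1.0",
    replicas=3, storage_class="standard", size="10Gi",
)
print(json.dumps(manifest, indent=2))
```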


Kubernetes offers multiple approaches to persist storage for stateful applications. The Kubernetes StatefulSet object enables the orchestration of stateful workloads by assigning unique, consistent IDs to PODs running applications. Using Volume abstractions (PVs, PVCs, and StorageClasses), administrators can attach physical storage devices to containers in PODs. In addition, the Container Storage Interface (CSI) allows third-party vendors to create Kubernetes-centric storage tools that can be attached to containerized applications.




Murat Karslioglu, VP of Products & Solutions Engineering @ MayaData


Murat Karslioglu is a serial entrepreneur, technologist, and startup advisor with over 15 years of experience in storage, distributed systems, and enterprise hardware development. Before joining MayaData, Murat worked at Hewlett Packard Enterprise / 3PAR Storage in various advanced development projects including storage file stack performance optimization and the storage management stack for HPE's Hyper-converged solution.

Published Monday, September 20, 2021 11:01 AM by David Marshall