Kubernetes and Accessing Computational Power for AI/ML: GPUs and High-Speed Networks

By Victor Agreda & Jeevan Joseph

Interest in large language models (LLMs) skyrocketed after the publication of the paper "Attention Is All You Need" in 2017, followed by the launch of ChatGPT in 2022. Similarly, after its introduction in 2014 as a container orchestrator that made the development process easier, Kubernetes exploded in popularity, coalesced a strong community around itself, and became the de facto container platform.

While Kubernetes works well for containerizing general-purpose workloads, it has also become an attractive option for running specialized workloads like high-performance computing (HPC) and AI. As Kubernetes adoption grew for web-based applications, more developers brought the container-based model to specialized applications and verticals such as telecommunications and HPC. These diverse applications presented new challenges, however, because the cattle-not-pets mindset championed by Kubernetes emphasizes portability over specialization.

The challenges of high-performance computing workloads

Unlike web applications, HPC workloads such as AI applications have unique requirements, like GPU clusters and low-latency networking. AI workloads running on large GPU clusters demand optimal network connectivity, which is often bottlenecked by the latency between nodes. This is where technologies like remote direct memory access (RDMA) come into play, enabling zero-copy networking by allowing a capable network adapter to transfer data from the wire directly to application memory, or vice versa. This eliminates copying data between application memory and operating system buffers, bypassing the CPU, caches, and context switches. However, RDMA requires specialized hardware, and this kind of hardware dependency makes RDMA workloads pet-like compared to the general Kubernetes model. But Kubernetes' key strength is its extensibility, which enables these applications to run on it.

For their part, cloud service providers focused on managed Kubernetes platforms, with advancements in the various cloud controller managers and expanded infrastructure capabilities: clusters that support an ever-growing number of nodes, including nodes with specialized hardware.

The introduction of device plugins opened the door for developers to access custom hardware in a Kubernetes-native manner. GPU device plugins made GPUs a schedulable resource, with Kubernetes tracking the number of allocatable and available GPUs on every node. Developers can request GPUs for their pods just as they would request CPUs or memory, while offloading pod placement and resource tracking to Kubernetes. Network resources work similarly. Modern cloud platforms provide single root I/O virtualization (SR-IOV) on their compute and GPU platforms, which essentially moves network virtualization into the networking hardware. Combined with RDMA-capable hardware and the appropriate device plugins, allocating RDMA devices to pods becomes as simple as allocating CPUs or memory. To simplify setup and management, vendors like NVIDIA have introduced GPU operators that streamline installing GPU device drivers and device plugins on a Kubernetes cluster.
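To illustrate the resource model described above, here is a minimal sketch of a pod manifest, expressed as a Python dict, that requests a GPU through the extended resource name `nvidia.com/gpu` registered by NVIDIA's device plugin. The pod and image names are illustrative, not from the article.

```python
# A pod requesting one GPU. The scheduler treats "nvidia.com/gpu" like any
# other countable resource (CPU, memory) and places the pod on a node that
# still has an allocatable GPU, as tracked by the device plugin.
gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-training-pod"},  # illustrative name
    "spec": {
        "containers": [
            {
                "name": "trainer",
                "image": "training-image:latest",  # placeholder image
                "resources": {
                    # Extended resources are requested under limits;
                    # Kubernetes decrements the node's available count.
                    "limits": {"nvidia.com/gpu": 1},
                },
            }
        ]
    },
}
```

From the developer's perspective this is the whole interface: no node selection, driver paths, or device files appear in the spec; the device plugin and kubelet handle mounting the GPU into the container.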

In a real-world deployment of these capabilities, users can use the SR-IOV device plugin to slice RDMA-capable network hardware into several virtual functions that can be allocated to dedicated pods without loss of performance.
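A sketch of what such an allocation can look like, assuming a Multus-style secondary network attachment and an SR-IOV device plugin that has registered the virtual functions under a vendor resource name. Both the network attachment name (`sriov-rdma-net`) and the resource name (`mellanox.com/sriov_rdma_vf`) are hypothetical; the actual names depend on how the cluster operator configured the plugin.

```python
# A pod that receives one virtual function (VF) carved from an
# RDMA-capable NIC via SR-IOV. The annotation attaches the pod to a
# secondary network (Multus convention); the resource limit asks the
# SR-IOV device plugin for one VF, scheduled like any other resource.
rdma_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "rdma-worker",
        "annotations": {
            # Name of a NetworkAttachmentDefinition (hypothetical).
            "k8s.v1.cni.cncf.io/networks": "sriov-rdma-net",
        },
    },
    "spec": {
        "containers": [
            {
                "name": "worker",
                "image": "mpi-worker:latest",  # placeholder image
                "resources": {
                    # Resource name as registered by the SR-IOV device
                    # plugin's config (hypothetical here).
                    "limits": {"mellanox.com/sriov_rdma_vf": 1},
                },
            }
        ]
    },
}
```

Because the VF is real hardware presented to the pod, the data path keeps RDMA's zero-copy performance while the pod itself remains an ordinary, schedulable Kubernetes object.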


HPC workloads also differ from web applications in their approach to workload scheduling. This is traditionally an area dominated by purpose-built workload managers like Slurm, with scheduling algorithms designed for HPC. Projects like Volcano brought advanced batch scheduling to Kubernetes, addressing several challenges of distributed AI training with support for gang scheduling, task queue management, task topology, and GPU affinity scheduling.
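Gang scheduling, the first of those features, is easiest to see in a Volcano Job spec. The sketch below (again as a Python dict, with illustrative names and image) assumes Volcano's `batch.volcano.sh/v1alpha1` Job API: `minAvailable` tells the scheduler not to start any pod of the job until all of them can be placed, which prevents a half-scheduled training job from holding GPUs idle while it waits for stragglers.

```python
# A Volcano Job for a 4-worker distributed training run. Setting
# minAvailable equal to the replica count enforces all-or-nothing (gang)
# scheduling: either all 4 workers get nodes and GPUs, or none start.
volcano_job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "distributed-training"},  # illustrative name
    "spec": {
        "schedulerName": "volcano",  # use Volcano, not the default scheduler
        "minAvailable": 4,
        "tasks": [
            {
                "name": "worker",
                "replicas": 4,
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "name": "trainer",
                                "image": "training-image:latest",  # placeholder
                                "resources": {
                                    "limits": {"nvidia.com/gpu": 1},
                                },
                            }
                        ],
                        "restartPolicy": "Never",
                    }
                },
            }
        ],
    },
}
```

Without gang scheduling, a default scheduler could start 3 of the 4 workers and deadlock the cluster: the partial gang consumes GPUs but makes no training progress until the fourth pod fits somewhere.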

Building a complete AI platform based on Kubernetes can still be challenging because, beyond model training, a complete stack also needs processes and features for collecting and cleansing data, extracting features from datasets, verifying data, and managing, releasing, and monitoring models. The Kubeflow project aims to create a cloud native platform for machine learning model development, training, release, and management.

These recent advances have enabled Kubernetes to run HPC-type workloads, and this is exactly the tooling necessary to build and run AI workloads. 

As Saurabh Baji, Senior Vice President of Engineering at Cohere puts it, "Kubernetes has significantly simplified the deployment of Large Language Models across platforms. It has allowed us to rapidly deliver a highly portable, cloud-native inference stack that works across diverse hardware and runs with improved efficiency for a variety of model sizes, at lowered cost for MLOps."

Building a large GPU cluster no longer means sacrificing performance for portability, which is a step forward in democratizing AI/ML and ushering in new innovations.

Pushing the envelope and keeping portability

What's next for Kubernetes in the AI and HPC space? The collaboration between the Kubernetes community, CNCF, and technology providers is poised to drive continuous advancements in Kubernetes. Expect rapid iterations in running larger models and achieving faster training on more extensive clusters. This cycle of growth holds the promise of opening up new applications and use cases for AI.

According to Mahesh Thiagarajan, executive vice president of software development at Oracle, "From AI driven application development and operations, to applying AI to an organization's enterprise data to drive business innovations, we are just getting started. Our immediate focus is on the dual problem of building highly scalable infrastructure through OCI that can train AI models that scale, and building customizable AI models that our customers can apply and train with their own data through platforms like Cohere."

As organizations increasingly adopt Kubernetes for AI workloads, the user experience for building a complete AI platform will continue to improve. While it currently demands in-depth knowledge, future Kubernetes distributions and cloud platforms may become more topology-aware, providing out-of-the-box integrations for specific workload types. This shift would empower developers to focus on higher-level problems, accelerating the progress of AI/ML innovations.


Join us at KubeCon + CloudNativeCon North America this November 6-9 in Chicago for more on Kubernetes and the cloud native ecosystem.



Victor Agreda, DevRel Writer & Content Strategist at Oracle


Victor has worked as a writer, producer, journalist, storyteller, and all-around multimedia production expert for nearly 20 years, leading content and strategy initiatives to build and engage large audiences. He has also been a teacher and entrepreneur, reveling in the exciting intersection of liberal arts and technology.

Jeevan Joseph, Senior Principal Product Manager at Oracle


Jeevan is a Senior Principal Product Manager for Containers and Kubernetes services at Oracle Cloud Infrastructure. He focuses on cloud-native application architecture, developer tooling, automation, web-scale performance engineering, and cross-product solution engineering. As a member of the Containers and Kubernetes product organization, he collaborates with strategic customers to enhance Oracle's platform based on their experience and feedback.

Published Thursday, October 26, 2023 7:36 AM by David Marshall