Virtualization Technology News and Information
Composable HPC is a Compelling Alternative for Public Sector


By George Wagner, Director of Product and Technical Marketing, Liqid

Multiple Contracts for Liqid Composable HPC, Including One Worth ~$50 Million from the Department of Defense, Signal That Composable Disaggregated Infrastructure Has Arrived.

Artificial intelligence and machine learning (AI+ML) have changed high-performance computing (HPC) requirements, and today's fastest supercomputers reflect that demand. It's no surprise that the problems they're being used to solve, coupled with their massive data processing needs, necessitate HPC infrastructure. But the flexibility and agility these workloads require now outstrip conventional HPC deployments, which are fixed at the point of purchase and configured to power conventional, predictable workloads.

AI+ML workloads, however, are notoriously uneven, requiring disparate resource arrangements based on the operation at hand. For example, data ingest relies on NVMe storage performance, whereas inference tends to be quite GPU-intensive. Furthermore, these data requirements can change in microseconds based on the workflow. Bottlenecks are injected into the process as massive amounts of data move through each phase of the AI workflow.

While some improvements have been achieved at the software layer through virtualization and other software-defined solutions, at the end of the day they still consume a rigid server infrastructure, confined by tin. As a result, IT is now investigating new infrastructure approaches.

Composable Infrastructure Unlocks Accelerator and Storage Utilization

To more efficiently share and orchestrate powerful accelerator technologies such as GPUs and FPGAs to meet the crushing demand of AI+ML workloads, IT is exploring new software-defined solutions like composable disaggregated infrastructure (CDI).

What is CDI? Well, it's software that transforms your unit of computing from the server to the datacenter. Imagine using software to compose bare metal servers on demand to meet modern workload needs. Costly overprovisioning is eliminated because you deploy only what a workload needs today, via UI, API, or CLI. When it's time to scale resources up or down, do so in seconds, zero-touch, with utter disregard for whether there's room in the server. When a workload is retired, quickly move resources to a new or existing workload that needs them.

How is it done? You start by disaggregating resources like compute, GPUs, Ultrastar® NVMe™ storage, Intel® Optane™ memory, FPGAs, and NICs into pools, and then connect them via a high-speed fabric like Ethernet and/or PCIe. Liqid Matrix software then composes them into real-time IT solutions to quickly address today's AI business challenges.
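To make the pool-and-compose model concrete, here is a minimal toy simulation. Everything in it (class names, pool sizes, the `compose`/`decompose` calls) is hypothetical and for illustration only; it is not the Liqid Matrix API, just a sketch of the accounting a fabric manager performs when it builds bare metal servers from disaggregated pools:

```python
from dataclasses import dataclass

@dataclass
class ComposedServer:
    """A bare metal server assembled from pooled devices (illustrative only)."""
    name: str
    gpus: int = 0
    nvme_tb: int = 0

class Fabric:
    """Toy model of disaggregated resource pools on a shared fabric.

    Hypothetical sketch -- not the actual Liqid Matrix interface.
    """
    def __init__(self, gpu_pool: int, nvme_pool_tb: int):
        self.gpu_pool = gpu_pool          # free GPUs in the pool
        self.nvme_pool_tb = nvme_pool_tb  # free NVMe capacity (TB)
        self.servers = {}

    def compose(self, name: str, gpus: int, nvme_tb: int) -> ComposedServer:
        # Allocate only what the workload needs today -- no overprovisioning.
        if gpus > self.gpu_pool or nvme_tb > self.nvme_pool_tb:
            raise RuntimeError("resource pool exhausted")
        self.gpu_pool -= gpus
        self.nvme_pool_tb -= nvme_tb
        self.servers[name] = ComposedServer(name, gpus, nvme_tb)
        return self.servers[name]

    def decompose(self, name: str) -> None:
        # Retiring a workload returns its devices to the pools for reuse.
        s = self.servers.pop(name)
        self.gpu_pool += s.gpus
        self.nvme_pool_tb += s.nvme_tb

fabric = Fabric(gpu_pool=32, nvme_pool_tb=100)
fabric.compose("ingest-node", gpus=2, nvme_tb=40)   # storage-heavy ingest phase
fabric.compose("train-node", gpus=20, nvme_tb=10)   # GPU-heavy training phase
print(fabric.gpu_pool)            # 10 GPUs still free in the pool
fabric.decompose("ingest-node")   # ingest done; devices go back to the pools
print(fabric.gpu_pool)            # 12
```

The point of the sketch is the bookkeeping: devices never physically move, only their fabric assignments change, so "scaling" a server is just arithmetic against the pools.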

Why would you consider composability for AI+ML workloads? The amount of storage and the number of GPUs required will likely not fit in a single server. These workloads demand the flexibility and shareability of SAN/NAS storage and the performance of DAS. Composability offers this, and NVMe over Fabrics (NVMe-oF™) is the answer for storage. You can feed the beast just as if the NVMe were in the box, but it's somewhere else in the datacenter. Need 20 GPUs on a single server over PCIe fabric? No sweat. Eliminate the bottlenecks associated with moving data between servers in the workflow by leaving the data in place and composing each server around it.

Where virtualization maximized resource utilization within a server, CDI ensures the server is right-sized, can scale as needed and eliminates over-provisioning. The architectural approach enables bare metal resources to be shared seamlessly across hardware in previously impossible configurations to more efficiently manage the data center footprint while efficiently addressing the data needs associated with AI+ML applications. 

Over time, CDI will become completely automated, with AI+ML answering calls for additional resources from the applications themselves. For now, Kubernetes, SLURM, and all commercially available hypervisors can manipulate bare metal via APIs as the application requires.
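As a hedged sketch of that API-driven workflow, the fragment below builds the kind of JSON payload a scheduler hook (for example, a SLURM prolog) could send to a fabric manager before a GPU job starts. The endpoint, field names, and payload shape are assumptions made up for illustration; they are not a documented Liqid, Kubernetes, or SLURM interface:

```python
import json

def compose_request(node: str, gpus: int, nvme_tb: int) -> str:
    """Build a hypothetical JSON payload that a scheduler prolog could
    POST to a fabric manager, requesting devices before a job launches.
    All field names here are illustrative, not a real API contract."""
    return json.dumps({
        "action": "compose",
        "target": node,
        "resources": {"gpu": gpus, "nvme_tb": nvme_tb},
    }, sort_keys=True)

# e.g. request 8 GPUs and 4 TB of NVMe for the node about to run a job
payload = compose_request("node-07", gpus=8, nvme_tb=4)
print(payload)
```

The design point is that the scheduler never touches hardware directly: it declares the resource shape it needs, and the fabric manager reassigns pooled devices to satisfy it.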

Department of Defense Deploys Composable Infrastructure for AI-intensive Applications

My company, Liqid, was recently awarded two contracts by the Department of Defense (DoD) to provide three composable supercomputing systems that include more than 900 NVIDIA A100 Tensor Core GPUs and NVIDIA Mellanox HDR 200 gigabit per second InfiniBand smart networking.

The collective contracts are worth more than $50 million. At this time, the performance capabilities of the ERDC (U.S. Army Engineer Research and Development Center) deployment would rank the system at No. 15 on the TOP500 ranking of the world's most powerful high-performance computing (HPC) platforms. The systems collectively represent 32 petaflops of performance, and NVIDIA A100 GPU resources can be quickly added to or removed from compute systems for unprecedented flexibility and agility.

Ultra-fast A100 GPUs and Liqid composable NVMe storage can be aggregated and deployed via software without regard to physical limitations, and shared across intelligent fabrics in the exact ratios required for a given workload, at massive scale or down to the level of individual elements.

Though the ability to aggregate GPU and other accelerator resources was key to securing the contracts, for the DoD, the ability to reconfigure resources on demand and via APIs was just as significant. None of the industry-leading HPC providers could deliver that degree of flexibility, which was key for managing research and development activities across the globe.

Adoption of composable solutions is rapidly advancing, but there are still several approaches to composable infrastructure, and industry professionals are still working to understand the differences to determine what is right for them. It's important to understand your own organization's needs and consider the differences between the approaches industry leaders are taking. Download the technical insight report from industry analyst firm The Evaluator Group that delineates the differences between the approach taken by HPE Synergy and the fabric-based approach developed by Liqid, and learn what technology's most prominent influencers have to say about composable infrastructure software.



George Wagner 

George Wagner is the Director of Product and Technical Marketing for Liqid. His career in information technology began in IT administration before transitioning into technical sales and ultimately product marketing. George has held roles at tech start-ups like LeftHand Networks and NexGen Storage, as well as traditional organizations such as Hewlett Packard Enterprise. George uses his experience as an IT practitioner to communicate new solutions that address modern business and IT challenges.

Published Wednesday, February 10, 2021 7:35 AM by David Marshall