Virtualization Technology News and Information
Excelero 2020 Predictions: GPU Storage Bottlenecks Die, as the AI Transformation Continues

VMblog Predictions 2020 

Industry executives and experts share their predictions for 2020.  Read them in this 12th annual series exclusive.

By Josh Goldenhar, VP of products, Excelero

GPU Storage Bottlenecks Die, as the AI Transformation Continues

Scale-out data center technology providers like Excelero are both seeing and driving disruptive technologies - to the point that it's easy to forget how disruptive they are.

We take for granted that software-defined storage is displacing proprietary hardware; not long ago, this was a point of contention. We know that NVMe Flash SSDs, with 5-20x greater write performance than traditional Flash SSDs (and 35x greater performance than spinning disks), are the go-to storage media for many if not most enterprise storage needs. Superfast networking speeds like 100Gb Ethernet and InfiniBand are the "new normal," with 200Gb emerging and 400Gb right around the corner. Latency has been identified as a key issue in our transaction-driven, online society, justifying the deployment of RDMA (RoCE) for latency-sensitive applications and NVMe over TCP/IP for the rest of enterprise workloads.

In 2020, we'll see continued, profound data center change driven by two simple letters: AI. Hands down, AI was the most important trend across all enterprise computing this year. In 2019, the use of AI- and ML-based systems exploded as technology evolutions made it far easier to capture, store and process data into insights. In 2020, AI will continue to transform every digital process and technology.

Follow-on changes driven by AI include:

1)      More widespread deployment of GPU-based systems that lower the cost of massive compute on datasets, making parallel processing faster and much more powerful.

2)      Expanded role of the edge, with greater use of new sensor technologies that capture images, pulse, temperature and heart rate - sensors that create huge quantities of small IO and often require compute resources at the edge. These will require more low-latency, high-bandwidth and highly available storage for AI applications to analyze, adding to already massive data volumes. Edge data centers are particularly challenging because they are constrained in power, space and cooling, and therefore require high capacity and performance density. We predict high demand for NVMe flash to power AI at the edge.

3)      Next-gen storage options such as NVMe that defeat the "GPU storage bottleneck." NVIDIA's debut of NVIDIA Magnum IO, a suite of software including the new GPUDirect Storage, is proof of the value and importance of NVMe for AI.

As AI and high performance computing (HPC) datasets increase in size, NVIDIA explains, the time spent loading data for a given application begins to strain the application's overall performance. I/O - the process of loading data from storage to GPUs for processing - has historically been controlled by the CPU. As computation shifts from slower CPUs to faster GPUs, I/O becomes more of a bottleneck to overall application performance. Look for other implementations that use innovations in storage to deliver low latency and high performance for AI applications.
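The shift described here is essentially Amdahl's law applied to storage: as GPUs shrink the compute time per batch, the fixed cost of CPU-mediated data loading claims a growing share of total runtime. A minimal sketch of that arithmetic, using purely illustrative timings (not benchmarks of any real system):

```python
# Illustrative model of the "GPU storage bottleneck": as compute per
# batch accelerates, the unchanged cost of loading data dominates.
# All timings below are hypothetical, chosen only to show the effect.

def io_bound_fraction(load_s: float, compute_s: float) -> float:
    """Fraction of total batch time spent loading data from storage."""
    return load_s / (load_s + compute_s)

# CPU era: compute dominates, so I/O is a small share of each batch.
cpu_era = io_bound_fraction(load_s=0.10, compute_s=0.90)   # 10% I/O

# GPU era: the same load path now dwarfs the accelerated compute.
gpu_era = io_bound_fraction(load_s=0.10, compute_s=0.05)   # ~67% I/O

print(f"CPU era: {cpu_era:.0%} of batch time in I/O")
print(f"GPU era: {gpu_era:.0%} of batch time in I/O")
```

The point of approaches like GPUDirect Storage is to attack the `load_s` term directly, moving data between NVMe and GPU memory without staging it through the CPU.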

4)      Improvements for better training of the machine learning models that power AI. When data scientists train machine learning models, they routinely process hundreds of terabytes of data, and they need to do so in the shortest amount of time. Leveraging elastic NVMe with GPUs drastically increases the number of jobs that can be run, and enables data scientists to train more and better models with higher accuracy.

5)      The rise of AIOps will drive NVMe over Fabrics storage demand. Containers are being deployed at massive scale in enterprises - to the extent that human operators are no longer able to manage these massive clusters. Gartner predicts that AIOps, the application of machine learning and data science to IT operations, will be implemented in 30% of large enterprises by 2023, helping Kubernetes to be deployed and managed at scale. As use of AIOps expands in 2020, we'll see more NVMe over Fabrics deployments alongside Kubernetes, with container applications that require persistent volumes drawing on pooled, redundant NVMe storage, so that enterprises can obtain both local flash performance and container mobility at data center scale.
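In Kubernetes, a container workload asks for that kind of pooled, fabric-backed storage through a PersistentVolumeClaim against a storage class exposed by the NVMe-over-Fabrics provisioner. A minimal sketch - the class name `nvme-pooled` is a hypothetical placeholder, not a real provisioner:

```yaml
# Hypothetical claim against an NVMe-over-Fabrics-backed storage class.
# "nvme-pooled" is an illustrative name; actual class names depend on
# the storage provisioner deployed in the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nvme-pooled
  resources:
    requests:
      storage: 500Gi
```

Because the volume is served from a shared fabric rather than a node-local disk, a pod can be rescheduled onto another node and reattach the same claim: the combination of local-flash performance and container mobility the prediction describes.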

The new generation of AI and high performance computing (HPC) applications significantly raises the bar for shared file storage. In 2020 we'll see AI force better answers to long-standing performance barriers, so that scale-out storage deployments achieve higher ROI, easier workflow management and faster time to results.


About the Author

Josh Goldenhar 

Josh Goldenhar is VP of products for Excelero, a software-defined block storage disruptor.
Published Friday, December 27, 2019 7:25 AM by David Marshall