Industry executives and experts share their predictions for 2020. Read them in this 12th annual VMblog.com series exclusive.
By Josh Goldenhar, VP of products, Excelero
GPU Storage Bottlenecks Die, as the AI Transformation Continues
Scale-out data center technology
providers like Excelero are both seeing and driving disruptive technologies
- to the point that we forget how disruptive they are.
We take for granted that software-defined storage is displacing proprietary hardware. Not long ago, this was a point of contention. We know that NVMe flash SSDs, with 5-20x greater write performance than traditional flash SSDs (and 35x greater performance than spinning disks), have become the go-to storage media for many if not most enterprise storage needs. Superfast networking speeds like 100Gb Ethernet and InfiniBand are the "new normal," with 200Gb emerging and 400Gb right around the corner. Latency has been identified as a key issue in our transaction-driven, online society, justifying the deployment of RDMA (RoCE) for latency-sensitive applications and NVMe over TCP/IP for the rest of enterprise workloads.
In 2020, we'll see continued, profound data center change driven by two simple letters: AI. Hands down, AI was the most important trend across all enterprise computing this year. In 2019, the use of AI- and ML-based systems exploded as technology advances made it far easier to capture, store and process data into insights. In 2020, AI will continue to transform every digital process and technology.
Follow-on changes driven by
AI include:
1) More widespread deployment of GPU-based systems that lower the cost of massive computation on large datasets, making parallel processing faster and much more powerful.
2) An expanded role for the edge, with greater use of new sensor technologies that capture images, pulse, temperature and heart rate - sensors that create huge quantities of small IO and often require compute resources at the edge. These will require more low-latency, high-bandwidth and highly available storage for AI applications to analyze, adding to already massive data volumes. Edge data centers are particularly challenging because they are constrained in power, space and cooling, and therefore require high capacity and performance density. We predict high demand for NVMe flash to power AI at the edge.
3) Next-gen storage options such as NVMe that defeat the "GPU storage bottleneck." NVIDIA's debut of Magnum IO, a software suite that includes the new GPUDirect Storage, is proof of the value and importance of NVMe for AI.
As AI and high performance computing (HPC) datasets increase in size, NVIDIA explains, the time spent loading data for a given application begins to strain overall application performance. I/O, the process of loading data from storage to GPUs for processing, has historically been controlled by the CPU. As computation shifts from slower CPUs to faster GPUs, I/O becomes more of a bottleneck to overall application performance. Look for other implementations that use innovations in storage to deliver low latency and high performance for AI applications. (A minimal code sketch of the direct storage-to-GPU path follows this list.)
4) Improvements that enable better training of the machine learning models that power AI. When data scientists train machine learning models, they process hundreds of terabytes of data, and they need to do so in the shortest amount of time possible. Leveraging elastic NVMe with GPUs drastically increases the number of jobs that can be run, and enables data scientists to train more and better models with higher accuracy.
5) The rise of AIOps will drive demand for NVMe over Fabrics storage. Containers are being deployed at massive scale in enterprises - to the extent that human operators are no longer able to manage these massive clusters. Gartner predicts that AIOps, the application of machine learning and data science to IT operations, will be implemented in 30% of large enterprises by 2023 to help Kubernetes be deployed and managed at scale. As the use of AIOps expands in 2020, we'll see more NVMe over Fabrics deployments with Kubernetes-related applications, using pooled, redundant NVMe storage for container applications that require persistent volumes. This lets enterprises obtain both local flash performance and container mobility at data center scale.
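To make the GPUDirect Storage point in item 3 concrete, below is a minimal sketch of a direct storage-to-GPU read using NVIDIA's cuFile API, the programmatic interface to GPUDirect Storage. It assumes a Linux host with CUDA and the cuFile library installed; the file path and transfer size are hypothetical, and error handling is abbreviated.

    /* Minimal sketch: read a file from NVMe directly into GPU memory with the
     * cuFile API (GPUDirect Storage). Path and size below are illustrative. */
    #define _GNU_SOURCE                 /* for O_DIRECT */
    #include <cufile.h>
    #include <cuda_runtime.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char  *path = "/mnt/nvme/dataset.bin";  /* hypothetical NVMe-backed file */
        const size_t size = 64UL << 20;               /* 64 MiB transfer, for example */

        cuFileDriverOpen();                           /* initialize the GDS driver */

        int fd = open(path, O_RDONLY | O_DIRECT);     /* O_DIRECT bypasses the page cache */
        if (fd < 0) { perror("open"); return 1; }

        CUfileDescr_t descr;
        memset(&descr, 0, sizeof(descr));
        descr.handle.fd = fd;
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

        CUfileHandle_t handle;
        cuFileHandleRegister(&handle, &descr);        /* register the file with cuFile */

        void *gpu_buf = NULL;
        cudaMalloc(&gpu_buf, size);                   /* destination buffer in GPU memory */
        cuFileBufRegister(gpu_buf, size, 0);          /* register it for direct DMA */

        /* Data moves from storage into the GPU buffer without a CPU bounce
         * buffer; the CPU only orchestrates the transfer. */
        ssize_t n = cuFileRead(handle, gpu_buf, size, 0 /* file offset */, 0 /* buf offset */);
        printf("read %zd bytes into GPU memory\n", n);

        cuFileBufDeregister(gpu_buf);
        cuFileHandleDeregister(handle);
        cudaFree(gpu_buf);
        close(fd);
        cuFileDriverClose();
        return 0;
    }

The point of the sketch is the data path: the read lands in GPU memory directly, so feeding ever-larger datasets to GPUs no longer funnels through CPU memory and bounce buffers.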
The new generation of AI and HPC applications significantly raises the bar for shared file storage. In 2020 we'll see AI force better answers to long-standing performance barriers, so that scale-out storage deployments achieve higher ROI, easier workflow management and faster time to results.
##
About the Author
Josh Goldenhar is VP of Products at Excelero, a software-defined block storage disruptor.