Univa announced that it has extended
advanced NVIDIA GPU scheduling support in
Univa Grid Engine to include Arm-based
systems. This announcement coincides with NVIDIA's announcement of a
new CUDA on Arm Preview Toolkit that includes SDKs, drivers, and
optimized Deep Learning and HPC libraries enabling developers to take
advantage of NVIDIA GPUs on Arm hardware.
NVIDIA
GPUs have been the accelerator of choice in the HPC community for over a
decade, delivering trillions of floating-point operations per second
(FLOPS). NVIDIA GPUs accelerate over 600 applications today, and have
become critical to advanced HPC workloads in fields such as
computational chemistry, molecular dynamics, and deep learning.
While GPUs offer outstanding performance and power-efficiency, managing
GPU workloads can be challenging. Modern HPC environments are frequently
powered by thousands of CPU cores along with thousands or even millions
of GPU-resident cores shared among many users and applications.
Efficient operation requires advanced workload management capabilities
such as CPU-GPU affinity, NUMA- and topology-aware scheduling (for
NVIDIA NVLink and NVSwitch multi-GPU interconnects), advanced container
support, and integrations with tools such as NVIDIA's Data Center GPU
Manager (DCGM).
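As a rough illustration of what topology-aware placement means in practice, the sketch below picks free GPUs for a job, preferring sets that sit on a single NUMA node and share the most direct GPU-to-GPU links. It is not Univa Grid Engine's scheduler or API. The `Gpu` class, the `pick_gpus` function, and the sample topology are hypothetical names and data invented for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gpu:
    index: int          # device index (hypothetical, for illustration only)
    numa_node: int      # NUMA node the GPU is attached to
    busy: bool = False  # True if another job already holds this GPU


def pick_gpus(gpus, nvlink_pairs, count):
    """Return indices of `count` free GPUs, or None if too few are free.

    Preference order (an assumption of this sketch, not a documented policy):
    1. span as few NUMA nodes as possible,
    2. then maximize the number of directly linked GPU pairs.
    """
    free = [g for g in gpus if not g.busy]
    if len(free) < count:
        return None

    links = {frozenset(pair) for pair in nvlink_pairs}

    def score(candidate):
        numa_nodes = {g.numa_node for g in candidate}
        linked = sum(
            1
            for a, b in combinations(candidate, 2)
            if frozenset((a.index, b.index)) in links
        )
        # Higher is better: fewer NUMA nodes first, then more linked pairs.
        return (-len(numa_nodes), linked)

    best = max(combinations(free, count), key=score)
    return [g.index for g in best]


# Sample node: four GPUs on two NUMA nodes, GPU 1 already in use.
gpus = [Gpu(0, 0), Gpu(1, 0, busy=True), Gpu(2, 1), Gpu(3, 1)]
nvlink_pairs = [(0, 1), (2, 3), (0, 2)]

print(pick_gpus(gpus, nvlink_pairs, 2))
```

With this sample topology, a two-GPU request lands on GPUs 2 and 3, which share a NUMA node and a direct link, rather than on GPUs 0 and 2, which are linked but sit on different NUMA nodes. The exhaustive search over combinations is only practical for the handful of GPUs in one node.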
"Univa is a pioneer in support for advanced GPU workloads," said Univa's
Chief Technology Officer Fritz Ferstl, citing Univa's work with Japan's
AI Bridging Cloud Infrastructure (ABCI). "ABCI is a 550 AI-Petaflop,
top-ten supercomputer where Univa Grid Engine manages HPC and AI
workloads across over 4,000 NVIDIA V100 Tensor Core GPUs. What we learn
in leading-edge HPC drives innovation in Univa Grid Engine for the
benefit of our commercial customers, including those running NVIDIA DGX
systems. Customers realize better performance, improved utilization, and
systems that are easier to manage."
Simplifying the Management of GPU Workloads
Univa Grid Engine helps users simplify the management of GPU workloads.
Sites can boost performance by placing workloads optimally and improve
overall efficiency and productivity by reducing wait times and allowing
more jobs to run simultaneously without conflict.
Univa was among the first commercial HPC software providers to support
Arm-based systems, announcing Univa Grid Engine Arm support in 2013.
Since that time, Arm systems have made steady inroads, breaking into the
TOP500 list of global supercomputers in 2018. Arm systems offer
compelling price-performance, are power-efficient, and are available
from multiple computer manufacturers.
"The availability of NVIDIA GPUs and software tools on Arm is an
important development for the HPC community," said Gary Tyreman, CEO of
Univa. "GPUs play a central role in HPC and AI applications. For Univa,
offering our advanced GPU workload management capabilities on Arm
systems makes sense. It provides our customers and partners with added
flexibility and new infrastructure choices."
"Deep integration with NVIDIA GPUs and support for NVIDIA NGC for HPC
and AI containers is an excellent solution for customers running GPU
applications locally or across their choice of clouds," said Duncan
Poole, Director of Platform Alliances at NVIDIA.
Product Availability
Advanced GPU scheduling will be available immediately as part of Univa's
Arm-based Univa Grid Engine distribution. For more information, visit here.