Anyscale announced a collaboration with NVIDIA to further boost
the performance and efficiency of large language model (LLM) development
on Ray and the Anyscale Platform for production AI. The
companies are integrating NVIDIA AI software into Anyscale's scalable
computing platforms, including Ray open source, the Anyscale Platform,
and Anyscale Endpoints, announced separately today.
The open-source integrations will bring NVIDIA software, including
NVIDIA TensorRT-LLM, NVIDIA Triton Inference Server, and NVIDIA NeMo,
to Ray to supercharge end-to-end AI development and deployment. Making
cutting-edge AI software available via open source democratizes access
and dramatically increases the number of developers who can use this
integration.
For production AI, the companies will certify the NVIDIA AI Enterprise
software suite for the Anyscale Platform, bringing enterprise-grade
security, stability, and support to companies deploying AI. An
additional integration with Anyscale Endpoints will bring support for
the NVIDIA software to a greatly expanded pool of AI application
developers via easy-to-use application programming interfaces.
"Realizing the incredible potential of generative AI requires computing platforms
that help developers iterate quickly and save costs when building and
tuning LLMs," said Robert Nishihara, CEO and co-founder of Anyscale.
"Our collaboration with NVIDIA will bring even more performance and
efficiency to Anyscale's portfolio so that developers everywhere can create
LLMs and generative AI applications with unprecedented speed and
efficiency."
"LLMs are at the heart of today's generative AI
transformation, and the developers creating and customizing these models
require full-stack computing with efficient orchestration throughout
the AI life cycle," said Manuvir Das, vice president of Enterprise
Computing at NVIDIA. "The combination of NVIDIA AI and Anyscale unites
incredible performance with ease of use and the ability to scale rapidly
with success."
NVIDIA AI Acceleration Speeds End-to-End Anyscale Development
NVIDIA's open-source and production software helps boost accelerated computing
performance and efficiency for generative AI development.
The integration delivers numerous benefits for customers and users:
- NVIDIA TensorRT-LLM automatically scales inference to run models in
parallel over multiple GPUs, which can provide up to 8X higher
performance when running on NVIDIA H100 Tensor Core GPUs, compared to
prior-generation GPUs. These capabilities will bring further
acceleration and efficiency to Ray, which ultimately results in
significant cost savings for at-scale LLM development.
- NVIDIA Triton Inference Server standardizes AI model deployment and
execution across every workload. It supports inference across cloud,
data center, edge, and embedded devices on GPUs, CPUs, and other
processors. By running multiple models concurrently, it maximizes GPU
utilization and throughput for LLMs and reduces end-to-end latency.
These capabilities will add more efficiency for developers deploying AI
in production on Ray and the Anyscale Platform.
- NVIDIA NeMo is an end-to-end, cloud-native framework for building,
customizing, and deploying generative AI models anywhere. It includes
training and inferencing frameworks, guardrailing toolkits, data
curation tools, and pretrained models, offering enterprises an easy,
cost-effective, and fast way to adopt generative AI. The integration of
NeMo with Ray and the Anyscale Platform will enable developers to
fine-tune and customize models with enterprise data, paving the way for
LLMs that understand the unique offerings of individual businesses.
- Anyscale Endpoints is a service that enables developers to integrate
fast, cost-efficient, and scalable LLMs into their applications using
popular LLM APIs. Endpoints can be tailored to specific use cases and
fine-tuned with additional content and context to serve users' specific
needs while ensuring the best combination of price and performance.
Endpoints is less than half the cost of comparable proprietary solutions
for general workloads and up to 10X less expensive for specific tasks.
Availability
NVIDIA AI integrations with Anyscale are under development and are
expected to be available in Q4. Practitioners interested in early access
are encouraged to apply here.