Cloudera announced additional support for key NVIDIA technologies in public and private clouds, helping customers efficiently build and deploy best-in-class artificial intelligence applications.
"GPU
acceleration applies to all phases of the AI application lifecycle - from data
pipelines for ingestion and curation, data preparation, model development and
tuning, to inference and model serving," said Priyank Patel, Vice President of
Product Management at Cloudera. "NVIDIA's leadership in AI computing perfectly
complements Cloudera's leadership in data management, offering customers a
complete solution to harness the power of GPUs across the entire AI
lifecycle."
This
new phase in Cloudera's technology collaboration with NVIDIA adds
multigenerational GPU capabilities for data engineering, machine learning and
artificial intelligence in both public and private clouds:
1. Accelerate AI and Machine
Learning Workloads in Cloudera on Public Cloud and On-Premises Using
NVIDIA GPUs
Cloudera Machine Learning (CML) is a leading service of Cloudera Data Platform that empowers enterprises to create their own AI applications, unlocking the potential of open-source Large Language Models (LLMs) by grounding them in proprietary data assets to produce secure, contextually accurate responses.
The CML service now supports the cutting-edge NVIDIA H100 GPU in
public clouds and in data centers. This next-generation acceleration empowers
Cloudera's data platform, enabling faster insights and more efficient
generative AI workloads. This results in the ability to fine-tune models on
larger datasets and to host larger models in production. The enterprise-grade
security and governance of CML means businesses can leverage the power of
NVIDIA GPUs without compromising on data security.
2. Accelerate Data Pipelines with GPUs in Cloudera Private Cloud
Cloudera Data Engineering (CDE) is a data service that enables users to build reliable, production-ready data pipelines from sensors, social media, marketing, payment, HR, ERP, CRM, or other systems on the open data lakehouse, with built-in security and governance, orchestrated by Apache Airflow, an open-source workflow orchestration platform.
With the NVIDIA Spark RAPIDS integration in CDE, extract, transform, and load (ETL) workloads can now be accelerated without refactoring. Existing Spark ETL applications can be seamlessly GPU-accelerated by roughly 7x overall and up to 16x on select queries compared with standard CPUs (based on internal benchmarks). This allows NVIDIA customers to apply GPUs in upstream data processing pipelines as well, increasing utilization of those GPUs and delivering a higher return on investment.
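The "no refactoring" claim reflects the RAPIDS Accelerator's plugin model: it is enabled through Spark configuration rather than application code changes. A minimal sketch of the relevant settings follows (the plugin class and property names come from the public RAPIDS Accelerator for Apache Spark documentation; the jar path and resource amounts are illustrative assumptions, not CDE defaults):

```properties
# Load the RAPIDS Accelerator plugin -- the Spark application code is unchanged.
spark.plugins=com.nvidia.spark.SQLPlugin
# Illustrative jar location; in CDE the platform typically provides this.
spark.jars=/opt/rapids/rapids-4-spark.jar
# Allow supported SQL/DataFrame operations to run on the GPU.
spark.rapids.sql.enabled=true
# Example resource request: one GPU per executor (amounts are assumptions).
spark.executor.resource.gpu.amount=1
spark.task.resource.gpu.amount=0.25
```

Operations the plugin does not support simply fall back to the CPU, which is what makes opt-in acceleration of existing pipelines possible without rewriting them.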
"Organizations are
looking to deploy a number of AI applications across a wide range of data
sets," said Jack Gold, President of J.Gold Associates. "By offering their
customers the ability to accelerate both machine learning and inference by
leveraging the power of the latest generation of NVIDIA accelerators in cloud
and/or hybrid cloud instances, Cloudera is enabling users of both their data
lakehouse and data engineering tools to optimize time to market and train
models specific to their own data resources. This kind of capability is a key
differentiator for enterprises looking at making LLMs a mission critical part
of their solution set."
"We need to be
able to make accurate decisions at speed utilizing vast swathes of data. That
challenge is ever-evolving as data volumes and velocities continue to
increase," said Joe Ansaldi, IRS/Research Applied Analytics & Statistics
Division (RAAS)/Technical Branch Chief. "The Cloudera and NVIDIA integration
will empower us to use data-driven insights to power mission-critical use cases
such as fraud detection. We are currently implementing this integration and are
already seeing over 10 times speed improvements for our data engineering and data science
workflows."
Learn more about
Cloudera Machine Learning now supporting the NVIDIA H100 GPU and NVIDIA Spark RAPIDS integration in CDE.