Rafay Systems announced new platform advancements that help enterprises
and GPU cloud providers deliver developer-friendly consumption workflows for GPU
infrastructure. The new Rafay Platform capabilities include enterprise-grade
controls, SKU definition, customer-specific policy enforcement and granular
chargeback data. Enterprises investing in GPU-based infrastructure in data
centers can leverage the Rafay Platform to roll out feature-rich
enterprise-wide GPU clouds that developers and data scientists can consume on
demand - complete with workbenches for model training, fine-tuning and
inferencing. GPU cloud providers deploying GPUs for consumption by downstream
customers can leverage the Rafay Platform to operate a full-featured,
multi-tenant GPU PaaS that delivers both accelerated computing resources and
AI and ML tooling for training, tuning and serving large language models
(LLMs).
GPU Investments Outpace Platform Team Bandwidth, Delaying AI
Projects and Increasing Costs
Demand for accelerated computing infrastructure is at an all-time high. A majority of
enterprises and service providers are investing in GPU hardware to meet
generative AI application development demand. Whether they are buying hardware
and deploying it in a data center, or committing to long-term leases with GPU
cloud providers, there is urgency to provide developers and data scientists
with this expensive hardware. Unfortunately, building a platform to enable
self-service consumption of accelerated computing hardware and AI and ML
workbenches can be a one- to two-year project. As a result of these platform
development delays, expensive hardware is underutilized - nearly a third of
enterprises are utilizing less than 15% of GPU capacity.
"Our work with customers across high-stakes industries over the last two quarters
has revealed that enterprises and GPU cloud providers are running into similar
challenges. Both are looking for ways to speed up the delivery of accelerated
computing hardware to developers and data scientists," said Haseeb Budhani, CEO
and co-founder of Rafay Systems. "The new Rafay Platform capabilities address
this need, helping enterprises and GPU cloud providers speed the delivery of a
PaaS experience in order to monetize their significant investments in
accelerated computing infrastructure."
Rafay Accelerates GPU Monetization With Standardized Platform
Building Blocks
With Rafay, GPU cloud providers and enterprises can quickly launch production-ready
AI services. Platform teams can now deliver much-needed services to developers
and data scientists through a PaaS offering that enables self-service
consumption of compute as well as AI and ML workbenches for fast
experimentation and productization of AI-based applications.
Newly added Rafay Platform capabilities include:
- Multi-tenancy enforcement: Rafay implements robust multi-tenancy controls that allow GPU
cloud providers and enterprises to safely and securely deploy workloads from
multiple customers on the same infrastructure without the risk of lateral
escalation attacks. The Rafay Platform offers new controls to protect against
lateral escalation, including a Kubernetes admission controller that will
automatically wrap pods into isolated Kata containers, each of which operates
inside a microVM inside a virtual Kubernetes cluster. The platform also
supports dynamic network policy definition, zero-trust access
management and role-based access control (RBAC). Collectively, these controls
ensure demonstrable isolation between tenants, allowing for better monetization
of expensive infrastructure.
- Programmatic SKUs: For both GPU cloud and
enterprise platform teams, Rafay allows programmatic definition of compute and
service profiles that can be offered to developers and data scientists as a
turnkey package, empowering them to focus on building generative AI apps
instead of worrying about the infrastructure. By enabling the dynamic
definition of self-service packages - programmatic SKUs - GPU cloud and
enterprise customers can better manage infrastructure consumption and ensure
high utilization based on customer needs.
With Rafay, customers can
programmatically package compute resources and AI applications to deliver
Small, Medium or Large offerings that end users can select based on their needs
and an associated price. For example, Small may be defined as a Jupyter Notebook
environment pre-set with a PyTorch environment that is tied to one NVIDIA H100
GPU and is priced at $3 per hour. Medium may be defined as a fine-tuning
workbench pre-configured with the Llama 3.1 model and tied to eight NVIDIA H100
GPUs, and priced at $20 per hour. This approach replaces hardcoded SKU
definition strategies with a solution that scales, helping GPU cloud providers
package their offerings to meet market needs, while giving enterprises control
over resource consumption.
- Purpose-built AI workbenches: With Rafay's service profile capabilities,
platform teams can provide Rafay's native
fine-tuning and inferencing tools or third-party services, such as NVIDIA NIMs
and Run:AI, to create AI workbenches for developers and data scientists. These
workbenches come pre-configured with all necessary components to speed the
delivery of specialized environments for AI and ML workflows. Platform teams
can optionally attach these workbenches to SKUs for self-service consumption.
- Chargeback and billing: Rafay provides detailed resource tracking and cost attribution
features to help GPU cloud providers and enterprises monitor consumption across
their user base. GPU cloud providers can leverage chargeback data to generate
billing information for customers. Enterprises can leverage chargeback data to
internally manage budgets and cost center attribution.
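
To make the programmatic SKU and chargeback ideas above concrete, here is a minimal Python sketch. It is not Rafay's API or data model: the `Sku` and `UsageRecord` classes, their field names, the `chargeback` function, the tenant names and the usage hours are all hypothetical. Only the two SKU definitions (Small at one NVIDIA H100 for $3 per hour, Medium at eight NVIDIA H100s for $20 per hour) come from the announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    """Hypothetical SKU record; field names are illustrative, not Rafay's schema."""
    name: str
    workbench: str
    gpu_model: str
    gpu_count: int
    price_per_hour: float  # USD


# The two example SKUs quoted in the announcement.
CATALOG = {
    "small": Sku("small", "Jupyter Notebook with PyTorch", "NVIDIA H100", 1, 3.00),
    "medium": Sku("medium", "Fine-tuning workbench with Llama 3.1", "NVIDIA H100", 8, 20.00),
}


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical usage record: which tenant ran which SKU for how long."""
    tenant: str
    sku: str
    hours: float


def chargeback(records, catalog=CATALOG):
    """Sum cost per tenant as hours multiplied by the SKU's hourly price."""
    totals = {}
    for rec in records:
        cost = rec.hours * catalog[rec.sku].price_per_hour
        totals[rec.tenant] = totals.get(rec.tenant, 0.0) + cost
    return totals


# Made-up usage data, for illustration only.
usage = [
    UsageRecord("team-a", "small", 10),
    UsageRecord("team-a", "medium", 2),
    UsageRecord("team-b", "medium", 5),
]
print(chargeback(usage))  # {'team-a': 70.0, 'team-b': 100.0}
```

Keeping SKUs as data rather than hardcoded values means a new offering, such as a Large tier, is one more catalog entry, and the same catalog drives both what users can select and how their consumption is priced.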
The new platform capabilities are now generally available to customers in the Rafay
Platform.
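
For readers curious how an admission controller can steer pods into Kata containers, as the multi-tenancy item above describes, the following Python sketch shows the general Kubernetes mechanism. It is not Rafay's implementation. It assumes a mutating admission webhook that receives an `admission.k8s.io/v1` AdmissionReview and a cluster with a RuntimeClass named `kata` (the name is cluster-specific and assumed here). The function name `kata_patch_response` is hypothetical.

```python
import base64
import json


def kata_patch_response(admission_review, runtime_class="kata"):
    """Build a mutating-webhook response that sets spec.runtimeClassName on a pod."""
    request = admission_review["request"]
    pod_spec = request["object"].get("spec", {})
    response = {"uid": request["uid"], "allowed": True}
    if pod_spec.get("runtimeClassName") != runtime_class:
        # In JSON Patch, "add" on an object member also replaces an existing value.
        patch = [{"op": "add", "path": "/spec/runtimeClassName", "value": runtime_class}]
        response["patchType"] = "JSONPatch"
        # The AdmissionReview response carries the patch as base64-encoded JSON.
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# Illustrative request body; a real one carries many more fields.
review = {"request": {"uid": "abc-123", "object": {"kind": "Pod", "spec": {"containers": []}}}}
out = kata_patch_response(review)
print(json.loads(base64.b64decode(out["response"]["patch"])))
```

A production webhook would also need TLS, a `MutatingWebhookConfiguration` scoped to pod creation, and a decision on which namespaces to exempt. The sketch covers only the patch itself.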