Rafay Systems announced the expansion of the
industry's only turnkey solution for operating Kubernetes clusters with GPU
support at scale by adding powerful new metrics and dashboards for deeper
visibility into GPU health and performance.
The Rafay Kubernetes Operations Platform (KOP) now
features a fully integrated GPU Resource Dashboard that visualizes critical GPU
metrics so developers and operations teams can seamlessly monitor, operate, and
improve performance for GPU-based container workloads - all from one unified platform.
Kubernetes has rapidly become the preferred
orchestration layer for enterprises that need the ability to provision and
operate GPU-enabled, AI and machine learning applications in the cloud and at
edge/remote locations.
According to the 2022 Gartner report
Emerging Technologies: Edge Technologies Offer Strong Area of Opportunity -
Adopter Survey Findings, "The primary objectives for respondent organizations
investing in and adopting edge technologies are to improve employees
productivity (41%) and automate business processes (39%). This aligns with
existing Gartner research (see Emerging Technologies: Use-Case Patterns in Edge
AI) that edge AI is being used to improve business processes, delivering
automation and productivity gains that translate into measurable ROI, such as
cost savings."
However, as enterprises rapidly increase the
number of AI and machine learning workloads, they must address challenges such
as visibility and monitoring to prevent significant delays in application
deployment and the wasted cost of idle or underperforming GPUs in their
clusters.
For example, a factory that increasingly relies
upon real-time video detection applications powered by AI needs a standardized
approach for cross-functional teams to manage the IT infrastructure and
applications. The following challenges often result in operational fragility
and a lack of repeatability that hinder productivity:
- Flawed or overly restrictive access and visibility for developers and
  operational personnel who need GPU metrics on demand to tune and optimize
  GPU workloads.
- The struggle of hiring or training a team of experts and spending months to
  develop, operate and maintain a customized monitoring infrastructure to
  scrape and centrally aggregate GPU metrics.
- The complexity of developing and maintaining an integration with corporate
  single sign-on (SSO) systems to provide role-based access to metrics and
  dashboards.
- Accounting for the organization's GPU-enabled workloads that are developed
  and maintained by external entities (e.g., partners and ISVs). These
  entities also need visibility into GPU metrics to ensure the workloads are
  performing optimally.
Rafay KOP solves these challenges by providing
enterprises and trusted external entities with a zero-touch experience for
automated, centralized aggregation of critical GPU operational metrics
across the entire fleet of Kubernetes clusters. Rafay's Zero-Trust Access Service
with SSO integration enables seamless role-based access to ensure only
authorized developers, external partners and operational personnel can gain
secure access and visibility into GPU metrics from the console.
"Rafay makes spinning up GPU-enabled Kubernetes
clusters incredibly simple. In just a few steps an enterprise's deep learning
and inference projects can be fully operational," explained Mohan Atreya, SVP
Product and Solutions at Rafay Systems. "Not only do we provide the fastest
path to powering environments for AI and machine learning applications, but the
combination of capabilities in Rafay KOP enables scalable edge/remote use cases
with support for zero-trust access, policy management, GPU monitoring and more
across an entire fleet of thousands of clusters."
The new GPU Resource Dashboard, which streamlines
the orchestration of GPU-based container workloads, is fully integrated
into Rafay KOP. Teams can also take advantage of many additional benefits of
the SaaS platform today, including:
- AI/ML Application Deployment Automation: Rafay KOP allows organizations to
  avoid spending months or years developing a custom platform just to
  provision and manage GPU-enabled Kubernetes clusters for bare metal,
  virtualized and cloud environments.
- AI/ML Cluster and Workload Standardization and Consistency: Rafay KOP's
  Cluster Blueprints standardize and govern cluster and workload
  configurations across a fleet. Enterprises can detect, be notified of,
  and/or block configuration changes to Kubernetes clusters.
Unleash the power of AI and machine
learning applications at the edge with Rafay KOP: https://rafay.co/start/