Alluxio, the developer of open-source data orchestration software for
large-scale workloads,
announced the immediate availability of version 2.6 of its Data
Orchestration Platform. This new release features an enhanced system
architecture enabling AI/ML platform teams using GPUs to accelerate their data
pipelines for business intelligence, applied machine learning and model
training.
"Enterprises seeking competitive advantage are
making greater use of machine learning and AI to derive insights from massive
datasets," said Haoyuan Li, Founder and CEO, Alluxio. "These datasets are often
distributed across hybrid cloud environments, making more consistent and
efficient data access critical to realizing the value from their AI/ML
initiatives."
"The success of machine learning depends on
accurate ML models, which in turn depend on lots of heterogeneous training
data," said Kevin Petrie, VP of Research at Eckerson Group. "This creates a
bottleneck unless you efficiently apply the right compute to the right data.
Alluxio aims to apply GPU compute to large datasets faster, which can help
speed data ingestion, data transformation, and model training."
In the latest release, Alluxio improves its
system architecture to better support AI/ML applications that use the POSIX
interface. Removing inter-process latency overhead improves system
performance, which is critical for fully utilizing compute resources. Beyond
I/O performance, Alluxio's data management capabilities support the
end-to-end workflow of data preprocessing, loading, training, and result
writing.
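Because the interface is POSIX, an application can read data with ordinary file calls rather than a storage-specific client. The following is a minimal sketch under stated assumptions: the mount point /mnt/alluxio-fuse and the helper name read_training_files are hypothetical and do not come from the release.

```python
import os

# Hypothetical mount point; the actual path depends on the deployment.
ALLUXIO_MOUNT = "/mnt/alluxio-fuse"


def read_training_files(root):
    """Read every regular file under root using standard file calls.

    Returns a dict mapping each file's path (relative to root) to its size
    in bytes.
    """
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:  # plain open(); no special client API
                data = f.read()
            sizes[os.path.relpath(path, root)] = len(data)
    return sizes


if __name__ == "__main__":
    # Only runs the demo if the assumed mount point actually exists.
    if os.path.isdir(ALLUXIO_MOUNT):
        files = read_training_files(ALLUXIO_MOUNT)
        print(f"read {len(files)} files, {sum(files.values())} bytes")
```

The same function works unchanged against any local directory, which is the practical point of a POSIX interface: training code does not need to know where the data actually lives.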
"Machine learning applications benefit greatly
from the performance acceleration offered by GPUs. However, when utilizing
powerful compute hardware, the limiting factor of the workload often shifts to
I/O where workloads become bound on how fast data can be made available to the
GPUs as opposed to how fast the GPUs can do training computations," said Adit
Madan, Product Manager, Alluxio. "Alluxio 2.6 bridges this gap in
performance with a data orchestration layer for AI/ML workloads, allowing
applications to fully utilize expensive and powerful hardware without
encountering the data access and I/O bottlenecks."
The Alluxio 2.6 Community
and Enterprise Editions feature new capabilities, including:
Faster Data Access with a Large Number of Small Files
Alluxio 2.6 unifies the Alluxio worker and FUSE
process. Coupling the two reduces inter-process communication and
significantly improves performance. The gain is especially evident in
AI/ML workloads where files are small and RPC overhead makes up a
significant portion of the I/O time. In addition, containing both components in
a single process greatly improves deployment of the software in
containerized environments such as Kubernetes. These enhancements
substantially reduce data access latency, enabling users to process more
data more efficiently and deliver more AI/ML benefits to the
business.
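A simple cost model shows why fixed per-request overhead matters most for small files. This is an illustrative sketch, not a measurement of Alluxio: the function name overhead_fraction and the numbers used (1 ms of fixed overhead per request, 1 GB/s of transfer bandwidth) are assumptions chosen only to show the shape of the effect.

```python
def overhead_fraction(file_size_bytes, per_request_overhead_s, bandwidth_bytes_per_s):
    """Fraction of one file's read time spent on fixed per-request overhead.

    Models read time as: fixed overhead + file size / bandwidth.
    """
    transfer_s = file_size_bytes / bandwidth_bytes_per_s
    return per_request_overhead_s / (per_request_overhead_s + transfer_s)


if __name__ == "__main__":
    # Illustrative numbers only: 1 ms fixed overhead, 1 GB/s transfer rate.
    for size in (100_000, 10_000_000, 1_000_000_000):
        frac = overhead_fraction(size, 1e-3, 1e9)
        print(f"{size:>13,d} bytes: {frac:.1%} of read time is overhead")
```

Under these assumed numbers, overhead accounts for roughly 91% of the read time of a 100 KB file but only about 0.1% for a 1 GB file, so cutting per-request cost helps small-file workloads disproportionately.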
Simplified Data Management and Operability
Alluxio 2.6 enhances the mechanism for loading data
into Alluxio-managed storage and introduces more traceability and metrics for
easier operability. This distributed load operation is a key part of the
AI/ML workflow, and its internal mechanisms have been adjusted to
optimize for the common case of loading prepared data for model training.
Improved System Visibility and Control
Alluxio 2.6 adds a large set of metrics and
traceability features that let users drill into the system's operating
state. These range from aggregated system throughput to summarized
metadata latency when serving client requests. This visibility can
be used to measure the current serving capacity of the system and identify
potential resource bottlenecks. Request-level tracing and timing information
can also be obtained for deep performance analysis. Together, these features
give users the visibility and control needed to improve SLAs for their
large-scale data pipelines across a wide variety of use cases.
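To make the two kinds of metric concrete, the sketch below derives aggregated throughput and summarized latency from per-request samples. It does not use Alluxio's metrics system; the record format (bytes served, latency in seconds) and the function name summarize_requests are hypothetical, introduced only for illustration.

```python
import statistics


def summarize_requests(requests, window_s):
    """Summarize (bytes_served, latency_s) samples collected over window_s seconds."""
    if not requests or window_s <= 0:
        raise ValueError("need at least one request and a positive window")
    total_bytes = sum(b for b, _ in requests)
    latencies = [lat for _, lat in requests]
    return {
        # Aggregated throughput across all requests in the window.
        "throughput_bytes_per_s": total_bytes / window_s,
        # Summarized latency: the typical case and the worst case.
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }


if __name__ == "__main__":
    samples = [(100, 0.01), (300, 0.03), (600, 0.02)]
    print(summarize_requests(samples, window_s=2))
```

Comparing throughput against latency over successive windows is one way to spot a bottleneck: throughput that flattens while latency keeps rising suggests the system has reached its serving capacity.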
Availability
Free downloads of
Alluxio 2.6 open source Community Edition and trials of Alluxio Enterprise
Edition are generally available here:
https://www.alluxio.io/download/