Alluxio, the developer of an
open source cloud data orchestration system, today announced it has closed 1H
2020 with sales growth of more than 650% over 1H 2019. Alluxio demonstrated
continued market strength and leadership in financial services, high tech,
telecom, internet, gaming and ecommerce across North America, Asia and Europe.
"2020 has been an unprecedented year
as organizations adjusted their priorities to emphasize cost saving and
infrastructure modernization to prepare for future growth. For data & AI
teams, a key solution is adopting a cloud / hybrid cloud strategy and Alluxio
has become a critical element for that by bringing cost savings, speed and
agility to data analytics and AI infrastructures," said Haoyuan
(H.Y.) Li, Founder and CEO, Alluxio. "Today, I am very proud to
share that we closed the first half of the year with 650% revenue growth over
the same period last year. This could not have been achieved without the strong
commitment of our customers, partners and the vibrant Alluxio community.
Marching into the second half, we will continue to make further investments to
build a stronger data orchestration system
bringing more cost savings, efficiency and agility for our customers'
data-driven workloads (Spark, Presto, TensorFlow, PyTorch) in cloud and hybrid cloud
environments."
Continuing Customer Success and New Customer Wins
Alluxio continues to attract new customers
and expand existing customer deployments around the globe.
Recent notable additions and success stories include: Alibaba,
Aunalytics, Datasapiens, EA, Nielsen, Playtika, Roblox, Ryte, Tencent, VIPShop,
Walmart, Walkme and WeRide.
Derek Tan, Executive Director of Infra & Simulation
at WeRide, said, "WeRide uses Alluxio as a hybrid cloud data gateway for
applications on-premises to access public cloud storage like AWS S3. The new
data access architecture provides a localized cache per location to eliminate
redundant requests to S3. As a result, we reduced the complexity of data
synchronization by having a single interface to access data and removed the
need to maintain a custom local copy; reduced S3 data-out cost of downloading
redundant data; gained fast access to data to boost engineering productivity;
and now have an in-office cache of the cloud data."
Honghan Tian, Sr. Infrastructure Architect, Data Service Center (DSC) at
Tencent PCG (Platform and Content Business Group), leverages Alluxio to
optimize analytics performance and minimize operating costs in building
Tencent Beacon Growing, a real-time data analytics platform. He explained,
"In our project 'Beacon Growing,' we have deployed Alluxio to improve Impala
performance by 2.44x for IO intensive queries and 1.20x for all queries. The
query failure rate due to timeout is also reduced by 29%. In the future, we
foresee it can reduce disk utilization by over 20% for our planned elastic
computing on Impala."
New Advancements for the Alluxio Data Orchestration Platform
The latest release of
Alluxio, version 2.3, shipped in June. It focuses on streamlining the user
experience in hybrid cloud deployments where Alluxio is deployed with compute
in the cloud to access data on-prem. Specific new features include:
- One Command Deployment on Google Dataproc and AWS EMR - Deploying Alluxio for the first time should be easy, and
being able to repeatedly create custom deployments with Alluxio in the stack is
key for deployments in the cloud.
- Native Kubernetes Helm Chart Support - Alluxio 2.3 supports data locality on
Kubernetes with ephemeral compute (e.g., Spark) without requiring host
networking.
- Environment Validation Tools - After deployment, connecting on-cloud Alluxio
to remote data is the biggest hurdle for new Alluxio users. With this release,
a guided experience is now available to help users during this first step
after deployment.
- Concurrent Metadata Synchronization - For long-running and production hybrid cloud deployments,
users found it critical for the files and directories virtualized in Alluxio to
be synchronized with the on-premises data in near real time. In Alluxio 2.3, the
new concurrent metadata synchronization algorithm provides an order of
magnitude or more performance improvement.
- Alluxio Structured Data Services - Alluxio Structured Data Services (SDS) is the subsystem
in Alluxio that enables integration with OLAP frameworks like Presto and
SparkSQL at the structured data level, as opposed to raw files and directories.
Alluxio 2.3 further improves the range of compatibility for SDS, especially in
cloud environments.
  - Glue UDB Support - The Alluxio Catalog Service now supports connecting to AWS Glue for the
    metadata service. This enables Alluxio Structured Data Services for table
    metadata stored in AWS Glue, in addition to the existing support for the
    Hive Metastore.
  - ORC File Support - ORC is now a supported input type (in addition to CSV and Parquet) for
    transformations with the Alluxio Catalog Service.
Open Source Community Contributions
The Facebook Presto team
has been collaborating with Alluxio on an open source data caching solution
for Presto. This is required for multiple Facebook use cases to improve query
latency for queries that scan data from remote sources such as HDFS. In early
experiments, significant improvements in query latencies and IO scans have been
observed.