Andreessen
Horowitz-funded startup Alluxio boasts some of the world's largest Web
companies as customers or users of its open source memory-speed data platform
that unifies data across public and private cloud deployments at petabyte
scale. The company has just announced its first major product upgrade (release
1.8) since its initial launch. I recently caught up with Alluxio CEO and
co-founder Haoyuan Li to learn more.
VMblog: What are you announcing and what's new?
Haoyuan Li: We're making it much easier to run machine
learning and analytic workloads in the cloud. That cloud can be a major public
cloud like AWS, Azure or Google, or your own private cloud behind the
enterprise firewall. It's much simpler now to get up and running on these very
complex jobs that require massive amounts of data.
To make it even easier, we packaged our
platform into a Docker container to run on your Kubernetes or DC/OS cluster.
For data scientists and business analysts who are not primarily developers, we
added a new
Filesystem in Userspace (FUSE) interface that taps the power of Alluxio without
involving IT operations while providing improved insight and control for both
developers and administrators. Data sources mount like a local filesystem, a
key feature simplifying self-service data access that is particularly useful
for applications like TensorFlow.
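To illustrate what "mount like a local filesystem" means in practice, here is a minimal Python sketch. It assumes an Alluxio FUSE mount already exists at a hypothetical path, `/mnt/alluxio-fuse` (the interview does not name a mount point), and the helper names are invented for illustration. The point is only that ordinary file APIs work against a mounted path, with no storage-specific client code.

```python
from pathlib import Path

# Hypothetical mount point (an assumption; not named in the interview).
MOUNT_POINT = Path("/mnt/alluxio-fuse")


def list_files(root):
    """Return sorted POSIX-style relative paths of all regular files under root."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def total_bytes(root):
    """Sum the sizes, in bytes, of all regular files under root."""
    root = Path(root)
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def read_head(path, n=64):
    """Read up to the first n bytes of a file using plain file I/O."""
    with open(path, "rb") as f:
        return f.read(n)


if __name__ == "__main__":
    # Only touch the mount if it is actually present on this machine.
    if MOUNT_POINT.is_dir():
        print(list_files(MOUNT_POINT))
        print(total_bytes(MOUNT_POINT), "bytes")
    else:
        print(f"{MOUNT_POINT} is not mounted; nothing to list")
```

Because the helpers take any directory, the same code runs unchanged against a local folder, which is the portability a filesystem-style mount is meant to provide.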
For both developers and data scientists new to Alluxio, we
added a starter kit to help them be productive faster. It includes Alluxio pre-built binaries, a quickstart guide for local machine
installation, a tutorial to mount an S3 bucket and accelerate data access with
Alluxio, a video walk-through, a whitepaper on "Accelerating On-Demand
Analytics with Alluxio," and an introduction to performance tuning for Alluxio.
VMblog: Is it fair to say that Alluxio helps its customers save on compute and storage
costs with large workloads by making it fast and easy to keep 'stale' data in
less expensive object storage while working with 'hot' data in HDFS?
Li: Yes, that's a major value prop that users
like about our platform. In our latest release, we optimized connectors for
object storage from each major cloud provider. That means third-party object
storage is supported with a standard S3 interface so you don't have to rewrite
apps for new storage when moving from HDFS to the cloud.
It's all about application portability among
cloud providers for our customers. Everyone wants that. We also reduce costs
with a key new location-aware management capability. Customers can optimize for
performance with locality and minimize data movement across cloud Availability
Zones and sites.
VMblog: I understand the research behind Alluxio started at UC Berkeley during your PhD work at
the AMPLab, when it was called the Tachyon Project. Some of the high-profile projects
there included Apache Spark and Apache Mesos, among others. You also started
thinking about a new computing paradigm based on memory-speed processing. I believe you call it data-first computing. Can you tell us about that?
Li: We are now experiencing another computing
evolution, frequently referred to as digital transformation, but which I like
to call the data-first era. Businesses are being transformed, new services
created, and society transformed, all based on the ability to extract value from
data in ways not possible in the past. Just as the mobile era is a new way to
interact with people and the cloud era is a new way to interact with
infrastructure, the data-first era is a new way to interact with data.
In 2018 we are seeing this transformation
accelerate as more and more vendors create data-first technology solutions in
response to customers who increasingly suffer lost market opportunities from
yawning gaps in their business formed by failings in their legacy IT
infrastructure. Businesses want to make the best decisions they can right now,
not tomorrow, and they often have the data but lack the platform to extract
value from that data.
Enter the cloud. Today cloud computing touches
everything, but not as much as we think. We're still in the early days. While
we think of the cloud as ubiquitous given the growth of public cloud providers,
for the foreseeable future we will live in a hybrid cloud/on-premises world.
The fact is, data isn't truly portable today.
There needs to be a solution that overcomes the
obstacles both to extracting value from data and to cloud adoption. Meanwhile, we are
seeing a collision between the realities of today's infrastructure and the need
for real-time data access. Every industry is becoming a software business in
order to move faster than the competition. Companies can no longer afford to
wait for apps that sluggishly access data that's scattered around a network. At
Alluxio, we're building the solution for this problem so every organization has
real-time access to all the data critical to its success. With hundreds of
production deployments at companies like Baidu, Barclays, ESRI, JD.com,
Google, Tencent, Wells Fargo and more, we are seeing just the tip of the iceberg.