Virtualization Technology News and Information
VMblog's Expert Interviews: Alluxio Talks Major Product Upgrade and Unifying Data at Memory Speed


Andreessen Horowitz-funded startup Alluxio counts some of the world's largest Web companies as customers or users of its open source memory-speed data platform, which unifies data across public and private cloud deployments at petabyte scale. The company has just announced its first major product upgrade (release 1.8) since its initial launch. I recently caught up with Alluxio CEO and co-founder Haoyuan Li to learn more.

VMblog:  What are you announcing and what's new?

Haoyuan Li:  We're making it much easier to run machine learning and analytic workloads in the cloud. That cloud can be a major public cloud like AWS, Azure or Google, or your own private cloud behind the enterprise firewall. It's much simpler now to get up and running on these very complex jobs that require massive amounts of data. 

To make it even easier, we packaged our platform into a Docker container to run on your Kubernetes or DC/OS cluster. For data scientists and business analysts who are not primarily developers, we added a new Filesystem in Userspace (FUSE) interface that taps the power of Alluxio without involving IT operations, while still giving developers and administrators improved insight and control. Data sources mount like a local filesystem, a key simplification for self-service data access that is particularly useful for applications like TensorFlow.
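As a rough sketch of how the FUSE interface is used (assuming an Alluxio 1.8 installation with a running cluster; the mount point and Alluxio path below are placeholder examples, not values from the interview):

```shell
# Mount the Alluxio namespace at a local mount point via FUSE.
# Run from the Alluxio installation directory; paths are examples.
integration/fuse/bin/alluxio-fuse mount /mnt/alluxio /

# Once mounted, any application can read the data as ordinary
# local files, with no Alluxio-specific client code:
ls /mnt/alluxio
```

After the mount, a TensorFlow input pipeline can simply point at files under /mnt/alluxio, which is the self-service access pattern described above.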

For both developers and data scientists new to Alluxio, we added a starter kit to help them be productive faster. It includes Alluxio pre-built binaries, a quickstart guide for local machine installation, a tutorial to mount an S3 bucket and accelerate data access with Alluxio, a video walk-through, a whitepaper on "Accelerating On-Demand Analytics with Alluxio," and an introduction to performance tuning for Alluxio.
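The S3 tutorial in the starter kit boils down to mounting a bucket into the Alluxio namespace. A minimal sketch, assuming an Alluxio 1.8 install (the bucket name, path, and credentials are placeholders):

```shell
# Mount an S3 bucket into the Alluxio namespace.
# <ACCESS_KEY>, <SECRET_KEY>, and the bucket path are placeholders.
bin/alluxio fs mount \
  --option aws.accessKeyId=<ACCESS_KEY> \
  --option aws.secretKey=<SECRET_KEY> \
  /mnt/s3 s3a://my-bucket/data

# Subsequent reads go through Alluxio, which can cache hot data
# in memory close to the compute:
bin/alluxio fs ls /mnt/s3
```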

VMblog:  Is it fair to say Alluxio helps its customers save on compute and storage costs for large workloads by making it fast and easy to keep 'stale' data in less expensive object storage while working with 'hot' data in HDFS?

Li:  Yes, that's a major value proposition that users like about our platform. In our latest release, we optimized connectors for object storage from each major cloud provider. Third-party object storage is also supported through a standard S3 interface, so you don't have to rewrite applications for new storage when moving from HDFS to the cloud.
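For an S3-compatible third-party object store, the same mount command can be pointed at a custom endpoint. This is an illustrative sketch, not from the interview, and the endpoint property name may vary by Alluxio release:

```shell
# Mount an S3-compatible object store by overriding the endpoint.
# All values, including the property name alluxio.underfs.s3.endpoint,
# are illustrative placeholders; check the docs for your release.
bin/alluxio fs mount \
  --option aws.accessKeyId=<ACCESS_KEY> \
  --option aws.secretKey=<SECRET_KEY> \
  --option alluxio.underfs.s3.endpoint=http://objectstore.example.com:9000 \
  /mnt/objectstore s3a://my-bucket/
```

Because applications see the same Alluxio namespace either way, nothing in the application changes when the backing store does, which is the portability point above.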

It's all about application portability among cloud providers for our customers. Everyone wants that. We also reduce costs with a key new location-aware management capability. Customers can optimize for performance with locality and minimize data movement across cloud Availability Zones and sites.

VMblog:  I understand the research behind Alluxio started at UC Berkeley during your PhD work at the AMPLab when it was called the Tachyon Project.  Some of the high profile projects there included Apache Spark and Apache Mesos, among others.  You also started thinking about a new computing paradigm based on memory-speed processing.  I believe you call it data-first computing, can you tell us about that?

Li:  We are now experiencing another computing evolution, frequently referred to as a digital transformation, but which I like to call the data-first era. Businesses are being transformed, new services created, and society reshaped, all based on the ability to extract value from data in ways not possible in the past. Just as the mobile era changed how we interact with people and the cloud era changed how we interact with infrastructure, the data-first era changes how we interact with data.

In 2018 we are seeing this transformation accelerate as more and more vendors create data-first technology solutions for customers who are losing market opportunities to the failings of their legacy IT infrastructure. Businesses want to make the best decisions they can right now, not tomorrow, and they often have the data but lack the platform to extract value from it.

Enter the cloud. Today cloud computing touches everything, but not as much as we think. We're still in the early days. While we think of the cloud as ubiquitous given the growth of public cloud providers, for the foreseeable future we will live in a hybrid cloud/on-premises world. The fact is, data isn't truly portable today.

There needs to be a solution that overcomes the obstacles to extracting value from data and to cloud adoption. Meanwhile, we are seeing a collision between the realities of today's infrastructure and the need for real-time data access. Every industry is becoming a software business in order to move faster than the competition. Companies can no longer afford to wait for applications that sluggishly access data scattered around a network. At Alluxio, we're building the solution to this problem so every organization has real-time access to all the data critical to its success. With hundreds of production deployments at companies like Baidu, Barclays, ESRI, Google, Tencent, Wells Fargo and more, we are seeing just the tip of the iceberg.


Published Tuesday, July 31, 2018 7:34 AM by David Marshall