Virtualization Technology News and Information
VMblog's Expert Interviews: Alluxio Discusses Big Data Frameworks and a New Memory-centric Virtual Distributed Storage System

UC Berkeley's AMPLab has become the center of gravity for innovation in big data analytics. There is even a stack named for it, the Berkeley Data Analytics Stack (BDAS). Everyone loves the acronym.

The latest big thing from AMPLab's BDAS is Alluxio, a memory-centric virtual distributed storage system. I recently spoke with Haoyuan Li, the creator of Alluxio, a founding committer of Apache Spark and CEO of Alluxio Inc., about the latest developments in his project.

VMblog:  First of all, can you explain what problem you are trying to solve?

Haoyuan Li:  As datasets continue to grow, storage has increasingly become the critical bottleneck for enterprises using Big Data frameworks like Apache Spark, Hadoop MapReduce and Apache Flink. We saw this while I was working on systems like Spark at UC Berkeley's AMPLab, where I was a PhD candidate co-advised by two of the world's leading computer scientists, Prof. Ion Stoica and Prof. Scott Shenker.

The frameworks themselves are driving much of the exciting innovation in Big Data, but the complexity of the underlying storage systems was slowing the pace at which these frameworks could leverage data assets. Traditional storage architectures are inadequate for distributed computing and for the size of today's datasets.

VMblog:  How did you achieve a breakthrough and create Alluxio?

Haoyuan Li:  The hard work started almost four years ago. The aha moment was realizing how to give Big Data frameworks and applications access to all of their data in memory for high-speed computation, while also enabling true failover without replication. That was the first big step. The next came with our innovations around a unification abstraction that lets users access persistent data stored in any major storage system or file system, wherever it lives, through a single API. The version 1.0 release we announced yesterday is all about this capability.

VMblog:  How does Alluxio help both developers and operators?

Haoyuan Li:  It makes life much better for both! We removed the natural tension between the two groups. Developers no longer have to worry about the underlying storage systems or wait for ops. They can work with Big Data frameworks, write distributed apps and simply point to our API. Alluxio takes care of all the hard work automatically. For operators, it means they keep the storage systems they already trust for their data, and they no longer hold developers back with obstacles in the storage infrastructure.

VMblog:  Do you have any examples of Alluxio in use?

Haoyuan Li:  Yes, we do. Just a warning: some of the material about our technology uses our previous name, Tachyon. The community changed the project name to Alluxio to avoid trademark issues and to protect the project. Anyway, your readers can read about how a big bank uses Alluxio, with some code samples included. Barclays, for example, saw a big improvement in performance and also took advantage of our data locality capabilities to comply with banking laws on confidential customer information. We also just posted a great case study on our company web site about Baidu, the Chinese search giant. They run 2-petabyte analytics jobs in memory using Alluxio.


Thank you to Haoyuan Li, CEO of Alluxio, for taking time out to speak with VMblog and answer a few questions.

Published Wednesday, February 24, 2016 8:47 AM by David Marshall