Virtualization Technology News and Information
Dremio Announces the Gandiva Initiative for Apache Arrow

Dremio, the Data-as-a-Service Platform company, today announced a new open source initiative for columnar in-memory analytics based on Apache Arrow. The Gandiva Initiative for Apache Arrow uses the LLVM Project, an open source compiler infrastructure, to significantly improve the speed and efficiency of in-memory analytics on Apache Arrow data. These improvements will be available to many languages and popular libraries, initially C++ and Java, and eventually others including Python, Ruby, Go, Rust, and JavaScript.

The Gandiva Initiative for Apache Arrow aims to make analytical data portable and its processing more efficient, with the following benefits:

  • Faster time to insight in analytics, machine learning, and data science.
  • Lower cost of operations on cloud infrastructure for analytics, machine learning, and data science.

"Apache Arrow was created to provide an industry-standard, columnar, in-memory data representation," said Jacques Nadeau, co-founder and CTO of Dremio, and PMC Chair of Apache Arrow. "Dozens of open source and commercial technologies have since embraced Arrow as their standard for high-performance analytics. The Gandiva Initiative introduces a cross-platform data processing engine for Arrow, representing a quantum leap forward for processing data. Users will experience speed and efficiency gains of up to 100x in the coming months."

The Power of LLVM

LLVM is an open source compiler infrastructure project originally developed by Chris Lattner, who later created the Swift language. LLVM's Just-in-Time (JIT) compilation capabilities use information that is only available at runtime to produce highly optimized machine code for fast expression evaluation.

By combining LLVM with Apache Arrow libraries, low-level operations on Apache Arrow in-memory buffers, such as sorts, filters, and projections, can be highly optimized for specific runtime environments. This improves resource utilization and makes analytical workloads faster and less costly to run.
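The idea of specializing a filter or projection at runtime can be illustrated with a minimal sketch. This is not Gandiva's API and does not use LLVM or Arrow: it assumes plain Python `array` buffers as stand-ins for Arrow columns, and the helper `compile_kernel` and the column names are hypothetical. It only shows the general technique of generating code for one specific expression once, then running that specialized loop over whole columns.

```python
from array import array


def compile_kernel(expression, column_names, kind="project"):
    """Generate and compile a loop specialized for one expression.

    Hypothetical helper for illustration only: the expression is baked
    into generated source, so the loop does not re-interpret an
    expression tree for every row. Never pass untrusted input to exec.
    """
    targets = ", ".join(column_names) + ","  # e.g. "price, qty,"
    if kind == "project":
        # Projection: compute one output value per row.
        body = f"out.append({expression})"
    elif kind == "filter":
        # Filter: collect the indices of rows that satisfy the predicate.
        body = f"if {expression}: out.append(i)"
    else:
        raise ValueError(f"unknown kernel kind: {kind}")
    source = (
        "def kernel(*columns):\n"
        "    out = []\n"
        f"    for i, ({targets}) in enumerate(zip(*columns)):\n"
        f"        {body}\n"
        "    return out\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated source once
    return namespace["kernel"]


# Columnar buffers standing in for Arrow arrays (an assumption).
prices = array("d", [10.0, 20.0, 30.0])
quantities = array("q", [1, 2, 3])

project = compile_kernel("price * qty", ["price", "qty"])
totals = project(prices, quantities)

keep = compile_kernel("price > 15.0", ["price"], kind="filter")
selected = keep(prices)

print(totals)    # [10.0, 40.0, 90.0]
print(selected)  # [1, 2]
```

A real JIT such as LLVM goes much further, emitting native machine code tuned to the host CPU, but the division of work is similar: pay a one-time compilation cost per expression, then reuse the specialized kernel across many batches of column data.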

Availability

The Gandiva Initiative will be made available during the 2018 DataWorks Summit in San Jose. Attendees are encouraged to attend the session "Using LLVM to Accelerate Processing of Data in Apache Arrow" on Thursday, June 21. For downloads, documentation, and ways to become involved with the Gandiva Initiative, visit www.dremio.com.

Published Thursday, June 14, 2018 9:21 AM by David Marshall