Virtualization Technology News and Information
Dremio Announces Support for Apache Arrow Flight High-Performance Data Transfer

Dremio announced support for Apache Arrow Flight, an open source data connectivity technology co-developed by Dremio that radically improves data transfer rates. As a result, client applications can now communicate with Dremio's data lake service more than 10 times faster than with decades-old technologies such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC).

The implementation comes as data scientists, engineers and architects scale their applications and need to exchange data across process boundaries quickly and efficiently, without making copies. As companies continue to implement machine learning models and become more data-driven, they require high-speed access to data to be successful. Apache Arrow, an open source project co-created by Dremio engineers in 2016, is now downloaded over 20 million times per month. Arrow Flight enables Arrow-powered technologies, such as Dremio and Python data science libraries, to exchange data at network speeds without any serialization/deserialization overhead.

"Even as data volumes have increased by orders of magnitude, companies have had to continue to rely upon such archaic 25-year-old technologies like ODBC and JDBC for data transfer. While these technologies are fine for applications that require small datasets, they are a bottleneck for modern applications, such as machine learning, where millions of records are retrieved over the wire. Today we are announcing the availability of Arrow Flight in Dremio, which will open the door for new applications of data and set the performance standard for high-speed data transfer in the modern enterprise," said Tomer Shiran, founder and chief product officer at Dremio.

In addition to superior performance, Arrow Flight offers many other benefits. Arrow Flight is cross-platform and has multi-language support including Python, Java and C++, with others to come. As an example, data scientists can retrieve data directly from a Flight-enabled database like Dremio into a Python dataframe without having to extract the data into local files on the client.

The ability to avoid data extracts, combined with Arrow Flight's wire-level encryption and authentication capabilities, enables companies to overcome data governance and security challenges. Since data is being consumed directly from the centralized IT-controlled database or data lake service, data teams can control and monitor access to the data and delete records when necessary to comply with GDPR and CCPA requirements, such as "the right to be forgotten."

Arrow Flight is now available as part of the Apache Arrow 3.0 release.

Published Tuesday, February 09, 2021 9:56 AM by David Marshall