Dremio announced support
for Apache Arrow Flight, an open source data connectivity technology
co-developed by Dremio that radically improves data transfer rates. As a
result, client applications can now communicate with Dremio's data lake
service more than 10 times faster than with decades-old technologies,
such as Open Database Connectivity (ODBC) and Java Database Connectivity
(JDBC).
The implementation comes as data scientists, engineers and architects scale
their applications and need to exchange data across process boundaries
in a fast and efficient way without making copies. As companies continue
to implement machine learning models and become more data-centric and
data-driven, they require high-speed access to data to be successful.
Apache Arrow, an open source project co-created by Dremio engineers in
2016, is now downloaded over 20 million times per month. Arrow Flight
enables Arrow-powered technologies, such as Dremio and Python data
science libraries, to exchange data at network speeds without any
serialization/deserialization overhead.
"Even as data volumes have increased by orders of magnitude, companies have
had to continue to rely upon such archaic 25-year-old technologies like
ODBC and JDBC for data transfer. While these technologies are fine for
applications that require small datasets, they are a bottleneck for
modern applications, such as machine learning, where millions of records
are retrieved over the wire. Today we are announcing the availability
of Arrow Flight in Dremio, which will open the door for new applications
of data and set the performance standard for high-speed data transfer
in the modern enterprise," said Tomer Shiran, founder and chief product officer at Dremio.
In addition to superior performance, Arrow Flight offers many other
benefits. Arrow Flight is cross-platform and has multi-language support
including Python, Java and C++, with others to come. As an example, data
scientists can retrieve data directly from a Flight-enabled database
like Dremio into a Python dataframe without having to extract the data
into local files on the client.
The ability to avoid data extracts, combined with Arrow Flight's wire-level
encryption and authentication capabilities, enables companies to
overcome data governance and security challenges. Since data is being
consumed directly from the centralized IT-controlled database or data
lake service, data teams can control and monitor access to the data and
delete records when necessary to comply with GDPR and CCPA requirements,
such as "the right to be forgotten."
Arrow Flight is now available as part of the Apache Arrow 3.0 release.