Dremio, the data lake engine company, announced today the release of its Data Lake
Engines for AWS, Azure, and Hybrid Cloud. This version of Dremio's open source
platform includes advanced columnar caching, predictive pipelining, and a new
execution engine kernel delivering up to 70x increases in performance.
"We process hundreds of
thousands of transactions on a daily basis and produce insights based on those
transactions; this type of capability requires sophisticated and scalable data
platforms," said Ivan Alvarez, IT vice president, big data and analytics, NCR
Corporation. "Dremio is working with NCR to solve the integration between
traditional enterprise data warehouse and scalable distributed compute
platforms for big data repositories. This integration allows NCR to also
cross-pollinate data engineering knowledge among platforms and, most importantly, to
deliver faster data insights to our internal and external customers."
Flexibility and Control for Data Architects, and Self-Service for Data Consumers
With Dremio, companies can
operationalize data lake storage such as ADLS and S3, making data easy to
consume while providing the interactive performance that users demand.
The engine provides ANSI SQL
capabilities, including complex joins, large aggregations, common table
expressions, sub-selects, window functions and statistical functions. With
built-in Dremio connectors for Tableau, Power BI, Looker and other analysis
tools, as well as Dremio's ODBC, JDBC, REST and Arrow Flight interfaces, it is
easy to use any client application to query the data.
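As a rough illustration of the kind of SQL described above, the sketch below combines a common table expression, an aggregation and a window function. It runs against Python's built-in sqlite3 module purely as a stand-in engine (SQLite 3.25 or later is assumed for window functions); the `sales` table and its columns are hypothetical, and nothing here uses a Dremio interface. With Dremio, a similar statement would be submitted through one of the ODBC, JDBC, REST or Arrow Flight interfaces instead.

```python
import sqlite3

# Stand-in engine for illustration only; this is not a Dremio connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 10.0), ("east", 2, 30.0), ("west", 1, 5.0), ("west", 2, 15.0)],
)

QUERY = """
WITH daily AS (                       -- common table expression
    SELECT region, day, SUM(amount) AS total
    FROM sales
    GROUP BY region, day              -- aggregation
)
SELECT region, day, total,
       SUM(total) OVER (PARTITION BY region ORDER BY day) AS running_total  -- window function
FROM daily
ORDER BY region, day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each output row carries the per-day total for a region together with a running total computed by the window function.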
Dremio executes queries directly
against data lake storage while leveraging patent-pending technology to
accelerate query execution. The data does not need to be loaded into other
systems, such as data warehouses, data marts, cubes, aggregation tables and BI
extracts. Data can reside in a variety of file formats, including Parquet, ORC,
JSON and text-delimited (e.g., CSV).
"Organizations recognize the
value of being able to quickly leverage data and analytics services to further
their data-driven initiatives," said Mike Leone, senior analyst,
Enterprise Strategy Group. "But it's more important than ever to start with a
strong data foundation, especially one that can simplify the usage of a data
lake to enable organizations to maximize data availability, accessibility, and
insights. Dremio is addressing this need by providing a self-sufficient way for
organizations and personnel to do what they want with the data that matters, no
matter where that data is, how big it is, how quickly it changes, or what structure
it's in."
Dremio on Cloud Data Lake Storage
The latest version of Dremio
adds critical features that speed up deployment and use of Dremio on
cloud data lake storage:
Performance
- Columnar Cloud Cache (C3) -
Automatically caches data on NVMe or SSD close to compute to significantly
improve performance and reduce network traffic. C3 is real-time,
distributed and automatic, requires no administration or user
involvement, and uses existing cluster resources.
- Column-Aware Predictive Pipelining - Eliminates
waits on high-latency storage by predicting access patterns, resulting in
3x to 5x faster query response times. Predictive Pipelining works with
columnar data (Apache Parquet and ORC) on data lake storage (S3, ADLS,
HDFS), and improves read-ahead hits and pipelining while increasing read
throughput to the maximum allowed by the network.
- Gandiva GA - Gandiva is the first
execution kernel optimized for high-performance columnar processing of
Apache Arrow data. Gandiva makes optimal use of modern CPU architectures,
is written in C++ for performance and uses runtime code-generation in LLVM
for efficient evaluation of arbitrary SQL. Complex analytical workloads
see up to a 70x performance improvement from Gandiva.
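The general ideas behind caching column data close to compute and reading ahead of the query can be sketched in a few lines. The toy reader below is an assumption-laden illustration, not Dremio's implementation of C3 or Predictive Pipelining: the class name `ColumnChunkReader`, the `fetch` callback, and the "next chunk of the same column" prediction rule are all hypothetical. It keeps recently used column chunks in a small LRU cache and, after every read, starts fetching the next chunk in the background.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class ColumnChunkReader:
    """Toy reader: LRU cache of column chunks plus one-chunk read-ahead."""

    def __init__(self, fetch, capacity=4):
        self.fetch = fetch              # hypothetical slow remote read: (column, index) -> chunk
        self.capacity = capacity        # maximum number of cached chunks
        self.cache = OrderedDict()      # (column, index) -> chunk, in LRU order
        self.pending = {}               # (column, index) -> Future of an in-flight prefetch
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.hits = 0
        self.misses = 0

    def _store(self, key, chunk):
        self.cache[key] = chunk
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used chunk

    def _prefetch(self, key):
        if key not in self.cache and key not in self.pending:
            self.pending[key] = self.pool.submit(self.fetch, *key)

    def read(self, column, index):
        key = (column, index)
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)
            chunk = self.cache[key]
        elif key in self.pending:
            self.hits += 1              # read-ahead hit; may wait for the fetch to finish
            chunk = self.pending.pop(key).result()
            self._store(key, chunk)
        else:
            self.misses += 1            # cold read goes straight to storage
            chunk = self.fetch(column, index)
            self._store(key, chunk)
        # Assumed access pattern: a sequential scan of the same column.
        self._prefetch((column, index + 1))
        return chunk


def fake_fetch(column, index):
    """Stand-in for a read from S3, ADLS or HDFS."""
    return f"{column}:{index}"


reader = ColumnChunkReader(fake_fetch)
chunks = [reader.read("price", i) for i in range(4)]
reader.pool.shutdown()
print(chunks, reader.hits, reader.misses)
```

In this sequential scan only the first read goes to storage on the query's critical path; later reads are served from the read-ahead. A real engine would predict access from the query plan and file metadata rather than a fixed "next chunk" rule.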
Security
- Single Sign-On and Azure AD -
Dremio now integrates flexibly with existing identity management
systems and gives users seamless access when switching between
applications. It also supports OAuth and Personal Access Tokens for
connections over ODBC, JDBC and Arrow Flight endpoints.
- Advanced AWS Security -
Dremio now includes native support for AWS security services for
enterprise users, such as AWS Secrets Manager, Multiple AWS IAM Roles,
Server-Side Encryption with AWS KMS-Managed Keys, and more.
"Dremio's Data Lake Engine makes
queries on data lake storage extremely fast, so that companies no longer have
to move data into proprietary data warehouses or create cubes or extracts to
get value from that data," said Tomer Shiran, co-founder and CEO, Dremio.
"We're excited to announce new technologies - like our Columnar Cloud Cache
(C3) and Predictive Pipelining - that work alongside Apache Arrow and the
Dremio-developed Gandiva kernel to deliver big increases in performance."
Dremio Hub
With this release, Dremio is
also announcing Dremio Hub. In addition to the native connectors that come with
Dremio, Dremio Hub provides a marketplace of community-developed connectors,
making it easy to join data lake storage with many other data sources. At
launch, Dremio Hub includes contributed connectors for Snowflake, Salesforce,
and several other data sources - and the number is expected to grow quickly as
Dremio has also established a formal program for soliciting, accepting and
publishing further contributions.
Availability
The latest release of Dremio's data lake engine
is available immediately.