Industry executives and experts share their predictions for 2023. Read them in this 15th annual VMblog.com series exclusive.
Forecast 2023: Data Lakehouses Offer Clear Strategy for Business Growth Amid Stormy Economy
By Ben Hudson, Product Manager at Dremio
In the past
year, across different industries and global markets, enterprises have invested
heavily in data strategies that streamline business processes, drive revenue,
and enable innovation, while reducing operational costs.
The data
lakehouse, which enables companies to run data warehousing workloads directly
on the data lake, is a 2022 breakthrough that provides the architectural
foundation for companies to achieve these goals. Fundamental to that
breakthrough are table formats such as Apache Iceberg and Delta Lake, which
make it easy for teams to transform and analyze data quickly inside the lake
without having to worry about how data is physically optimized.
Here are some
trends we can expect to see in 2023 as companies turn to data lakehouses as a
new, open architecture for analytics, and look to use their data more
efficiently to grow their business.
Expensive vendor lock-in will be left out in the cold
Being locked
into expensive proprietary systems is less appealing than ever, amid continuing
economic uncertainty. No company can budget well or easily try new technologies
while vendors hold data hostage.
In 2023, more
companies will seek an open lakehouse architecture that allows them to reclaim
ownership of their data (where data is stored in open formats and standards in
their own account) and run analytics workloads on their data using any
processing engine.
Automation is on the rise
While data
lakehouses have gained considerable traction as a data management architecture,
lakehouse file management is still a tedious task for data engineers. For
example, how do you physically organize data for optimal data access? How do
you efficiently evolve table partitions over time to support various query
patterns?
In 2023,
we'll see more lakehouse companies automate file management processes, which
will make data engineers' lives much easier.
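To make the idea of automated file management concrete, here is a minimal sketch of one such chore: planning the compaction of many small files into fewer, larger ones. It is a toy illustration, not how any particular lakehouse engine works. The function name `plan_compaction`, the 128 MB target size, and the first-fit-decreasing grouping heuristic are all assumptions made for this example.

```python
from typing import List

# Assumed target size for a well-sized data file (hypothetical value).
TARGET_FILE_BYTES = 128 * 1024 * 1024


def plan_compaction(file_sizes: List[int],
                    target: int = TARGET_FILE_BYTES) -> List[List[int]]:
    """Group small files into batches whose combined size fits within `target`.

    Uses a simple first-fit-decreasing heuristic. Files already at or above
    the target size are left alone.
    """
    # Only files below the target are candidates for merging.
    small = sorted((s for s in file_sizes if s < target), reverse=True)

    batches: List[List[int]] = []
    for size in small:
        for batch in batches:
            # Place the file in the first batch that still has room.
            if sum(batch) + size <= target:
                batch.append(size)
                break
        else:
            # No existing batch had room, so start a new one.
            batches.append([size])

    # A batch holding a single file would gain nothing from a rewrite.
    return [b for b in batches if len(b) > 1]


if __name__ == "__main__":
    # Sizes in arbitrary units with a target of 100 for readability.
    print(plan_compaction([60, 50, 40, 30, 100, 10], target=100))
```

An automated service could run a planner along these lines on a schedule and rewrite each batch as a single file, so that data engineers no longer have to track file sizes by hand.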
Semantic layers will get a facelift
Regardless of
data architecture, companies still have major problems to tackle at the
consumption phase of the analytics workflow. One notable issue is inconsistent
cross-organizational reporting due to a lack of consensus on key business
metrics. For example, what does it mean to be a paying customer? Is everybody
calculating revenue the same way? Inconsistency arises because data consumers
define their own business logic, metrics, and calculated fields within isolated
BI tools, rather than leveraging agreed-upon definitions.
Companies aim
to solve this issue by building a semantic layer, which provides a single,
governed view of key business metrics so that data analysts and data scientists
can deliver consistent reports regardless of consumption tool. While not a new
concept, semantic layers are experiencing a resurgence that will continue
throughout 2023, fueled by an increasing need to provide data consumers with
fast, reliable self-service analytics.
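As a rough illustration of what "a single, governed view" can mean in practice, the sketch below keeps each metric definition in one shared registry that every consumer calls, instead of re-implementing the logic in each BI tool. The record fields, the `Metric` class, the `SEMANTIC_LAYER` registry, and the specific rules for "revenue" and "paying customer" are hypothetical choices for this example, not definitions from any product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical order records; the field names are assumptions for illustration.
ORDERS: List[dict] = [
    {"customer": "acme", "amount": 120.0, "status": "paid"},
    {"customer": "acme", "amount": 30.0, "status": "refunded"},
    {"customer": "globex", "amount": 75.0, "status": "paid"},
    {"customer": "initech", "amount": 0.0, "status": "trial"},
]


@dataclass(frozen=True)
class Metric:
    """One agreed-upon business metric: a name, a description, and its logic."""
    name: str
    description: str
    compute: Callable[[List[dict]], float]


def _revenue(rows: List[dict]) -> float:
    # Assumed rule: only paid orders count toward revenue.
    return sum(r["amount"] for r in rows if r["status"] == "paid")


def _paying_customers(rows: List[dict]) -> float:
    # Assumed rule: a paying customer has at least one paid, non-zero order.
    return float(len({r["customer"] for r in rows
                      if r["status"] == "paid" and r["amount"] > 0}))


# The shared registry: every consumer looks metrics up here by name.
SEMANTIC_LAYER: Dict[str, Metric] = {
    m.name: m
    for m in (
        Metric("revenue", "Sum of paid order amounts", _revenue),
        Metric("paying_customers", "Customers with a paid order",
               _paying_customers),
    )
}


def query_metric(name: str, rows: List[dict]) -> float:
    """Evaluate a governed metric by name against the given rows."""
    return SEMANTIC_LAYER[name].compute(rows)


if __name__ == "__main__":
    for metric_name in SEMANTIC_LAYER:
        print(metric_name, query_metric(metric_name, ORDERS))
```

Because a dashboard and a notebook would both call `query_metric("revenue", ...)`, a change to the revenue rule happens in one place and reaches every report at once.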
Customer engagement platforms will be built directly on the lakehouse
Data
warehouses and lakehouses aggregate a wealth of data, from web analytics and
marketing engagement to purchasing patterns and customer success metrics, to
form a comprehensive, 360-degree view of customers. However, in most cases,
companies can't use this unified data to engage with customers unless it's
moved into a separate, use-case-specific platform (which has its own storage
layer). The pain of managing pipelines to move data between systems is well
known.
In 2023, more
operational tools and customer engagement platforms will announce direct
integrations with data warehouses and lakehouses, so companies can act upon
360-degree customer data without data movement. New startups that develop these
tools will be built directly on the warehouse or lakehouse, minimizing data
pipelines.
Data
lakehouses have emerged as the most efficient data management architecture to
support analytics, and they provide the foundation for exciting innovation in
the downstream analytics workflow. Staying abreast of trends like these in 2023
and using them to inform data strategies will help businesses weather economic
conditions in the coming months.
##
ABOUT THE AUTHOR
Ben Hudson is a product manager at Dremio, where he leads strategic go-to-market initiatives. Prior to Dremio, Ben worked at IBM, where he led product management for their cloud data warehouse offering. He holds bachelor's and master's degrees in computer science from Wesleyan University, where he did research in programming language theory.