Industry executives and experts share their predictions for 2021. Read them in this 13th annual VMblog.com series exclusive.
2021 and the Continued Evolution of Data Lakes
By Eran Vanounou, CEO of Varada
So far, 2021 seems to be misbehaving just as much as 2020. As fruitless as it
would be to predict any global news that 2021 may hold, I'd like to focus on
something we may have a better grip on: data infrastructure. At the kickoff of
2021, here are a few predictions for what data infrastructure might look like
down the line:
First and foremost, a rapid evolution is underway in how we think about the
data warehouse. Data warehouses have become the most popular way to analyze data,
and are regarded as an essential component of monetizing data assets and
boosting the organization's competitive advantage. In 2021, data warehouses are
not going to disappear; they'll only continue to grow. With the widespread
transition to the cloud that has been developing for over a decade, data
warehouses have donned a fresh new look and now offer many attractive modern
capabilities, including self-service and serverless operation.
With the rise of the cloud, data lakes are the new kid on the block. Unlike
data warehouses, data lakes were initially used for storing large amounts of
raw data, which would then be duplicated and modeled for analytics purposes in
data warehouses. In the last couple of years, with the rise of new query
engines such as Presto, data lakes have also become popular for agile and
flexible analytics that draw on the entire data lake to extract value.
Now that data lakes are becoming more and more popular for analytics, their rapid
emergence from the innovation stage means that organizations are beginning to
demand more of them. Based on our conversations with customers, we see
organizations demanding simpler, easier-to-manage, and more cost-effective means
of extracting usable business value from their data lakes, using as many data
sources as possible. Analytics use cases vary, ranging from experimental or ad
hoc analytics through business intelligence and internal dashboarding, all the
way to customer-facing applications.
As organizations enter 2021 and re-evaluate their data strategies amid the growing
shift from data warehouse-based to data lake-based analytics, there are two
considerations that must lead the evaluation process:
1) Speed continues to be a critical requirement.
Data teams are looking for platforms that are smart enough to accelerate queries
effectively and automatically based on business requirements and priorities.
They don't want to move or model data, and they don't want to add more ETL jobs
and storage. What they want is for any query to run fast, without
committing to specific data schemas.
2) Everything will come down to cost.
We've seen a strong shift towards choosing platforms that enable users to
prioritize workloads, because organizations want to meet budget and
performance requirements simultaneously. At the same time, as organizations add
more and more projects, they want to predict accurately how their
spending will be affected. Data executives are looking for a simple pricing
model that delivers visibility and predictability so that they can continue
adding more analytics projects without incurring budget-busting surprises. This
attitude suggests that in 2021, the low-cost options will be the winners.
Although it seems like an infinitely long time from now, when 2022 rounds the
corner we can look back and see whether our data lake predictions unfolded as we
expected. And maybe we can even take a stab at guessing what will happen beyond
then.
##
About the Author

Eran Vanounou is CEO of Varada. He is a tech executive with 20+ years of
experience in technical leadership roles at LivePerson, NICE, and Sun Microsystems.