Industry executives and experts share their predictions for 2020. Read them in this 12th annual VMblog.com series exclusive.
By Jason Nadeau, VP of Marketing, Dremio
Putting Your Data Lake to Work
Competitive advantage comes from extracting more value,
faster, from data. To achieve this, modern enterprises are moving quickly to
power their next-generation analytics initiatives with an open and efficient
cloud data lake architecture.
Cloud data warehouses
turn out to be a Big Data detour.
Given the tremendous cost and complexity
associated with traditional on-premises data warehouses, it wasn't surprising
that a new generation of cloud-native enterprise data warehouses emerged. But
savvy enterprises have figured out that cloud data warehouses are just a better
implementation of a legacy architecture, so they're skipping the detour and
moving directly to a next-generation architecture built around cloud data
lakes. In this new architecture, data doesn't get moved or copied; there is no
data warehouse, and no associated ETL, cubes, or other workarounds. We predict
that 75% of the Global 2000 will be in production or in pilot with a cloud data
lake in 2020, using multiple best-of-breed engines for different use cases
across data science, data pipelines, BI, and interactive/ad-hoc analysis.
Enterprises move to
benchmark price/performance vs. raw performance.
Escalating public cloud
compute costs have forced enterprises to re-prioritize the evaluation criteria
for their cloud services, with higher efficiency and lower costs now front and
center. The highly elastic nature of the public cloud means that cloud services
can (but don't always) release resources when not in use. And services that
deliver the same unit of work with higher performance are in effect more
efficient and cost less. In the on-premises world of over-provisioned assets,
such gains are hard to reclaim, but in the public cloud, time really is money.
This has created a new battleground where cloud services, particularly for
analytics, are competing on the dimension of service efficiency to achieve the
lowest cost per compute, and 2020 will see that battle heat up.
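To make that comparison concrete, here is a minimal sketch of why price/performance, not raw performance or raw price, is the right benchmark. The hourly rates and runtimes are hypothetical numbers chosen purely for illustration:

```python
# Cost per unit of work = hourly instance rate x hours the query occupies it.
# All prices and runtimes below are hypothetical, for illustration only.

def cost_per_query(hourly_rate_usd: float, runtime_seconds: float) -> float:
    """Compute the compute cost of one query in USD."""
    return hourly_rate_usd * (runtime_seconds / 3600)

# Service A: cheaper per hour, but slower on the same query.
cost_a = cost_per_query(hourly_rate_usd=2.00, runtime_seconds=90)  # $0.050
# Service B: pricier per hour, but finishes the same query 3x faster.
cost_b = cost_per_query(hourly_rate_usd=3.00, runtime_seconds=30)  # $0.025

# The faster service is also the cheaper one per unit of work -- assuming
# the service releases its resources when the query finishes.
print(cost_a, cost_b)
```

The assumption in the last comment is the elasticity point from the paragraph above: this math only holds in a cloud where idle resources are actually released.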
The rise of data
microservices for bulk analytics.
Traditional operational microservices
have been designed and optimized for processing small numbers of records,
primarily due to the bandwidth constraints of existing protocols and transports.
But this long-standing bottleneck has now been removed with the arrival of
Apache Arrow Flight, which provides a high-performance, massively parallel
protocol for big data transfer across different applications and platforms in a
data lake storage environment. We predict that in 2020 Arrow Flight will
unleash a new category of data microservices focused on bulk analytical
operations on high volumes of records, and in turn these data microservices
will enable loosely coupled analytical architectures that can evolve much
faster than traditional monolithic analytical architectures.
About the Author
Jason Nadeau is Dremio's VP of Marketing, helping enterprises in every
industry realize the dramatic benefits made possible by the tectonic shift to
analytics directly on cloud data lake storage. Jason brings over two decades of
marketing, product management, and pre-sales experience in enterprise software
and hardware across multiple domains. Prior to joining Dremio, Jason was VP of
Product Marketing for Pure Storage where he created Pure's highly
differentiated Evergreen Storage model of consumption and helped grow the
company's revenues nearly 10X over five years. Prior to Pure Storage, Jason was
VP of Product Management at Hewlett-Packard (Software), and before that he held
escalating leadership roles in product management at Symantec and Veritas.
Jason holds a Bachelor of Science in electrical engineering from the University
of Victoria and a Master of Business Administration from UC Berkeley, Haas
School of Business.