Industry executives and experts share their predictions for 2023. Read them in this 15th annual VMblog.com series exclusive.
In 2023, We'll Begin Seeing That the Future of Data is Real-Time
By Eric Sammer, CEO, Decodable
There is very little tolerance for a poor user experience in data delivery today. Consumers expect accuracy, freshness, and speed thanks to apps like DoorDash, Waze, Uber, and social media.
As technology companies, we serve customers whose expectations have been set by their experiences with those apps. Legacy databases aren't equipped to handle the technical realities of this world. And as much as IT operations teams want to emulate the data analytics stacks of sophisticated companies delivering lightning-fast, up-to-the-second data experiences, cobbling together the pieces of a real-time data delivery system isn't realistic from a time, talent, or cost perspective.
Companies using batch ETL concepts for their data architecture are at risk of losing customers to competitors offering a better user experience through a modern data stack that delivers streaming, real-time data.
With that backdrop, we look ahead to 2023 and see a year in which companies will transition away from legacy, batch-based data stacks and pivot to specialized, real-time analytical data stacks that can manipulate data records in motion through simple stream processing. They'll see the benefit of easy implementation of capabilities like change data capture, multi-way joins, and change stream processing while still having both their batch and real-time needs met.
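To make one of those capabilities concrete: change data capture emits a stream of insert, update, and delete events from a database log, and a stream processor folds that stream into an always-current view. A minimal sketch, assuming an invented event shape and an illustrative `apply_change` helper (not any particular product's API):

```python
# Sketch of change-stream processing: fold CDC events
# (insert/update/delete) into a continuously maintained view.

def apply_change(view, event):
    """Apply one change event to an in-memory materialized view."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        view[key] = event["row"]   # upsert the latest row image
    elif op == "delete":
        view.pop(key, None)        # drop the row if present
    return view

# A stream of change events, as might be captured from a database log.
events = [
    {"op": "insert", "key": 1, "row": {"status": "placed"}},
    {"op": "update", "key": 1, "row": {"status": "delivered"}},
    {"op": "insert", "key": 2, "row": {"status": "placed"}},
    {"op": "delete", "key": 2},
]

view = {}
for event in events:               # each event is applied as it arrives
    apply_change(view, event)

print(view)  # {1: {'status': 'delivered'}}
```

A real stream processor does the same fold continuously and at scale, with the state kept durable rather than in a Python dict.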
Taking this prediction one step further, these companies will increasingly look to as-a-service offerings to deploy and manage their new streaming data stacks. By investing in a managed stream processing platform-as-a-service, companies free data teams to focus on building streaming pipelines with the well-understood model of SQL. The alternative imposes a steep learning curve: building a custom platform by piecing together open-source components. The barriers include cost-prohibitive, inefficient technology and talent acquisition; building it yourself also opens unnecessary security risks and puts a strain on existing teams.
Stream processing has clear benefits over batch. The first is that work is amortized smoothly over time rather than concentrated in unwieldy batch processing loads. Another is that continuous processing is a more natural fit for continuously arriving data than discretized chunks of work. Finally, the lower latency of stream processing reduces the risk of data loss, corruption, and stale or inaccurate data.
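The latency gap is easy to quantify with a toy calculation (the interval and processing time below are illustrative assumptions, not benchmarks): in batch, a record waits until its window closes; in streaming, it is handled on arrival.

```python
# Toy comparison of batch vs. streaming freshness for a single record.

BATCH_INTERVAL = 60 * 60  # assume an hourly batch job (seconds)

def batch_latency(arrival_offset):
    """Seconds a record waits when processed at the end of its batch window."""
    return BATCH_INTERVAL - arrival_offset

def streaming_latency(processing_time=0.5):
    """Seconds a record waits when processed continuously on arrival."""
    return processing_time

# A record arriving 5 minutes into the hour waits ~55 minutes in batch,
# but well under a second in a streaming pipeline.
print(batch_latency(5 * 60))   # 3300 (seconds)
print(streaming_latency())     # 0.5 (seconds)
```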
Helpfully, streaming stacks can do everything batch stacks can do. Thus, there's a logical extension of our prediction: 2023 will be the year enterprise CIOs and CFOs begin to ask, "Why are we maintaining two full stacks, a batch stack and a real-time stack?" Indeed, doing so is a needless and hefty ops expense if one stack can support both sets of use cases. There's no IT operations team out there that wants to run multiple stacks.
There has been widespread enterprise experimentation with Kafka in an attempt to mirror operational data onto a data warehouse. But this approach isn't meeting real-time data needs: without simple stream processing, data records still cannot be manipulated in flight. And then there are those feeling the financial pain of a build-it-yourself platform assembled from open-source components.
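The gap described here is the ability to reshape a record between the source topic and the warehouse. A hedged sketch of the difference (the `redact` transform and record fields are invented for illustration; in practice this logic runs inside a stream processor, not a plain loop):

```python
# Sketch: transforming records in flight instead of mirroring them verbatim.
# Here a sensitive field is redacted before a record ever lands downstream.

def redact(record):
    """Drop a sensitive field and derive a coarser one for analytics."""
    out = {k: v for k, v in record.items() if k != "email"}
    out["email_domain"] = record["email"].split("@")[1]
    return out

source_topic = [
    {"user_id": 7, "email": "ada@example.com", "amount": 42.0},
]

# Plain topic mirroring would copy the raw record (email included) as-is;
# with in-flight processing, only the transformed shape reaches the warehouse.
warehouse = [redact(r) for r in source_topic]
print(warehouse[0])
# {'user_id': 7, 'amount': 42.0, 'email_domain': 'example.com'}
```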
A cohort of new specialized real-time analytical databases, such as Druid, Pinot, and ClickHouse, is helping to solve the problem of internal customers attempting to analyze data across operational databases.
If your organization is already feeling pressure to deliver analytics faster and faster to support real-time use cases, be prepared for that pressure to only increase over time. If you're not feeling the pressure yet, factors like economic uncertainty and competitive dynamics will bring it to your doorstep soon enough. Start thinking now about how your transition to real-time data stacks can support your data teams with tools designed for the job at hand, while paving the way to a future where real-time stacks take on more and more of the batch workloads you're running.
## ABOUT THE AUTHOR
Eric Sammer is a data analytics industry veteran who has started two companies: Rocana (acquired by Splunk in 2017) and Decodable, where he is currently CEO. He is a distinguished author, engineer, and leader on a mission to help companies move and transform data to achieve new and useful business results. Eric is a widely known speaker on topics including data engineering, ML/AI, real-time data processing, entrepreneurship, and open source. He has spoken at events including the Apache Pinot conference and Current summit, on podcasts with Software Engineering Daily and Sam Ramji, and has appeared in various industry publications. Find Eric on LinkedIn and Twitter.