Virtualization Technology News and Information
Decodable 2023 Predictions: We'll Begin Seeing That the Future of Data is Real-Time


Industry executives and experts share their predictions for 2023. Read them in this 15th annual series exclusive.

In 2023, We'll Begin Seeing That the Future of Data is Real-Time

By Eric Sammer, CEO, Decodable

There is very little tolerance for a poor user experience in data delivery today. Consumers expect accuracy, freshness, and speed thanks to apps like DoorDash, Waze, Uber and social media. 

As technology companies, we've seen our customers' expectations set by their experiences with those apps. Legacy databases aren't equipped to handle the technical realities of this world. And as much as IT operations teams want to emulate the data analytics stacks of sophisticated companies delivering lightning-fast, up-to-the-second data experiences, cobbling together the pieces required for real-time data delivery isn't realistic from a time, talent, or cost perspective.

Companies using batch ETL concepts for their data architecture are at risk of losing customers to competitors who are offering a better user experience through a modern data stack that delivers streaming, real-time data.

With that backdrop, we look ahead into 2023 and see a year in which companies will transition away from legacy, batch-based data stacks of the past and will pivot to specialized, real-time analytical data stacks that can manipulate data records in motion through simple stream processing. They'll see the benefit of easy implementation of things like change data capture, multi-way joins, and change stream processing while still having their batch and real-time needs met.
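To make "manipulating data records in motion" concrete, here is a minimal Python sketch of change stream processing: a change-data-capture feed is filtered and enriched record by record as it flows, rather than after landing in a warehouse. The feed, table names, and enrichment field are all hypothetical illustrations, not Decodable's API:

```python
from typing import Dict, Iterator

def change_stream() -> Iterator[Dict]:
    """Hypothetical change-data-capture feed: each record describes one row change."""
    yield {"op": "insert", "table": "orders", "row": {"id": 1, "total": 42.0}}
    yield {"op": "update", "table": "orders", "row": {"id": 1, "total": 45.5}}
    yield {"op": "insert", "table": "users",  "row": {"id": 7, "name": "Ada"}}

def process(stream: Iterator[Dict]) -> Iterator[Dict]:
    """Manipulate records in motion: keep only order changes and enrich them in flight."""
    for record in stream:
        if record["table"] == "orders":
            enriched = dict(record["row"], source="cdc")  # add metadata before delivery
            yield {"op": record["op"], "row": enriched}

results = list(process(change_stream()))
```

Because each record is handled as it arrives, downstream consumers see transformed data immediately instead of waiting for the next batch window.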

And taking this prediction one step further, these companies will increasingly look to as-a-service offerings to deploy and manage their new streaming data stacks. By investing in a managed stream processing platform-as-a-service, companies let data teams focus on building streaming pipelines with the well-understood model of SQL. The alternative imposes a steep learning curve: building a custom platform by piecing together open-source components. The barriers here include cost-prohibitive, inefficient technology and talent acquisition; it also opens unnecessary security risks and puts a strain on existing teams.

Stream processing has clear benefits over batch. First, processing work is amortized continuously over time rather than piling up into unwieldy periodic batch loads. Second, continuous processing is a more natural fit for data that arrives continuously than discretized chunks of work. Finally, the lower latency of stream processing reduces the risk of data loss, corruption, and stale or inaccurate results.
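The amortization point can be shown in a few lines: a batch job recomputes an aggregate over all accumulated data at each interval, while a streaming job does a constant amount of work per record and always has the current answer on hand. A toy sketch (the event values are illustrative):

```python
events = [5, 3, 8, 2, 6]  # hypothetical stream of incoming values

# Batch style: wait for the window to close, then recompute over everything.
batch_total = sum(events)

# Streaming style: amortize the work by updating state as each event arrives.
running_total = 0
totals_over_time = []
for value in events:
    running_total += value              # constant work per record
    totals_over_time.append(running_total)

# Same final answer, but the streaming version was always up to date.
assert totals_over_time[-1] == batch_total
```

The batch version's answer is stale between runs; the streaming version pays a tiny cost per event and is never behind.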

Helpfully, streaming stacks can do everything batch stacks can do. Thus, there's a logical extension of our prediction: 2023 will be the year enterprise CIOs and CFOs begin to ask, "Why are we maintaining two full stacks: a batch stack and a real-time stack?" Indeed, doing so is a needless and hefty ops expense if one stack can support both sets of use cases. There's no IT operations team out there that wants to run multiple stacks.

There has been widespread enterprise experimentation with Kafka in an attempt to mirror operational data onto a data warehouse. But this approach doesn't meet real-time data needs: without simple stream processing, data records still cannot be manipulated in flight. And then there are those feeling the financial pain of the build-it-yourself platform assembled from open-source components.

A cohort of new specialized real-time analytical databases, such as Apache Druid, Apache Pinot, and ClickHouse, is helping to solve the problem of internal customers attempting to analyze data across operational databases.

If your organization is already feeling pressure to deliver analytics faster and faster to support real-time use cases, be prepared for that pressure to only increase over time. If you're not feeling the pressure yet, factors like economic uncertainty and competitive dynamics will bring that pressure to your doorstep soon enough. Start thinking now about how your transition to real-time data stacks can support your data teams with tools designed for the job at hand, while paving the way to a future where real-time stacks take on more and more of the batch workloads you're running.




Eric Sammer is a data analytics industry veteran who has started two companies, Rocana (acquired by Splunk in 2017), and Decodable, where he is currently CEO. He is a distinguished author, engineer, and leader on a mission to help companies move and transform data to achieve new and useful business results. Eric is a widely known speaker on topics including data engineering, ML/AI, real-time data processing, entrepreneurship, and open source. He has spoken at events including the Apache Pinot conference and Current summit, on podcasts with Software Engineering Daily and Sam Ramji, and has appeared in various industry publications. Find Eric on LinkedIn and Twitter.

Published Friday, December 16, 2022 7:41 AM by David Marshall