Industry executives and experts share their predictions for 2024. Read them in this 16th annual VMblog.com series exclusive.
Real-Time Data Stream Processing Grows Up
By Eric Sammer, CEO, Decodable
Stream processing is in use by more and more businesses, supporting smarter and
faster decisions across a spectrum of use cases. It has won supporters for its
ability to act on time-sensitive and mission-critical events, perform real-time
analytics, and build applications with features delivered to end users in
real time.
When
looking at open source options for stream processing, Apache Flink continues to
stand above the rest. With a diverse and active community, Flink is made more
robust through the contributions of engineers from different industries. Flink
works with a wide array of cloud providers and storage systems, and it has
connectors for the most critical and popular data infrastructure options. Flink
provides clear data processing semantics, robust state management, and reliable
job recovery. Together, these help ensure correctness, and they are properties
that take millions of hours of production time to develop and validate.
In
2023, stream processing gained momentum as the choice for online feature extraction, data
cleansing and normalization, enrichment, and anonymization of sensitive data.
In 2024, this trend will continue to expand as stream processing is integrated
with generative AI models to power real-time, online, user-facing applications.
As a result, stream processing will become even more critical in the year ahead.
Additionally, we will see a rise in the adoption of integrated platforms for
stream processing.
This must happen, because stream processing stacks - powered by open source
tools like Flink and Debezium - are notably simple at the small scales of
testing and limited deployment, but they notoriously become complicated when
scaled up for multi-region and multi-cluster applications. Large teams are
required to build and maintain these stacks at scale, and that model stands in
the way of broad adoption.
When
an open source stream processing tool like Flink is paired with a real-time
change data capture tool like Debezium, Flink's stream processing power becomes
a potent solution for building real-time data processing, analytics, and
event-driven applications. Reliability and scalability increase because, together,
these tools enable complex processing and analytics that wouldn't be possible
otherwise, at least not without a huge investment of engineering time and
effort. Engineers are empowered to react to database changes in real time,
which is essential for applications that require up-to-date information. With
connectors for the most popular databases, including MySQL, PostgreSQL, and
MongoDB, Debezium easily integrates with Flink without the need for
complex custom development. This makes the potential for stream processing
really exciting.
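
To make the idea of reacting to database changes concrete, here is a minimal sketch in plain Python. It does not use the Flink API or a Debezium connector. It only assumes that change events carry the `op`, `before`, and `after` fields of Debezium's change event envelope, and it folds a stream of such events into an up-to-date in-memory view. The `id` key, the `email` field, and the sample events are hypothetical.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_change(view: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory view.

    Assumes the event has an "op" code plus "before"/"after" row images:
    "c" = create, "r" = snapshot read, "u" = update, "d" = delete.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads, and updates all upsert the new row image.
        row = event["after"]
        view[row[key]] = row
    elif op == "d":
        # Deletes carry the old row image in "before".
        view.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op code: {op!r}")


def materialize(events: Iterable[Dict[str, Any]], key: str = "id") -> Dict[Any, Row]:
    """Fold a stream of change events into the current state of the table."""
    view: Dict[Any, Row] = {}
    for event in events:
        apply_change(view, event, key)
    return view


# Hypothetical sample stream: two inserts, one update, one delete.
SAMPLE_EVENTS = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {
        "op": "u",
        "before": {"id": 1, "email": "a@example.com"},
        "after": {"id": 1, "email": "a2@example.com"},
    },
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]
```

Calling `materialize(SAMPLE_EVENTS)` leaves a single row for `id` 1 with the updated email. In a real deployment a stream processor would keep this kind of state durably and recover it after failures, which is the part that is hard to build by hand.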
Users are growing tired of barriers to scalability and reliability, such as
insurmountable complexity in stream processing. They're demanding solutions so
they can more
quickly ship new features and maintain a competitive edge. Integrated platforms
are the answer to this demand, and we're going to see them usher in widespread
deployment of real-time data stream processing in 2024.
## ABOUT THE AUTHOR
Eric Sammer is a data analytics industry veteran who has started two companies,
Rocana (acquired by Splunk in 2017), and Decodable, where he is currently CEO.
He is an author, engineer, and leader on a mission to help
companies move and transform data to achieve new and useful business results.
Eric is a speaker on topics including data engineering, ML/AI, real-time data
processing, entrepreneurship, and open source. He has spoken at events
including the Apache Pinot conference and Confluent Current, on podcasts with
Software Engineering Daily and Sam Ramji, and has appeared in various industry
publications. Find Eric on LinkedIn and Twitter.