For years, Java Platform Enterprise Edition (Java EE) has taken flak as apparent abandonware under the benign neglect of Oracle, which recently surrendered the platform to the Eclipse Foundation, a move seen by many as the latest confirmation of its declining relevance. The nearly 10 million Java developers worldwide have had to look elsewhere for run-time reliability, particularly for streaming data use cases, where Java EE's primitives have failed to address the "data in motion" reality that has shifted data from something merely called externally by apps to the center of application design itself.
On the day of Lightbend's release of its new Fast Data Platform, VMblog catches up with Lightbend CTO and Akka creator Jonas Bonér to hear his thoughts on where fast data use cases are leading JVM developers, and what he sees coming next.
VMblog: What's changing for mainstream enterprise Java developers in application infrastructure as a result of the Fast Data/Streaming movement?
Jonas Bonér: For Java developers, the Big Data movement, and to some extent also Fast Data/Streaming, has generally been something they've watched flourish from the sidelines, at a safe distance. Of course a lot of applications have been integrated with Big Data, but the data systems themselves (usually batch-oriented, bulky beasts like Hadoop and data warehouses) have been run off to the side. But with the move to Microservices and Reactive systems, most enterprise systems today are becoming heavily driven by data. The SLAs and business requirements force them to deal with, and react to, massive amounts of data in close to real time. This has forced Java EE applications (and enterprise applications in general) to deal with "data in motion" head on, and make it an integral part of the system.
Another important change is that while traditional (overnight) batch processing platforms like Hadoop could get away with high latency and occasional unavailability, modern distributed streaming platforms such as Apache Spark and Apache Flink, which run long-lived jobs (measured in days, weeks, or months) over unbounded amounts of data, need to be Reactive. That is, they need to scale elastically, reacting adaptively to usage patterns and data volumes; be resilient, always available, and never lose data; and be responsive, always delivering results in a timely fashion. And as we see more Microservices-based systems grow to be dominated by data, their architectures look more like big pipelines of streaming data.
So, looking at it from both perspectives, you can see that Fast Data and Microservices are converging. The new Fast Data Platform that Lightbend announced today is focused on helping developers operationalize these types of Fast Data applications and bridge them to the world of Reactive microservices.
VMblog: What's your take on Java EE's support of these Fast Data directions?
Bonér: Well, it's not true to say that Java (SE and EE) hasn't focused on data at all. But I think it is true to say that it's been focused almost exclusively on "data at rest": accessing data through JDBC and JPA, or through things like LDAP and file APIs, or communicating data using things like JAX-RS and JMS. What it has failed to grasp is the move first to Big Data and later to streaming and Fast Data. In the Java community this has mainly been driven by Open Source efforts, which, on the other hand, have flourished.
One example is the Reactive Streams protocol. It started as a community effort, became very successful, and is now scheduled for inclusion in Java 9 (as the Flow API) as an integration SPI for in-process stream processing. Another example is messaging. JMS hasn't evolved much in the last 15 years, which is why Apache Kafka, developed by the community, is taking off as the new messaging standard across the industry, supporting use cases such as Event Sourcing/CQRS and streaming out of the box. The JCP simply hasn't been able to keep up with the demands of modern data processing and management, in particular around streaming.
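For reference, the whole integration SPI is tiny. The Java 9 Flow API mirrors the four Reactive Streams interfaces one-to-one as nested interfaces of java.util.concurrent.Flow; a sketch of their shape:

```java
// The Java 9 Flow API (java.util.concurrent.Flow): four nested interfaces
// that mirror the Reactive Streams specification one-to-one.
public final class Flow {

    // A producer of items, delivered to Subscribers according to their demand.
    public static interface Publisher<T> {
        void subscribe(Subscriber<? super T> subscriber);
    }

    // A consumer of items, with callbacks for the stream lifecycle.
    public static interface Subscriber<T> {
        void onSubscribe(Subscription subscription);
        void onNext(T item);
        void onError(Throwable throwable);
        void onComplete();
    }

    // The handshake channel: the subscriber signals demand via request(n).
    public static interface Subscription {
        void request(long n);
        void cancel();
    }

    // Both a Subscriber and a Publisher: a processing stage in a pipeline.
    public static interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}
}
```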
VMblog: Tell us about this Reactive Streams spec: what was it designed to solve, and where is it headed?
Bonér: Applications today are always polyglot, built with different languages, frameworks, and libraries. Very seldom do you see one single product solving everything. Modern developers have to stitch things together using glue code, and it's a lot of work. This is especially true in this new world of data, where data continuously needs to move in and out of the application and between different systems.
Reactive Streams is a specification created by engineers from Lightbend, Netflix, Twitter, Pivotal, and Red Hat as an attempt to allow composition of different tools and libraries in a standardized way, with a focus on reliable and resilient stream communication through backpressure. Everyone needs to play ball according to the same rules; that's when everyone can have fun. Reactive Streams is a handshake protocol that forces the producer to pay attention to how the consumer is able to deal with the data being sent downstream, so that the two parties can negotiate how fast or slow the data should flow for both to be happy. Without something like this the producer can easily overwhelm the consumer, causing it to fail (running out of memory, filling up buffers so everything stalls, etc.).
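To make the handshake concrete, here is a minimal, illustrative sketch using the JDK 9 Flow API and its built-in SubmissionPublisher (the class and variable names below are mine, not from the spec): the subscriber signals demand one element at a time via request(1), and the publisher may never send more than has been requested.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        // SubmissionPublisher is the JDK's reference Publisher; submit() blocks
        // when a subscriber's buffer is full, so demand propagates upstream.
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    subscription.request(1); // initial demand: one element
                }

                @Override public void onNext(String item) {
                    System.out.println("Received: " + item);
                    subscription.request(1); // ready for exactly one more
                }

                @Override public void onError(Throwable t) { t.printStackTrace(); }

                @Override public void onComplete() { System.out.println("Done"); }
            });

            publisher.submit("tick");
            publisher.submit("tock");
        } // close() signals onComplete once buffered items are delivered

        Thread.sleep(500); // crude wait: delivery happens on a daemon pool
    }
}
```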
The specification has been hardened by many projects, serving thousands of users, for several years, so it's very stable. We're seeing efforts to implement it in different languages, such as JavaScript and .NET, and as a network protocol, opening the door for systems written on the JVM to communicate across the wire with systems running on other platforms. At Lightbend we have implemented the specification in the Akka Streams and Alpakka projects, where it is (among other things) used to implement Enterprise Integration Patterns in a Reactive (non-blocking, scalable, and resilient) fashion.
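As a rough illustration of that interoperability, here is a small sketch using the Akka Streams Java DSL (Akka 2.5-era API; the class name and pipeline are illustrative, not from Lightbend's docs): an Akka Streams pipeline is exposed as a plain Reactive Streams Publisher and then consumed again as a Source, with backpressure preserved across the boundary. Any other compliant library (RxJava, Reactor, etc.) could sit on either side of that Publisher.

```java
import akka.actor.ActorSystem;
import akka.stream.ActorMaterializer;
import akka.stream.Materializer;
import akka.stream.javadsl.AsPublisher;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import org.reactivestreams.Publisher;

public class InteropDemo {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("interop");
        Materializer mat = ActorMaterializer.create(system);

        // Expose an Akka Streams pipeline as a plain Reactive Streams
        // Publisher, consumable by any other compliant library.
        Publisher<Integer> publisher =
            Source.range(1, 100)
                  .map(n -> n * n)
                  .runWith(Sink.asPublisher(AsPublisher.WITHOUT_FANOUT), mat);

        // Wrap any Reactive Streams Publisher back into a Source; the
        // request(n) demand signals flow across the library boundary.
        Source.fromPublisher(publisher)
              .runWith(Sink.foreach(System.out::println), mat)
              .thenRun(system::terminate); // shut down when the stream completes
    }
}
```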