
Virtualization and Cloud executives share their predictions for 2014. Read them in this VMblog.com series exclusive.
Contributed article by William Bain, Founder and CEO of ScaleOut Software
The Year of Convergence for In-Memory Computing, Big Data, and the Cloud
In 2013, the growing potential for synergy between cloud computing, big data, and in-memory computing began to be realized. Looking to 2014, we expect this technological intersection to open up capabilities that were previously unavailable and to transform the way developers perform data analytics.
Big data technology continues moving to the cloud
Over the last few years, cloud computing has increasingly helped companies access elastic computing resources and reduce the cost of both disk- and memory-based data storage. It is also now enabling big data technology to spread into the enterprise on a global scale. Consider, for example, the growing use of Hadoop in public clouds. Hadoop began as a popular platform for big data storage and analysis running on dedicated, commodity servers. As emerging vendors add missing components, such as security and real-time processing, Hadoop and similar technologies can now be deployed as cloud-based services.
Cloud vendors are rolling out technologies that enable more and more big data applications to move to the cloud, and that trend will continue to accelerate. Amazon Web Services offers both Elastic MapReduce and non-Hadoop big data technologies such as Amazon Redshift, giving developers the flexibility to choose an analytics platform based on their use case. Likewise, Microsoft Azure offers HDInsight, and Verizon, Savvis, and IBM offer Cloudera as a service in their cloud offerings.
In-memory data grids will be integrated with big data platforms
As the cost of RAM has dropped dramatically and memory has become accessible on demand from the cloud, in-memory computing has emerged as an increasingly practical and attractive platform for big data analysis. While still not the first choice for batch analysis of very large (petabyte) datasets, in-memory computing offers key advantages, such as lightning-fast execution and the ability to analyze live, "operational" data. As a result, in-memory computing technologies, such as in-memory data grids (IMDGs), have started to integrate big data analytics, such as Hadoop MapReduce, to analyze fast-changing, live data. In essence, the combination of these three technologies (cloud, big data, and in-memory computing) lets developers take a service-oriented approach to analyzing both static and live data, all through a common application programming interface.
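To make the idea of a common API concrete, here is a minimal word-count job written against the standard Hadoop MapReduce interfaces. Nothing in the job logic assumes where its data lives; whether a particular IMDG engine runs such classes unchanged depends on the product, so the grid-side execution should be read as an assumption in this sketch.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// A standard Hadoop MapReduce job definition. Nothing here is grid-specific:
// the same mapper and reducer could, in principle, be submitted to a
// conventional Hadoop cluster or to an IMDG-hosted execution engine
// (the latter is an assumption, not a specific product's documented API).
public class WordCount {

    // Emits (word, 1) for each whitespace-delimited token in the input.
    public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Sums the per-word counts emitted by the mappers.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }
}
```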
To harness the power of in-memory computing in the cloud, IMDGs incorporate several technologies that give them the ability to host big data computations on live data. IMDGs automatically load-balance stored data in memory across an elastic cluster of servers, providing linear scalability and high availability through replication, which makes them well suited to cloud architectures. Also, some IMDGs incorporate data-parallel computation engines that automatically distribute and execute analytics code across the IMDG's cluster of servers. This technology can support the execution of standard Hadoop MapReduce applications and enable IMDGs to serve as real-time analytics platforms for big data.
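As a rough illustration of the load-balancing and replication described above, the following sketch shows one simple way a grid might decide where a key's primary copy and its replica live. The class and method names here are hypothetical, and production IMDGs use more sophisticated placement schemes (for example, consistent hashing) to limit data motion when servers join or leave the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placement logic for an in-memory data grid: each key gets a
// primary owner plus one replica on a different server, so the loss of any
// single server never loses both copies of a key.
public class GridPlacement {
    private final List<String> servers;

    public GridPlacement(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    // Primary owner: hash the key across the current server count.
    public String primaryFor(String key) {
        return servers.get(slotFor(key));
    }

    // Replica: the next server after the primary, wrapping around.
    public String replicaFor(String key) {
        return servers.get((slotFor(key) + 1) % servers.size());
    }

    private int slotFor(String key) {
        return Math.floorMod(key.hashCode(), servers.size());
    }

    public static void main(String[] args) {
        GridPlacement grid = new GridPlacement(
                List.of("server-A", "server-B", "server-C"));
        String key = "order:12345";
        System.out.println(key + " -> primary " + grid.primaryFor(key)
                + ", replica " + grid.replicaFor(key));
    }
}
```

Note that naive modulo placement remaps most keys when the server count changes; this is exactly why real grids favor consistent hashing, which moves only a small fraction of the data during elastic scaling.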
IMDGs have several key advantages over the standard, open-source Hadoop distributions for processing big data:
- Real-time analysis of live data: An IMDG can ingest, store, and update live data while simultaneously performing rapid, in-memory execution of Hadoop MapReduce over that data (see the sketch following this list).
- Lightning-fast execution for memory-sized datasets: If a job's dataset, whether live or static, fits in memory, an IMDG can execute the job much faster than Hadoop by dramatically reducing scheduling overhead, eliminating disk access, and minimizing data motion between servers.
- Rapid prototyping: While developing a MapReduce job, the developer can host a subset of the data in an IMDG and quickly run jobs while making changes to the code.
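The first bullet is the easiest to demonstrate in miniature. The sketch below is a single-process stand-in, using only the JDK, for what a grid does across many servers: one thread continuously updates live, in-memory data while the main thread repeatedly aggregates over it without pausing ingestion. In a real IMDG the map would be partitioned across servers and the aggregation would execute in parallel on each partition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Single-process analogy for real-time analysis of live data: updates and
// analytics run concurrently over the same in-memory store.
public class LiveAnalysisSketch {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Double> positions = new ConcurrentHashMap<>();

        // Ingest thread: streams simulated position updates into memory.
        Thread ingest = new Thread(() -> {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            while (true) {
                positions.put("account-" + rnd.nextInt(1000),
                              rnd.nextDouble(-1_000_000, 1_000_000));
            }
        });
        ingest.setDaemon(true);  // let the JVM exit when main finishes
        ingest.start();

        // Analysis loop: aggregate the live data while it keeps changing.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(200);
            double exposure = positions.values().parallelStream()
                    .mapToDouble(Double::doubleValue)
                    .sum();
            System.out.printf("accounts=%d, net exposure=%.2f%n",
                              positions.size(), exposure);
        }
    }
}
```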
These technologies will open up many new applications
As we move into 2014, we expect that the synergies between the cloud, big data, and in-memory computing will continue to grow. This should open up opportunities for combining these technologies to analyze big data in diverse new applications, especially those that can benefit from real-time processing. Countless use cases, such as e-commerce, financial services, logistics, traffic monitoring, security, and sensor analysis, can take a big step forward by obtaining analytics results for live data in seconds instead of minutes or hours. Exciting new developments in the evolution of big data are now upon us.
##
About the Author
ScaleOut Software was founded in 2003 by Dr. William L. Bain. Bill holds a Ph.D. (1978) in electrical engineering/parallel computing from Rice University and has worked at Bell Labs Research, Intel, and Microsoft. He founded and ran three start-up companies prior to joining Microsoft. At the most recent of these, Valence Research, he developed a distributed web load-balancing software solution that was acquired by Microsoft and now ships as Network Load Balancing within the Windows Server operating system. Dr. Bain holds several patents in computer architecture and distributed computing. As a member of the screening committee for the Seattle-based Alliance of Angels, he is actively involved in entrepreneurship and the angel community.