ScaleOut Software 2014 Predictions - The Year of Convergence for In-Memory Computing, Big Data, and the Cloud

VMblog 2014 Prediction Series

Virtualization and Cloud executives share their predictions for 2014.  Read them in this VMblog.com series exclusive.

Contributed article by William Bain, Founder and CEO of ScaleOut Software

The Year of Convergence for In-Memory Computing, Big Data, and the Cloud

In 2013, the growing synergy between cloud computing, big data, and in-memory computing began to be realized. Looking ahead to 2014, we expect this exciting technological intersection to open up capabilities that were previously unavailable and to transform the way developers perform data analytics.

Big data technology continues moving to the cloud

Over the last few years, cloud computing has increasingly helped companies access elastic computing resources and reduce the cost of both disk-based and memory-based data storage. It is also now enabling big data technology to spread into the enterprise on a global scale. For example, consider the growing use of Hadoop in public clouds. Hadoop began as a popular platform for big data storage and analysis running on dedicated, commodity servers. As emerging vendors add missing components, such as security and real-time processing, Hadoop and similar technologies can now be deployed in the cloud.

Cloud vendors are rolling out technologies that enable more and more big data applications to move to the cloud, and that trend will continue to accelerate. Amazon Web Services offers both Elastic MapReduce and non-Hadoop big data technologies such as Amazon Redshift, giving developers the flexibility to choose an analytics platform based on their use case. Likewise, Microsoft Azure offers HDInsight, and Verizon, Savvis, and IBM offer Cloudera as a service in their cloud offerings.

In-memory data grids will be integrated with big data platforms

As the cost of RAM has dropped dramatically and memory has become accessible on demand from the cloud, in-memory computing has emerged as an increasingly practical and attractive platform for big data analysis. While still not the first choice for batch analysis of very large (petabyte) datasets, in-memory computing offers key advantages, such as lightning-fast execution and the ability to analyze live, "operational" data. As a result, in-memory computing technologies, such as in-memory data grids (IMDGs), have started to integrate big data analytics, such as Hadoop MapReduce, to analyze fast-changing, live data. In essence, the combination of these three technologies - cloud, big data, and in-memory computing - lets developers take a service-oriented approach to analyzing both static and live data, all using a common application programming interface.
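To make the idea of a common programming interface concrete, here is a minimal sketch of a standard Hadoop MapReduce mapper and reducer written against Hadoop's Java API; in the model described above, the same code could run either on a disk-based Hadoop cluster or on an IMDG's in-memory execution engine. The trade-record format ("SYMBOL,SHARES" per line) and the class names are illustrative assumptions, not part of any specific product, and each public class would live in its own source file:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper: parse one trade record ("SYMBOL,SHARES") and emit (symbol, shares).
    public class TradeVolumeMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            context.write(new Text(fields[0].trim()),
                          new IntWritable(Integer.parseInt(fields[1].trim())));
        }
    }

    // Reducer: total the traded shares for each stock symbol.
    public class TradeVolumeReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text symbol, Iterable<IntWritable> volumes,
                              Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : volumes) {
                total += v.get();
            }
            context.write(symbol, new IntWritable(total));
        }
    }

Because the logic is expressed against the stock MapReduce interfaces, nothing in it assumes where the key-value pairs live, on disk in HDFS or in memory in a grid.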

To harness the power of in-memory computing in the cloud, IMDGs incorporate several technologies that enable them to host big data computations on live data. IMDGs automatically load-balance stored data in memory across an elastic cluster of servers, with linear scalability and high availability via replication, making them well suited to cloud architectures. Also, some IMDGs incorporate data-parallel computation engines that automatically distribute and execute analytics code across the IMDG's cluster of servers. This technology can support the execution of standard Hadoop MapReduce applications and enables IMDGs to serve as real-time analytics platforms for big data.
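The data-parallel pattern itself is simple to picture. The sketch below is a hypothetical, vendor-neutral interface - not any particular IMDG's API - showing the two operations such an engine needs from application code: a method run on each server against the objects stored there, and a method for combining partial results. The Trade class and the threshold are illustrative assumptions:

    // Hypothetical interface for a grid-hosted, data-parallel computation.
    // A real IMDG compute engine would ship this code to every server, run
    // evalLocal() on locally stored objects, and combine the partial results
    // with merge(), avoiding any bulk movement of data between servers.
    public interface GridAnalysis<T, R> {
        R evalLocal(T storedObject);   // executes where the object lives
        R merge(R left, R right);      // pairwise combination of partials
    }

    // Illustrative domain object held in the grid.
    public class Trade {
        private final String symbol;
        private final int shares;
        public Trade(String symbol, int shares) {
            this.symbol = symbol;
            this.shares = shares;
        }
        public int getShares() { return shares; }
    }

    // Example analysis: count grid-resident trades above a volume threshold.
    public class LargeTradeCount implements GridAnalysis<Trade, Integer> {
        @Override
        public Integer evalLocal(Trade t) {
            return t.getShares() > 10_000 ? 1 : 0;
        }
        @Override
        public Integer merge(Integer left, Integer right) {
            return left + right;
        }
    }

Because each server evaluates only its own partition of the data, adding servers scales the computation, and the replication the grid already performs for high availability protects the computation's inputs as well.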

IMDGs have several key advantages over standard, open-source Hadoop distributions for processing big data:
  1. Real-time analysis of live data: An IMDG can ingest, store, and update live data while simultaneously performing rapid, in-memory execution of Hadoop MapReduce over that data.
  2. Lightning-fast execution for memory-sized datasets: If a job's dataset - whether live or static - fits in memory, an IMDG can execute the job much faster than Hadoop by dramatically reducing scheduling overhead, eliminating disk access, and minimizing data motion between servers.
  3. Rapid prototyping: While developing a MapReduce job, the developer can host a subset of the data in an IMDG and quickly rerun the job while making changes to the code, as sketched in the driver example after this list.
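As a concrete illustration of the prototyping workflow in item 3, the driver below runs the mapper and reducer sketched earlier in Hadoop's in-process local mode over a small sample file, so code changes can be retested in seconds. The configuration keys assume a Hadoop 2.x installation, and the file names are placeholders; an IMDG would instead host the sample data in memory across its servers:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PrototypeDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Run the job in-process against the local file system; no
            // cluster is needed while iterating on the MapReduce code.
            conf.set("mapreduce.framework.name", "local");
            conf.set("fs.defaultFS", "file:///");

            Job job = Job.getInstance(conf, "trade-volume-prototype");
            job.setJarByClass(PrototypeDriver.class);
            job.setMapperClass(TradeVolumeMapper.class);
            job.setReducerClass(TradeVolumeReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("sample-trades.txt"));
            FileOutputFormat.setOutputPath(job, new Path("prototype-out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }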

These technologies will open up many new applications

As we move into 2014, we expect the synergies between the cloud, big data, and in-memory computing to continue to grow. This should open up opportunities for combining these technologies to analyze big data in diverse new applications, especially those that can benefit from real-time processing. Countless use cases, such as e-commerce, financial services, logistics, traffic monitoring, security, and sensor analysis, can take a big step forward by obtaining analytics results for live data in seconds instead of minutes or hours. Exciting new developments in the evolution of big data are now upon us.

##

About the Author

ScaleOut Software was founded in 2003 by Dr. William L. Bain. Bill has a Ph.D. (1978) in electrical engineering/parallel computing from Rice University, and he has worked at Bell Labs research, Intel, and Microsoft. Bill founded and ran three start-up companies prior to joining Microsoft. In the most recent company (Valence Research), he developed a distributed Web load-balancing software solution that was acquired by Microsoft and is now called Network Load Balancing within the Windows Server operating system. Dr. Bain holds several patents in computer architecture and distributed computing. As a member of the screening committee for the Seattle-based Alliance of Angels, Dr. Bain is actively involved in entrepreneurship and the angel community.
Published Friday, December 20, 2013 6:39 AM by David Marshall