What do Virtualization and Cloud executives think about 2012? Find out in this VMblog.com series exclusive.
Big data, NoSQL and Mobile Sync. Three peas. One pod.
Contributed article by James Phillips, Co-Founder and SVP of Products, Couchbase
A few months ago I had the good fortune to hear VMware CEO Paul Maritz speak at a conference. Asked "Which trends would you identify that will have the biggest impact on IT in the coming decade?", Paul identified two: cloud computing and the transition underway at the data layer, specifically mentioning Big Data and NoSQL.
Paul noted that, in his experience, a shift in the data model generates far-reaching ripple effects: new applications are enabled, the application development process is impacted and the infrastructure atop which these applications run changes. He saw it happen with IMS (hierarchical data model) and with the relational model. In his estimate we're on the leading edge of another fundamental shift.
Looking around this morning, it is clear Paul is not alone. It is hard to find an IT "predictions" story or blog that doesn't mention Big Data and/or NoSQL. But these terms are frequently interchanged as though they are synonyms. In part, the confusion comes from focusing too sharply on the technology itself. There are certainly similarities in implementation - notably the tendency to spread data across many servers versus storing data on a small number of very large servers.
But if one softens the focus on the technology, it becomes clear there are three distinct trends driving innovation at the data layer: data growth, web application user growth and the explosion of mobile computing.
Data growth [Big Data]. IDC estimates [i] that more than 1.8 trillion gigabytes (1.8 zettabytes) of information will be created and replicated in 2011, with that volume doubling every two years. Mining this raw data for knowledge is the "Big Data" problem. Hadoop and its related projects [ii], including HDFS, MapReduce, Cassandra and Hive, are open source technologies that make analysis of extremely large datasets possible. These solutions are batch-processing oriented.
User growth [NoSQL]. Most new interactive software systems are accessed via browser. If available on the public Internet, these applications now have 2 billion potential users [iii] and a 24x7 uptime requirement. Regardless of dataset size, these software systems put unprecedented pressure on the data layer: massive user concurrency; need for predictable, low-latency random access to data to maintain a snappy interactive user experience; and the need for continuous operations, even during database maintenance. Couchbase and MongoDB are open source NoSQL technologies that meet the data management needs of interactive web applications.
Mobile computing growth [Mobile Sync]. Mobile devices are increasingly where we create and consume information. But data aggregation and processing will be accomplished in the cloud. IDC estimates that in 2015, 1.4 of the 4.9 zettabytes created that year will be "touched by the cloud." [iv] Delivering the right data to millions of mobile devices, when and where it is needed (and then getting it back again) is the mobile-cloud data sync problem.
These three trends, and their related technologies, are increasingly interrelated. In my opinion, they represent the emerging modern data stack - one that supports the ebb and flow of information from web and mobile applications to the cloud.
Big Data solutions are optimized for efficient ingestion and analysis of large and diverse datasets. Data can come from a vast array of sources: data created directly by users of web and mobile applications, observations and metadata about how those applications are used, external data feeds and intermediate analysis results. Processing this raw data produces the information needed by user-facing applications, which is then fed into a NoSQL solution.
The NoSQL solution provides low-latency, random access to the data, meeting the needs of web applications. It also allows a mobile synchronization server quick, random access to data needed by mobile users.
A Mobile Sync Server manages transient connections with mobile devices, delivering data to native mobile applications when and where it is needed and receiving information in return.
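To make the flow between the three layers concrete, here is a minimal sketch in Python. It is an illustration only, not how Hadoop, Couchbase or any particular sync product works: the event data, the `profile::` key scheme, the `rev` revision counter and the `KeyValueStore` and `SyncServer` classes are all hypothetical stand-ins, and everything runs in memory.

```python
from collections import defaultdict

# Hypothetical raw events, as a batch job might ingest them: (user, item viewed).
EVENTS = [
    ("alice", "book"), ("alice", "lamp"), ("bob", "book"),
    ("alice", "book"), ("bob", "desk"),
]


def batch_analyze(events):
    """Batch step (stand-in for a Hadoop-style job): derive one document per user."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, item in events:
        counts[user][item] += 1
    # Each derived document records the user's most-viewed item and a revision number.
    return {
        f"profile::{user}": {"top_item": max(items, key=items.get), "rev": 1}
        for user, items in counts.items()
    }


class KeyValueStore:
    """In-memory stand-in for a NoSQL store offering random access by key."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        return self._docs.get(key)


class SyncServer:
    """Stand-in for a mobile sync server sitting in front of the store."""

    def __init__(self, store):
        self._store = store

    def pull(self, key, device_rev):
        """Return the document only if it is newer than the device's copy."""
        doc = self._store.get(key)
        if doc is None or doc["rev"] <= device_rev:
            return None  # nothing new to send to this device
        return doc

    def push(self, key, changes):
        """Merge changes sent by a device and bump the revision."""
        current = self._store.get(key) or {"rev": 0}
        updated = {**current, **changes, "rev": current["rev"] + 1}
        self._store.put(key, updated)
        return updated["rev"]


# Wire the layers together: batch output is loaded into the store,
# and the sync server reads from and writes to that store.
store = KeyValueStore()
for key, doc in batch_analyze(EVENTS).items():
    store.put(key, doc)
sync = SyncServer(store)
```

A device holding no copy would call `sync.pull("profile::alice", device_rev=0)` to fetch the derived profile, and `sync.push(...)` to send its own changes back. The revision check is one simple way to avoid resending unchanged data over a transient connection; real sync protocols handle conflicts and partial updates in far more detail.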
The data layer transition is in full swing. But the data layer itself is not what has me most excited about the future. Rather, as Paul noted, what is most interesting is the impact this shift will have on the types of applications we can build. The ability to capture, analyze and learn from data that is being generated at unprecedented scale, combined with the means to access that information on demand, when and where it is relevant, creates application development opportunities we are only just beginning to appreciate.
###
[i] http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf (page 1)
[ii] http://hadoop.apache.org/
[iii] http://www.internetworldstats.com/stats.htm
[iv] http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf (page 4)
About the Author
A twenty-five-year veteran of the software industry, James Phillips started his career writing software for the Apple II and TRS-80 microcomputer platforms. In 1984, at age 17, he co-founded his first software company, Fifth Generation Systems, which was acquired by Symantec in 1993, forming the foundation of Symantec's PC backup software business. Most recently, James was co-founder and CEO of Akimbi Systems, a venture-backed software company acquired by VMware in 2006. Between these entrepreneurial successes, James has held executive leadership roles in software engineering, product management, marketing and corporate development at large public companies including Intel, Synopsys and Intuit, and with venture-backed software startups including Central Point Software (acquired by Symantec), Ensim and Actional Corporation (acquired by Progress Software).
Additionally, James spent two years as a technology investment banker with PaineWebber and Robertson Stephens and Co., delivering M&A advisory services to software companies. James holds a BS in Mathematics and earned his MBA, with honors, from the University of Chicago. He currently serves on the board of directors of Teneros and as an investor in and advisor to a number of privately-held software companies including Delphix, Replay Solutions and Virsto.