Virtualization Technology News and Information
MapR Technologies 2016 Predictions: The Rise of JSON in Virtualized Big Data Infrastructures

Virtualization and Cloud executives share their predictions for 2016.  Read them in this 8th Annual VMblog.com series exclusive.

Contributed by Dale Kim, Director, Industry Solutions, MapR Technologies

The Rise of JSON in Virtualized Big Data Infrastructures

If you regularly talk to enterprise computing technologists, you'll hear again and again that "everyone is moving to the cloud." While that's not literally true (at a number of levels), we definitely see the ongoing trend of leveraging virtualized environments for a wider variety of applications.

Some of the big data use cases that will have significantly more presence on virtualized architectures in the coming year include the many types of real-time analytics on event data streams, particularly those related to the Internet of Things (IoT). Because these data sources often grow at fast and unpredictable rates, they are ideally served by virtualized environments that can elastically scale out to meet both the growing data volume and the compute requirements. And since this data is often generated in the cloud (or at the "edge"), there is little friction in delivering it to a cloud infrastructure for large-scale analytics.

A cloud-based topology for IoT data is certainly not new, but one popular technology that will further facilitate deployments is the data format known as JavaScript Object Notation (JSON). JSON will play a bigger role in the cloud as it helps to deliver and store data in a flexible and easy-to-process format. You may already know that JSON is great for structured but non-relational data that is hierarchical, nested, and/or evolving, as in product catalog data. It's also great for unstructured data where a fixed schema may not exist, such as machine log files. More broadly, its self-describing structure makes it great for data interchange, most notably as a vehicle for web browsers to make partial updates to your page view via AJAX.
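As a rough illustration of the "nested and/or evolving" point, the sketch below parses two product catalog records with Python's standard json module. The records and all field names (sku, name, price, specs, ports) are hypothetical and chosen only for this example. The second record carries a nested structure that the first lacks, and the same code handles both without a fixed schema.

```python
import json

# Hypothetical product catalog records; every field name here is illustrative.
raw_records = [
    '{"sku": "A100", "name": "Desk lamp", "price": 24.99}',
    '{"sku": "B200", "name": "Laptop", "price": 899.0, '
    '"specs": {"cpu": "quad-core", "ports": ["usb", "hdmi"]}}',
]


def describe(record: dict) -> str:
    """Summarize a record, tolerating fields that only some records carry."""
    # "specs" and "ports" are optional, so fall back to empty defaults.
    ports = record.get("specs", {}).get("ports", [])
    return f'{record["sku"]}: {record["name"]} ({len(ports)} ports)'


summaries = [describe(json.loads(line)) for line in raw_records]
print(summaries)
```

Because each record describes its own fields, a newer record type can add attributes without requiring changes to the older records.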

These characteristics make JSON an ideal format for representing an incoming stream of data points, including event data and sensor readings. These time-based events and sensor measurements are not going to be homogeneous across your entire enterprise, nor will they necessarily be in a "flat" format that can be expressed as rows and columns or as comma-separated values (CSV). Your entire set of "time series data" will have data points that differ across sources, and may even change in structure over time, such as data from wearable devices that add new capabilities with each new version. And in many cases when storing the data, you will want to group data together based on time windows (such as all data collected within one-hour intervals) to make data retrieval more efficient. You might also want to create aggregations, summaries, and samples as ways of enriching your data.
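One way to picture the time-window grouping and aggregation described above is the following minimal sketch, which uses only the Python standard library. The events, the device names, and the fields ts, heart_rate, and spo2 are assumptions made up for illustration; note that the "band-v2" records carry a field that "band-v1" does not.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical wearable readings; the newer device version adds an "spo2" field.
events = [
    '{"ts": "2016-01-01T10:05:00+00:00", "device": "band-v1", "heart_rate": 72}',
    '{"ts": "2016-01-01T10:40:00+00:00", "device": "band-v2", "heart_rate": 80, "spo2": 97}',
    '{"ts": "2016-01-01T11:15:00+00:00", "device": "band-v2", "heart_rate": 90, "spo2": 98}',
]


def hour_bucket(ts: str) -> str:
    """Truncate an ISO 8601 timestamp to the start of its hour, in UTC."""
    t = datetime.fromisoformat(ts).astimezone(timezone.utc)
    return t.replace(minute=0, second=0, microsecond=0).isoformat()


def group_by_hour(lines):
    """Group JSON events into one-hour windows keyed by the window start."""
    buckets = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        buckets[hour_bucket(event["ts"])].append(event)
    return dict(buckets)


def summarize(bucket):
    """Build a small per-window summary: event count and mean heart rate."""
    rates = [e["heart_rate"] for e in bucket if "heart_rate" in e]
    average = sum(rates) / len(rates) if rates else None
    return {"count": len(bucket), "avg_heart_rate": average}


windows = group_by_hour(events)
summaries = {hour: summarize(evts) for hour, evts in windows.items()}
print(json.dumps(summaries, indent=2))
```

Each window could then be stored as a single JSON document, so that retrieving an hour of data is one lookup instead of many.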

IoT devices will continue to adopt JSON, either as the raw output format, or as an output from a binary format via downstream conversion. This means the big data technologies deployed for IoT use cases will leverage JSON more heavily in 2016. Apache Hadoop will be used much more for storing JSON data, and technologies like the open source Open JSON Application Interface (OJAI™) will provide a standardized interface to JSON in Hadoop. This will be especially important for integrating many different data sources that can be correlated in a central data repository.
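The downstream conversion from a binary format to JSON might look something like the sketch below. The wire layout (two unsigned 32-bit integers followed by a 32-bit float, little-endian) and the field names are purely hypothetical; real devices define their own formats.

```python
import json
import struct

# Hypothetical wire layout, assumed for illustration only:
# uint32 device id, uint32 Unix timestamp, float32 temperature (little-endian).
READING = struct.Struct("<IIf")


def to_json(packet: bytes) -> str:
    """Decode one fixed-size binary reading and re-encode it as JSON."""
    device_id, timestamp, temperature = READING.unpack(packet)
    return json.dumps(
        {
            "device_id": device_id,
            "ts": timestamp,
            "temperature_c": round(temperature, 2),
        }
    )


# Simulate a device packet, then convert it downstream.
packet = READING.pack(42, 1451606400, 21.5)
doc = to_json(packet)
print(doc)
```

Once converted, the reading carries its own field names, so it can sit alongside data from other sources in a central repository.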

NoSQL databases, especially the document databases based on JSON, will play a huge role in capturing IoT data. And visionary tools like the open source Apache Drill will provide a SQL query engine on JSON data so enterprises can continue using their SQL expertise and business intelligence tools for new, non-relational data sources. JSON has already "won" as the data format of choice on the Internet, and it promises to play an important role in modern data architectures that include virtualized technologies.

##

About the Author

Dale Kim is the Director of Industry Solutions at MapR.  His background includes a variety of technical and management roles at information technology companies. While his experience includes work with relational databases, much of his career pertains to non-relational data in the areas of search, content management, and NoSQL, and includes senior roles in technical marketing, sales engineering, and support engineering. Dale holds an MBA from Santa Clara University, and a BA in Computer Science from the University of California, Berkeley.

Published Tuesday, December 01, 2015 6:27 AM by David Marshall