CrateDB 2019 Predictions: Time Series Databases Go Mainstream as Sensor Data Volumes Explode

Industry executives and experts share their predictions for 2019. Read them in this 11th annual VMblog.com series exclusive.

Contributed by Christian Lutz, founder and CEO of CrateDB

Time Series Databases Go Mainstream as Sensor Data Volumes Explode

Time series databases (TSDBs) aren't new, but they do seem to oscillate between waves of attention and, well, not much of it. Operational historians, for example, are essentially powerful databases built to write data quickly while being deeply integrated into production processes, where they have provided critical monitoring and dashboard information for decades. However, operational historians fall short on advanced real-time analytics and, because of their inherited and at times dated concepts, do not fit well into 2019-level architectures, where databases are shipped in containers and expected to deliver horizontal scalability and real-time performance under high concurrent loads.

With such requirements becoming the default in 2019 and beyond, large, real-time TSDBs are becoming a key component of every digitalization / Industry 4.0 project. Recently, several modern time series databases have reached critical mass and seen fast growth (e.g. InfluxDB, Prometheus, CrateDB, TimescaleDB). The big cloud providers, of course, have entered the game as well and offer Time Series Insights (Azure) and, most recently, Timestream (AWS). This is happening because it no longer makes sense to store massive volumes of "cheap" sensor data (vs "expensive" operational data) in traditional DBMSs or in DBMSs that are not purpose-built for time series data. Doing so makes sense neither from a technical point of view (query performance, scalability) nor from an economic one (cloud cost).

Against that backdrop, here are my TSDB predictions for 2019:

1. Time Series Databases (and Related Database-as-a-Service Solutions) Will See a Massive Explosion in Data Volumes - and Customers - in 2019.

So far, time series database use cases have been dominated by IT infrastructure monitoring, data collection from IoT devices, and financial and scientific (time-stamped) data. The rise of Prometheus - a great tool for collecting infrastructure time series data - is just one example in this realm. However, with almost every company in the world (from industrial manufacturing to media usage analysis) needing to extract value from data as a competitive advantage, it will be a core requirement to collect time series data at scale and to get value out of that data. Merely storing data points over a timeline and plotting them won't cut it anymore.

2. Time Series Database Use Cases Will Separate into Two Categories.

A more granular separation will emerge throughout 2019, but more or less you will see use cases start to fall into two categories:

A) IT / Systems Monitoring Time Series (the "traditional" TSDB use case). Characterized by tens to hundreds of metrics/sensors, metrics-only data, real-time writes, no complex queries, few integration requirements, and data volumes in the GB range. Use cases are often not mission critical and tend to be internal IT projects. Prometheus is a leading example, as is InfluxDB.

B) Industrial Time Series. Characterized by hundreds to hundreds of thousands of sensors/metrics, deep IT-OT integration, a mix of metrics, logs and search, real-time queries under highly concurrent load (think interactive dashboards, alerting, stream processing, machine learning), and data volumes from GBs to hundreds of TBs. These use cases are often mission critical and transformative, and they enable real-time, data-driven decision systems (think of a data-driven manufacturing platform). CrateDB and cloud service providers deal with such projects.

3. Cloud Service Providers / DBaaS Catch Some Heat from Specialized TSDBs in Cost and Hybrid Deployment Capabilities

There is no doubt that DBaaS is winning big, especially in modern architectures that continue to roll out (and entirely change) existing architectural landscapes. And the big boys win by a hefty margin in this category: Azure and AWS alone were responsible for two thirds of the database market growth in 2018. They suck the oxygen out of the DB market.

But it turns out that the extremely granular pricing of the Azure and AWS time series services makes costs difficult to predict and may impose a massive cost risk. The cost depends on the number and size of messages, the number of queries and writes, storage in memory/SSD/spindle, retention times, query complexity and many other variables. The AWS Timestream example with its convenient $350/month price tag assumes (among other things) just 100 queries per month on the data set. A user working with industrial time series data runs thousands of queries per second to power a data-driven manufacturing platform. Such scenarios will quickly result in unpredictable cost. Differentiated pricing models will emerge and play into the hands of specialized time series DB vendors, and you'll see this in 2019.
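To make the query-volume effect concrete, here is a rough back-of-the-envelope sketch. The unit prices, the ingest rate and the storage size are purely hypothetical placeholders chosen for illustration; they are not taken from any provider's price list, and a real bill depends on many more variables than the three modeled here. Only the two query volumes (100 queries per month vs. on the order of a thousand queries per second) mirror the scenarios described above.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


@dataclass(frozen=True)
class UnitPrices:
    """Hypothetical unit prices in USD (illustrative only, not real list prices)."""
    per_million_writes: float = 0.50
    per_gb_stored_month: float = 0.03
    per_thousand_queries: float = 0.01


def monthly_cost(writes_per_second: float,
                 stored_gb: float,
                 queries_per_month: float,
                 prices: UnitPrices = UnitPrices()) -> float:
    """Sum three usage-based cost components for one month."""
    writes_per_month = writes_per_second * SECONDS_PER_MONTH
    write_cost = writes_per_month / 1e6 * prices.per_million_writes
    storage_cost = stored_gb * prices.per_gb_stored_month
    query_cost = queries_per_month / 1e3 * prices.per_thousand_queries
    return write_cost + storage_cost + query_cost


# Same ingest rate and storage in both scenarios; only the query load differs.
light = monthly_cost(writes_per_second=1_000, stored_gb=500,
                     queries_per_month=100)
heavy = monthly_cost(writes_per_second=1_000, stored_gb=500,
                     queries_per_month=1_000 * SECONDS_PER_MONTH)

print(f"100 queries/month:    ${light:,.2f}")
print(f"1,000 queries/second: ${heavy:,.2f}")
```

With these made-up prices, the query component is negligible in the first scenario and dominates the bill in the second, even though ingest and storage are identical. The absolute numbers mean nothing; the point is how strongly a per-query price component reacts to an interactive, highly concurrent workload.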

The second shakeup in 2019 is lock-in. So far, the danger of lock-in has been more of a theoretical argument for most users. Going forward, however, this will change. Larger enterprises will increasingly adopt a cloud strategy with at least two vendors, and the possibility of hybrid deployments will be a design imperative. This in turn opens the market for specialized TSDB vendors that fully utilize economies of scale in the cloud (leveraging the large CSPs) but enable their users to move the database and the data from one CSP to another at any time. When properly designed with Kubernetes, this is a simple and straightforward process. Note: At CrateDB we have just gone through the exercise of deploying the same database infrastructure for one customer with three different CSPs on three continents (the U.S., Europe, and China) for a global API running across different CSPs.

Let me add that "hybrid" goes further than just switching between CSPs if necessary. Deployment at the edge alongside the cloud is also a key design imperative in many use cases. It is simply very, very convenient if your database architecture has exactly the same features when running as a 50-node cluster in the cloud as when shipped in a simple Docker container to an edge device. Same features, same code.

2019 will see an explosion of time series use cases at massive scale, and there is room for a lot of innovation and many players. I can't wait to see all of that unfold.

About the Author

Christian Lutz is the founder and CEO of CrateDB. Developed by Crate.io, CrateDB is a SQL data platform for real-time machine data and IoT applications. Lutz splits his time between San Francisco and Berlin.

Published Friday, January 25, 2019 10:01 AM by David Marshall