With the explosive growth of time-series data in recent
years, led by increasing adoption of IoT technologies, the processing of this
data has become a challenging task for enterprises across many industries. As
traditional general-purpose databases and data historians are seldom able to
handle the scale of modern time-series datasets, there is a trend toward
deploying a purpose-built time-series database (TSDB) as a core part of the
enterprise data infrastructure. And given the requirements of time-series
data processing, a cloud native design has become essential for modern
time-series databases.
Introduction
A cloud native time-series database takes full advantage of
cloud technology and distributed systems in processing time-series data. With a
cloud native time-series database, you can quickly spin up infrastructure to
prototype, develop, test, and deliver new applications and features, shortening
the time to market while reducing costs through flexible payment models such as
pay-as-you-go. By leveraging the benefits of cloud native, your systems can
handle the demands of modern computing and provide reliable, high-quality
services to customers around the world.
There are six interrelated elements that a time-series
database must have to be cloud native: a distributed design, scalability,
elasticity, resilience, observability, and automation. This article will
discuss each of these elements as it relates to time-series data processing.
Distributed Design
In a distributed architecture, the components of a system
are spread across multiple nodes instead of being centralized in a single
location. Decoupling compute and storage resources is particularly
important for a cloud native time-series database because it allows each of
these components to be scaled independently, and more quickly than in a tightly
coupled system.
Furthermore, by replicating data and services across
multiple nodes, distributed systems can continue to function even if some of
the nodes fail. This is essential for fault tolerance and disaster recovery.
And by distributing processing tasks across multiple nodes, distributed systems
can achieve better performance than centralized systems. This is because the
workload is spread out, allowing each node to focus on a smaller subset of
tasks, which can be completed more quickly. As time-series data platforms are
often ingesting and processing large amounts of data 24 hours a day, they
benefit greatly from the fault tolerance and enhanced performance provided by a
distributed design.
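As a rough illustration of how replication and workload distribution fit together, the sketch below hashes each series to a starting node and places copies on the following nodes. The function names, the node names, and the default of three replicas are assumptions made for this example, not a description of any particular database.

```python
import hashlib


def replica_nodes(series_id: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick distinct nodes for a series, starting at a hashed position."""
    if not nodes:
        raise ValueError("at least one node is required")
    count = min(replicas, len(nodes))
    digest = hashlib.sha256(series_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Walk the node list from the hashed position so the copies land on
    # different nodes and different series start at different positions.
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


def first_available(series_id: str, nodes: list[str], failed: set[str],
                    replicas: int = 3) -> "str | None":
    """Return the first replica that has not failed, or None if all have."""
    for node in replica_nodes(series_id, nodes, replicas):
        if node not in failed:
            return node
    return None
```

Because every series has copies on several nodes, a read or write for that series can be redirected to a surviving replica when one node fails, while different series are served by different nodes in normal operation.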
Finally, distributed systems do not require custom,
ultra-high-end servers or expensive and restrictive software licenses. Instead,
they can make use of commodity hardware and open-source software, and for that
reason can be built more cost-effectively than centralized systems.
Scalability
A high level of scalability is also necessary for a cloud
native time-series database as it ensures that systems and processes can
accommodate increasing demand. This is facilitated by the distributed design
mentioned above: because workloads are processed by multiple decentralized
nodes, it is easy to add more nodes to handle larger amounts of data without
overloading any single node, and likewise to remove nodes in the event that
requirements change and resources need to be reallocated.
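One common way to make adding and removing nodes cheap is consistent hashing, which this article does not prescribe but which illustrates the idea well. The sketch below is a minimal hash ring; the class name, the node names, and the choice of 64 virtual nodes per physical node are assumptions for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node) tuples
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several positions to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # The owner is the first ring position at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With this layout, adding a node moves only the keys that the new node takes over, and removing it hands those keys back; the rest of the data stays where it is.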
Scalability is particularly important today because
time-series datasets are rapidly increasing in scale. As a business grows, the
amount of data passing through its pipelines grows with it, meaning
that existing data infrastructure must be expanded to meet new business
requirements. Cloud native scalability also helps to reduce the costs associated
with expanding or upgrading data systems, because resources can be added
incrementally as needed rather than in large and expensive blocks.
Elasticity
Elasticity refers to the ability of a system to dynamically
provision and deprovision resources based on changes in demand. Automating the
process of scaling resources up or down on an as-needed basis enables data
systems to handle sudden spikes in workload and to accommodate growth over time
while maintaining optimal performance and avoiding downtime. Elasticity builds on
the scalability described above by applying it automatically and in real time.
By providing elasticity, a cloud native time-series database
allows you to respond quickly to changing business needs, opportunities, or
challenges. You can launch new services, products, and applications quickly and
efficiently without worrying about resource constraints, because additional nodes
are deployed on demand to ensure adequate performance. You can also match resource
consumption to actual demand in real time: using only the resources that are
required at a particular moment prevents overprovisioning and unnecessary
costs.
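The core of an elastic scaling decision can be sketched in a few lines. The function below is a hypothetical example: the parameter names, the 70% default utilization target, and the node limits are assumptions chosen for illustration, and a real autoscaler would also smooth its input over time to avoid reacting to momentary spikes.

```python
import math


def desired_nodes(ingest_rate: float, per_node_capacity: float,
                  target_utilization: float = 0.7,
                  min_nodes: int = 1, max_nodes: int = 32) -> int:
    """Return how many nodes are needed to serve the current ingest rate.

    ingest_rate and per_node_capacity use the same unit, for example
    rows written per second.
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Leave headroom on each node by planning for only part of its capacity.
    usable = per_node_capacity * target_utilization
    needed = math.ceil(ingest_rate / usable)
    # Clamp so the cluster never shrinks to zero or grows without bound.
    return max(min_nodes, min(max_nodes, needed))
```

Evaluating this function periodically against measured load gives a node count to provision toward when demand rises and to deprovision toward when it falls.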
Resilience
Cloud native design assumes that faults will occur and
provides the resilience to recover from them quickly and ensure business
continuity. High availability and high reliability are key components of
resilience.
For a cloud native time-series database, high availability
is achieved by replicating data across multiple nodes; if one node fails,
another can take its place and the database can continue to provide services.
The database system must ensure appropriate data consistency and have a
mechanism for establishing consensus. To implement high reliability, a
traditional write-ahead log (WAL) is still an excellent option for cloud native
systems.
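To make the write-ahead log idea concrete, here is a minimal sketch in which every record is appended and flushed to disk before the write is acknowledged, and the log is replayed after a restart. The JSON-lines file format and the class and function names are assumptions for this example; production systems typically use binary formats with checksums.

```python
import json
import os


class WriteAheadLog:
    """Append-only log that is flushed to disk on every write."""

    def __init__(self, path: str):
        self._file = open(path, "a", encoding="utf-8")

    def append(self, record: dict) -> None:
        # One JSON document per line; the write is only considered
        # durable once fsync has returned.
        self._file.write(json.dumps(record) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self) -> None:
        self._file.close()


def replay(path: str) -> list[dict]:
    """Rebuild state after a crash by reading the log back in order."""
    records = []
    if not os.path.exists(path):
        return records
    with open(path, encoding="utf-8") as log:
        for line in log:
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # A torn final write means the process died mid-append;
                # everything before it is still valid.
                break
    return records
```

Because each record reaches durable storage before it is applied in memory, a node that crashes can recover the writes it had acknowledged by replaying the log.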
With a highly available and highly reliable time-series data
platform, you can be confident that your data is preserved and that it can be used by
your applications when you need it. In addition, this kind of resilience can
help to reduce costs associated with disruptions by minimizing downtime and
reducing the need for recovery efforts.
Observability
Observability provides a comprehensive view of system
performance and behavior that lets you detect problems quickly and address them
before they cause significant downtime or service disruptions. Given the critical
role of the time-series database in the overall data infrastructure, observability
is indispensable for identifying and addressing bottlenecks,
optimizing resource utilization, and improving system performance and reliability.
A cloud native time-series database must integrate with
observability systems to enable real-time visibility into system behavior. This
integration lets enterprises not only optimize system performance, but also
maintain compliance and improve customer satisfaction due to decreased
downtime.
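Integration with an observability system usually starts with the database exposing its own metrics. The sketch below keeps labeled counters in memory and renders them in the style of the Prometheus text exposition format; the class name and the metric names are hypothetical, and label-value escaping is omitted for brevity.

```python
from collections import defaultdict


class Metrics:
    """In-memory counters rendered as plain text for a metrics scraper."""

    def __init__(self):
        self._counters = defaultdict(float)

    def inc(self, name: str, amount: float = 1.0, **labels: str) -> None:
        # Sort the labels so the same label set always maps to the same key.
        key = (name, tuple(sorted(labels.items())))
        self._counters[key] += amount

    def render(self) -> str:
        lines = []
        seen = set()
        for (name, labels), value in sorted(self._counters.items()):
            if name not in seen:
                lines.append(f"# TYPE {name} counter")
                seen.add(name)
            # Note: label values are not escaped in this sketch.
            label_text = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{name}{suffix} {value:g}")
        return "\n".join(lines) + "\n"
```

Serving the rendered text over HTTP would let a monitoring system collect the counters at regular intervals and alert on changes such as a rising error count.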
Automation
For a time-series database to be truly cloud native, its
deployment, management, and scaling must be automated processes. Automation is
a critical component of cloud native infrastructure and application management.
In a cloud native application, automation ties into
resilience and scalability: automated failover and disaster recovery mechanisms
enhance the resilience of systems, while automated provisioning and
deprovisioning of infrastructure resources enhance their scalability.
Going further, automation enables consistency across
cloud native environments by enforcing unified policies and configurations
across all components of the system. With an automated cloud native time-series
database, you can be sure that your nodes and infrastructure are configured
correctly and consistently, reducing the risk of errors or security
vulnerabilities.
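Consistent configuration is often enforced by comparing a declared, desired state with the observed state and acting on the difference. The sketch below shows only the comparison step; the function name, the action labels, and the `retention_days` setting are assumptions for illustration.

```python
def plan_changes(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Compare desired and observed node configs and list the actions needed."""
    actions = []
    for node in sorted(desired.keys() | actual.keys()):
        if node not in actual:
            actions.append(("create", node))   # declared but not running
        elif node not in desired:
            actions.append(("delete", node))   # running but no longer declared
        elif desired[node] != actual[node]:
            actions.append(("update", node))   # configuration has drifted
    return actions


desired = {"n1": {"retention_days": 30}, "n2": {"retention_days": 30}}
actual = {"n1": {"retention_days": 7}, "n3": {"retention_days": 30}}
print(plan_changes(desired, actual))
```

Running such a comparison in a loop and applying the resulting actions keeps every node converging on the same declared configuration, with no manual changes to drift out of sync.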
Containerization, and Kubernetes in particular, simplifies the
deployment and management of applications while delivering increased agility
for DevOps teams. At the same time, containerization greatly enhances
portability, enabling deployment across various cloud platforms in addition to
on-premises and hybrid environments. These technologies are an excellent fit
for cloud-based time-series data platforms.
Conclusion
Considering the growing importance of the cloud in all data
applications, especially time-series data processing, modern time-series
databases must be cloud native to meet the business requirements of tomorrow.
The distributed design of cloud native data platforms is a powerful tool for
building the scalable, fault-tolerant, and high-performance systems that are
required for handling time-series datasets. And by leveraging cloud native
technologies and principles in their data infrastructure, enterprises can
achieve better business outcomes, faster time-to-market, higher customer
satisfaction, and increased revenue.
To learn more about the transformative nature of cloud native applications and open source software, join us at KubeCon + CloudNativeCon Europe 2023, hosted by the Cloud Native Computing Foundation, which takes place April 18-21.
ABOUT THE AUTHOR
Jeff Tao, Founder and CEO, TDengine
Jeff Tao is the founder and CEO of TDengine. He has a
background as a technologist and serial entrepreneur, having previously
conducted research and development on mobile Internet at Motorola and 3Com and
established two successful tech startups. Foreseeing the explosive growth of
time-series data generated by machines and sensors now taking place, he founded
TDengine in May 2017 to develop a high-performance time-series database
purpose-built for modern IoT and IIoT businesses.