Industry executives and experts share their predictions for 2023. Read them in this 15th annual VMblog.com series exclusive.
The Rise in Spatial and Temporal Data Analysis
By Chad Meley, CMO at Kinetica
In 2023, there will be a significant increase in the
prevalence of analytic databases designed specifically for spatial and
time-series analysis. Traditional analytic databases are often not well-suited
to dealing with spatial and time-series data, which can be complex and
difficult to analyze. As a result, we expect to see a rise in the development
and adoption of new analytic databases that are specifically designed to handle
this type of data. These systems will use advanced algorithms and specialized data
structures to efficiently store and analyze spatial and time-series data,
allowing businesses to gain valuable insights and make better-informed
decisions. Overall, the increased use of these specialized analytic databases
will enable businesses to more fully leverage their sensor data and drive
growth and innovation in the coming years.
The cost of sensors and devices capable of broadcasting their
longitude and latitude as they move through time and space is falling rapidly,
and such devices are proliferating accordingly. By 2025, projections suggest 40% of all
connected IoT devices will be capable of sharing their location, up from 10% in
2020. Spatial thinking will help innovators optimize existing operations and
drive long-promised digital transformations in smart cities, connected cars,
transparent supply chains, proximity marketing, new energy management
techniques, and more.
Spatial data, also known as geospatial data or geographic information,
refers to data that has a geographic component, such as the location of a
physical object or the shape of a geographical feature. Spatial data can be
represented in many different forms, including as coordinates, points, lines,
polygons, and raster images. This data can be collected using a variety of
methods, such as through global positioning systems (GPS), remote sensing, and
aerial or satellite imagery. Spatial data is often stored and managed in
specialized databases. Temporal data, also known as time-series
data, refers to data that has a time-based component. This type of data is
often used to track changes or trends over time, and can be collected at
regular intervals or at specific points in time. Examples of temporal data
include financial data, weather data, and mechanical readings such as
vibrations or temperature. This data too has typically been stored and managed
in specialized time series databases.
Spatio-temporal databases combine spatial and temporal data to create a
more complete picture of a system or process. They are used to store and
analyze data that changes over both time and space. This type of database is
ideal for applications such as tracking the movement of objects, monitoring the
change of geographic features, and analyzing the spread of disease. They
provide a way to store and query data that is constantly changing, as well as
the ability to display it in real time. At the start of this decade, spatio-temporal
databases began to see production deployments by innovators in
telecommunications, logistics, defense, financial services, energy,
transportation, retail, and healthcare.
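
To make the idea of data that changes over both time and space concrete, here is a minimal sketch of a spatio-temporal filter in plain Python. It is an illustration only, not how any particular database implements such queries: the `Ping` record, the `in_window_and_box` helper, and the sample truck data are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ping:
    """One location report from a moving device (hypothetical schema)."""
    device_id: str
    ts: datetime
    lon: float
    lat: float


def in_window_and_box(pings, start, end, min_lon, min_lat, max_lon, max_lat):
    """Return pings with start <= ts < end that fall inside the bounding box."""
    return [
        p for p in pings
        if start <= p.ts < end
        and min_lon <= p.lon <= max_lon
        and min_lat <= p.lat <= max_lat
    ]


# Made-up sample data for illustration.
pings = [
    Ping("truck-1", datetime(2023, 1, 1, 8, 0), -77.04, 38.90),
    Ping("truck-1", datetime(2023, 1, 1, 9, 0), -77.10, 38.95),
    Ping("truck-2", datetime(2023, 1, 1, 8, 30), -122.42, 37.77),
    Ping("truck-1", datetime(2023, 1, 2, 8, 0), -77.04, 38.90),
]

# Which devices were inside the box at any time on 1 January 2023?
hits = in_window_and_box(
    pings,
    start=datetime(2023, 1, 1),
    end=datetime(2023, 1, 2),
    min_lon=-77.2, min_lat=38.8, max_lon=-77.0, max_lat=39.0,
)
print([(p.device_id, p.ts.isoformat()) for p in hits])
```

A linear scan like this is fine for a handful of records. At production scale, the same question would typically rely on spatial and temporal indexing rather than checking every row.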
While
spatial and time-series functions have been "features" in conventional analytic
databases for years, they have failed to produce breakthrough results due to
performance and scale limitations.
Spatial and temporal joins are particularly taxing on even the most
advanced distributed, columnar, memory-first, cloud databases. Unlike traditional primary and foreign key
joins (e.g., customer_id in table one joined to customer_id in table two), a
spatial join may include mapping a longitude and latitude in one table to a
polygon in table two. Just as the big
data revolution was fueled by Web 2.0 data and a rethinking of the systems used
to store and analyze it, new technology in the form of vectorized databases
has emerged to satisfy the unique requirements of spatio-temporal
analytics.
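
The contrast between a key join and a spatial join can be sketched in a few lines of Python. The example below is a naive illustration under stated assumptions: it uses the standard ray-casting test for point-in-polygon, treats coordinates as planar, and the `point_in_polygon` and `spatial_join` helpers and the zone data are hypothetical. It is not a description of how any specific database executes such joins.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count crossings of a ray going right from the point.

    `polygon` is a list of (x, y) vertices; an odd number of edge
    crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def spatial_join(points, zones):
    """Naive nested-loop join: every point is tested against every polygon."""
    return [
        (point_id, zone_name)
        for point_id, lon, lat in points
        for zone_name, polygon in zones.items()
        if point_in_polygon(lon, lat, polygon)
    ]


# Made-up "table two": named polygons.
zones = {
    "zone_a": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "zone_b": [(5.0, 0.0), (9.0, 0.0), (7.0, 3.0)],
}
# Made-up "table one": (id, longitude, latitude).
points = [("p1", 1.0, 1.0), ("p2", 7.0, 1.0), ("p3", 10.0, 10.0)]

matches = spatial_join(points, zones)
print(matches)
```

Unlike an equality match on `customer_id`, the join condition here is a geometric test whose cost grows with the number of points, polygons, and vertices, which is why these joins put so much pressure on the query engine.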
Vectorized
databases are a type of distributed analytic database that uses vectorized
query execution to boost performance. In contrast, conventional distributed
analytic databases typically process data on a row-by-row basis, which can be
slower and require more computational resources. In a vectorized query engine,
data is stored in fixed-size blocks called vectors, and query operations are
performed on these vectors in parallel, rather than on individual data
elements. This allows the query engine to process multiple data elements
simultaneously, resulting in faster query execution and improved performance.
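
The difference in control flow between the two approaches can be sketched as follows. This is a simplified illustration in plain Python, with a hypothetical block size and made-up data: the lists only mimic the vector-at-a-time structure and give no speedup by themselves. In a real vectorized engine, each per-block step would map to SIMD instructions or a GPU kernel.

```python
VECTOR_SIZE = 4  # hypothetical block size; real engines use much larger vectors


def row_at_a_time(prices, quantities, min_price):
    """Evaluate filter, multiply, and sum one row at a time."""
    total = 0.0
    for price, qty in zip(prices, quantities):
        if price >= min_price:
            total += price * qty
    return total


def vector_at_a_time(prices, quantities, min_price):
    """Apply each operator to a whole block before moving to the next operator."""
    total = 0.0
    for start in range(0, len(prices), VECTOR_SIZE):
        p_block = prices[start:start + VECTOR_SIZE]
        q_block = quantities[start:start + VECTOR_SIZE]
        # Filter operator: one selection mask for the entire block
        mask = [p >= min_price for p in p_block]
        # Multiply operator: one pass over the entire block
        revenue = [p * q for p, q in zip(p_block, q_block)]
        # Aggregate operator: sum only the selected elements
        total += sum(r for r, keep in zip(revenue, mask) if keep)
    return total


# Made-up sample columns.
prices = [5.0, 12.0, 8.0, 20.0, 15.0, 3.0]
quantities = [2, 1, 4, 3, 2, 10]

row_total = row_at_a_time(prices, quantities, min_price=10.0)
vec_total = vector_at_a_time(prices, quantities, min_price=10.0)
print(row_total, vec_total)
```

Both functions return the same answer. The point of the second form is that each operator runs as a tight, branch-free pass over a contiguous block, which is the shape of work that vector hardware handles well.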
Vectorized databases combine the latest advances in NVIDIA GPUs and vectorized Intel CPUs with software that processes
data in large blocks, allowing them to execute queries more quickly and
efficiently. This can be particularly beneficial for complex queries and
spatio-temporal joins that involve large amounts of data, as it can reduce the
amount of time and resources required to execute the query. Overall, vectorized
databases offer improved performance and scalability compared to conventional
distributed analytic databases. Based on TPC-DS benchmarks, Kinetica is the leading vectorized database. Last year, Intel's Jeremy
Rader, GM, Enterprise Strategy & Solutions for the Data Platforms Group, proclaimed, "Kinetica's
fully-vectorized database (sic) significantly outperforms traditional cloud
databases for big data analytics."
##
ABOUT THE AUTHOR
Chad Meley is CMO at Kinetica. With over 20 years of experience as a leader
in SaaS, big data, advanced analytics, and data-driven marketing, Chad is known
for innovative thinking, delivering high-impact results, and building
inclusive, global, forward-thinking teams.
Prior to joining Kinetica, Chad was VP of Marketing at Teradata, and held a variety of leadership roles centered
around data and analytics while at Electronic Arts, Dell and FedEx.
Professional awards include the
Best Practice Award for Driving Business Results in Data Warehousing from The
Data Warehouse Institute and the Marketing Excellence Award from the Direct
Marketing Association. Chad is a regular speaker at conferences, including The
O'Reilly AI Conference, Strata, Constellation Connected Enterprise, and
Analytics Universe. He is often quoted by major media publications such as CIO
Magazine, Forbes, InformationWeek, and others.