Industry executives and experts share their predictions for 2018. Read them in this 10th annual VMblog.com series exclusive.
Contributed by Dipti Borkar, Vice President of Product Marketing at Kinetica
The Beginning of the End of the Traditional Data Warehouse
As the volume,
velocity and variety (the Gartner 3V's) of data being generated continues to
grow, and the requirements for managing and analyzing this Big Data become more
sophisticated and demanding, the traditional data warehouse is increasingly
struggling to keep pace. By traditional data warehouse, I don't mean the star
schema data model that is widely in use. The star schema remains a valid approach
to modeling data for analytics. By traditional enterprise data warehouse (EDW),
I refer instead to the centralized server (physical or virtual) running
relational database software optimized for disk storage.
These systems were built on the assumption that users would passively consume data, so the EDW was designed as a read-only repository. Such configurations were cost-effective and perfectly suitable for applications with relatively static data. But with relentless growth in the 3Vs, particularly the velocity of constantly changing data, something better is needed.
The substantial reduction in the cost of random access memory made the in-memory database the next-generation data warehouse. The in-memory database offers a dramatic improvement in performance by cutting data access latency from roughly 10 milliseconds (for spinning disks) to around 100 nanoseconds (for system memory). However, as the need to correlate larger amounts of data using increasingly complex analytical algorithms has grown, compute has become the new performance bottleneck. This is particularly the case for applications that use machine learning or perform in-depth analytics on streaming data.
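To put that latency gap in perspective, here is a quick back-of-the-envelope calculation (a minimal sketch using only the round figures quoted above):

```python
# Back-of-the-envelope comparison of disk vs. memory access latency,
# using the round numbers quoted in the text.

DISK_LATENCY_S = 10e-3     # ~10 milliseconds per random read on a spinning disk
MEMORY_LATENCY_S = 100e-9  # ~100 nanoseconds per access from system memory

speedup = DISK_LATENCY_S / MEMORY_LATENCY_S
print(f"Memory is ~{round(speedup):,}x faster per access")

# Once I/O is five orders of magnitude cheaper, a query that used to
# spend most of its time waiting on disk now spends most of it in the
# CPU -- which is why compute becomes the next constraint.
```

A roughly 100,000-fold reduction in access time explains why, once data lives in memory, the processor rather than storage becomes the limiting factor.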
The new compute bottleneck has given rise to yet another generation of data warehouse, this one supplementing the CPU with the graphics processing unit (GPU). GPUs deliver a performance boost of up to 100 times compared to configurations containing only CPUs. The reason is their massively parallel architecture: some GPUs now contain upwards of 6,000 cores, nearly 200 times the 32 cores found in today's most powerful CPUs.
Ordinarily, achieving performance gains of this magnitude would require a major change to your application. However, SQL remains the language of choice for these new systems, making the transition significantly easier. In addition, GPU-accelerated in-memory databases further improve decision-making through in-database machine learning and deep learning capable of finding the deeper insights that elude us mere mortals.
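The point about SQL is worth making concrete: the analytical queries themselves do not change when the engine underneath does. Below is a minimal sketch using Python's built-in sqlite3 purely as a stand-in for a database connection; a GPU-accelerated database would expose the same SQL through its own client library, and the table and column names here are invented for illustration:

```python
import sqlite3

# Stand-in connection; a GPU database's client library would be
# swapped in here without touching the SQL below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# The analytical SQL is engine-agnostic: the same GROUP BY query would
# run unchanged on a disk-, memory-, or GPU-backed warehouse.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150.0), ('west', 75.0)]
```

Because the query surface stays the same, migrating an existing report or dashboard is largely a matter of repointing the connection rather than rewriting the application.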
The effort is made substantially easier by GPU-accelerated solutions that support user-defined functions (UDFs). By providing direct access to the GPU's massive parallel processing power via the NVIDIA CUDA® API, these UDFs enable users to apply popular artificial intelligence frameworks like TensorFlow and Caffe directly to the data already being managed by the database.
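The general shape of such a UDF is a table-in, table-out function that runs next to the data. The sketch below is purely illustrative and not any vendor's actual UDF API; `score_batch` and the fixed linear model inside it are hypothetical stand-ins for a trained TensorFlow or Caffe model:

```python
import numpy as np

# Hypothetical stand-in for a trained model (e.g. one exported from
# TensorFlow); here just a fixed pair of linear weights.
WEIGHTS = np.array([0.4, 0.6])

def score_batch(features: np.ndarray) -> np.ndarray:
    """Table-in, table-out UDF shape: take a batch of rows already
    resident in the database and return one score per row. In a real
    GPU-accelerated UDF this matrix product would execute on the GPU
    via CUDA rather than on the CPU."""
    return features @ WEIGHTS

# Example: two rows, two feature columns each.
batch = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
scores = score_batch(batch)
print(scores)
```

The key design point is data locality: rather than exporting rows to an external scoring service, the model is brought to where the data already lives, avoiding a round trip for every batch.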
Getting Started
When migrating to a later generation of data warehouse that leverages in-memory primary storage, advanced GPUs, or both, organizations might want to begin with a pilot. Candidates for the pilot should include both new applications and existing ones exhibiting poor query performance.
Beyond supporting today's "Fast Data" applications, gaining experience with state-of-the-art data analytics platforms has another advantage: it prepares organizations for a future in which artificial intelligence and cognitive computing are destined to become integral to the next-generation data warehouse.
##
About the Author
Dipti Borkar, VP of Product Marketing
at Kinetica,
has over 15 years of experience in relational and non-relational database
technology. Prior to Kinetica, Dipti was VP of Product Marketing at Couchbase
where she also held several leadership positions, including Head of Global
Technical Sales and Head of Product Management. Previously Dipti was on the
product team at MarkLogic, and at IBM where she managed development teams.
Dipti holds a Master's degree in Computer Science from the University of
California, San Diego with a specialization in databases, and an MBA from the
Haas School of Business at University of California, Berkeley.