Virtualization Technology News and Information
Kinetica 2018 Predictions: The Beginning of the End of the Traditional Data Warehouse

VMblog Predictions 2018

Industry executives and experts share their predictions for 2018. Read them in this exclusive 10th annual series.

Contributed by Dipti Borkar, Vice President of Product Marketing at Kinetica

The Beginning of the End of the Traditional Data Warehouse

As the volume, velocity and variety (the Gartner 3Vs) of data being generated continue to grow, and the requirements for managing and analyzing this Big Data become more sophisticated and demanding, the traditional data warehouse is increasingly struggling to keep pace. By traditional data warehouse, I don't mean the star schema data model that is widely in use. The star schema remains a valid approach to modeling data for analytics. By traditional enterprise data warehouse (EDW), I refer instead to the centralized server (physical or virtual) running relational database software optimized for disk storage.

These systems were built on the assumption that users would be passively consuming data, so the EDW was meant to be a read-only repository. These configurations were cost-effective and perfectly suitable for applications with relatively static data. But with relentless growth in the 3Vs, particularly in the velocity of constantly changing data, something better is needed.

The substantial reduction in the cost of random access memory led to the in-memory database becoming the next-generation data warehouse. The in-memory database offers a dramatic improvement in performance, cutting read/write latency from roughly 10 milliseconds (for spinning disks) to around 100 nanoseconds (for system memory). However, as the need to correlate larger amounts of data using increasingly complex analytical algorithms has grown, compute has become the new performance bottleneck. This is particularly the case for applications that use machine learning and/or perform in-depth analytics on streaming data.
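To make the in-memory idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational engine (Kinetica's actual engine works differently at scale): connecting to ":memory:" keeps the entire database in RAM, removing disk I/O from the read/write path.

```python
import sqlite3

# Hypothetical illustration: the same relational workload can run against
# a disk-backed file or entirely in RAM. ":memory:" keeps every page in
# system memory, so reads and writes never touch a spinning disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, float(i) * 1.5) for i in range(10_000)],
)
(total,) = conn.execute("SELECT SUM(value) FROM events").fetchone()
print(total)  # 74992500.0
```

Swapping `":memory:"` for a file path gives the disk-backed behavior of a traditional EDW; the SQL on top is identical either way.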

The new compute bottleneck has given rise to yet another generation of data warehouse, this one supplementing the CPU with the graphics processing unit. GPUs deliver a performance boost of up to 100 times compared to configurations containing only CPUs. The reason is their massively parallel processing power, with some GPUs now containing upwards of 6,000 cores, nearly 200 times more than the 32 cores found in today's most powerful CPUs.
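The data-parallel pattern that GPUs exploit can be sketched in a few lines: partition the data, apply the same operation to every partition concurrently, then reduce the partial results. The toy below uses threads purely to illustrate the shape of the computation; a GPU database applies the same partition-and-reduce idea with thousands of cores running CUDA kernels over columnar data.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of the data-parallel pattern: split the data into partitions,
# run the same aggregation on each partition at once, then combine.
data = list(range(1_000_000))
chunks = [data[i::8] for i in range(8)]  # 8 interleaved partitions

with ThreadPoolExecutor(max_workers=8) as pool:
    partial_sums = list(pool.map(sum, chunks))

grand_total = sum(partial_sums)  # reduce step: same result as a serial sum
print(grand_total)
```

The key property is that each partition is independent, so adding more workers (or, on a GPU, thousands of cores) scales the map step without changing the result.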

Ordinarily, performance gains of this magnitude would require major changes to your application. However, SQL is still the language of choice for these new systems, making the transition significantly easier. In addition, GPU-accelerated in-memory databases further improve decision-making through in-database machine learning and deep learning capable of finding the deeper insights that elude us mere mortals.

The effort is made substantially easier with those GPU-accelerated solutions that support user-defined functions. By providing direct access to the GPU's massive parallel processing power via the NVIDIA CUDA® API, these UDFs enable users to leverage insights from popular artificial intelligence frameworks like TensorFlow and Caffe operating on the data already being managed by the database.
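The UDF pattern itself, running custom model logic inside the database rather than exporting rows to an external application, can be sketched with sqlite3's Python UDF registration. This is only an analogy: Kinetica's actual UDFs are CUDA-backed and would invoke a real TensorFlow or Caffe model, whereas the "model" here is a toy logistic function.

```python
import math
import sqlite3

def score(x):
    # Toy "model": a logistic (sigmoid) function standing in for real
    # TensorFlow/Caffe inference in a production UDF.
    return 1.0 / (1.0 + math.exp(-x))

conn = sqlite3.connect(":memory:")
# Register the function with the engine so it can be called from SQL,
# keeping the computation next to the data being managed by the database.
conn.create_function("score", 1, score)
conn.execute("CREATE TABLE readings (x REAL)")
conn.executemany("INSERT INTO readings VALUES (?)", [(0.0,), (2.0,)])
rows = conn.execute("SELECT score(x) FROM readings ORDER BY x").fetchall()
print(rows)
```

The design point is the same as in the article: inference moves to the data instead of the data moving to the inference code, which matters most when the tables are large and the GPU is already holding them.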

Getting Started

When migrating to a later generation of data warehouse that leverages in-memory primary storage, advanced GPUs, or both, organizations might want to begin with a pilot. Candidates for the pilot should include both new applications and existing ones exhibiting poor query performance.

In addition to being able to support today's "Fast Data" applications, gaining experience with state-of-the-art data analytics platforms has another advantage: it will prepare organizations for a future in which artificial intelligence and cognitive computing are destined to become integral to the next-generation data warehouse.


About the Author


Dipti Borkar, VP of Product Marketing at Kinetica, has over 15 years of experience in relational and non-relational database technology. Prior to Kinetica, Dipti was VP of Product Marketing at Couchbase where she also held several leadership positions, including Head of Global Technical Sales and Head of Product Management. Previously Dipti was on the product team at MarkLogic, and at IBM where she managed development teams. Dipti holds a Master's degree in Computer Science from the University of California, San Diego with a specialization in databases, and an MBA from the Haas School of Business at University of California, Berkeley.

Published Monday, December 18, 2017 7:56 AM by David Marshall