Virtualization Technology News and Information
Quobyte 2020 Predictions: 2020 Will See Scale-Out Storage Rise in Importance to Leverage the Full Value of Data

VMblog Predictions 2020 

Industry executives and experts share their predictions for 2020.  Read them in this 12th annual series exclusive.

By Björn Kolbeck, Co-Founder and CEO, Quobyte

2020 Will See Scale-Out Storage Rise in Importance to Leverage the Full Value of Data

In years past, if you wanted to predict the future, you could visit a fortune teller, read your horoscope, or have a psychic octopus reveal its picks of what might happen.  But in this age of information-driven knowledge, the smarter bet is to let the data tell you what you need to know.  With the rise of machine learning, advanced applications can render models that make forecasts less a bunch of hocus pocus and more the epitome of an educated guess.  Predictions are never foolproof, but as the new year approaches, one thing is certain - data is king.  Enterprises that figure out the best way to leverage that data will find more success than those that don't.  Here are a few thoughts about where the industry is moving and how your organization can benefit.

The Cloud Will Begin to Dissipate

Over the past few years, moving data to the cloud has gained steam.  The Amazons and Googles of the world want us to believe that cloud-enabled this and cloud-enabled that are where we are headed.  But their end goal is to lock you completely into their platforms. Enterprises are increasingly realizing that the cloud doesn't make sense when they have a lot of data.  Data is not mobile.  Data transfer is not cheap.  Many of the benefits that make the cloud seem "magical" stem largely from the fact that organizations are still using 20-year-old technologies to handle today's modern workloads. If your internal IT runs on antiquated tools, the cloud looks attractive.  But with the cloud, you're just one of many customers, and you're getting the same thing that everyone else has - something off the shelf.  Expect to see organizations adopting the tools and resources they need to be successful rather than relying on the cloud model.

Data is the New Oil

The reason to move away from the cloud toward on-premises solutions is that people are only now realizing their volumes of data are not a sunk cost but a valuable resource they should leverage for the benefit of their business.  With the introduction of cutting-edge applications and workloads, such as machine learning and artificial intelligence, having a lot of data becomes a critical competitive advantage for companies. Keeping this information in the cloud forfeits that advantage. In the coming year, look for organizations to again begin keeping their data on-premises, where access to it is faster, they have better control over the information, and they can more readily comply with government regulations and privacy laws.  They can also manage data more easily by storing it in a way that best suits their customers' needs.

No More Tiers

The move to an on-premises model will be accelerated by the fact that, for machine learning workloads, all data is more or less hot data. Because of this, tiering is no longer a viable strategy either.  The usual strategy of using cheaper HDDs for "cold" or archival data and SSDs for the "hot" data within the data center doesn't work well for these advanced workloads. On the other hand, all-flash arrays are often too expensive, especially at scale. Considering capacity-to-throughput ratios and the large capacities that are often required in ML, HDDs remain a good choice in many cases. Rather than relying on a single, monolithic system that emphasizes low latency as its competitive advantage, a storage solution that scales out and can serve hundreds of pre-fetched requests in parallel is a better approach and removes the dependence on ultralow latencies.  Scaling performance linearly by adding storage servers is the way out and ensures that organizations can still achieve the throughput they need, even when growing from just a few GPUs to hundreds.
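The linear-scaling argument above lends itself to a back-of-envelope sizing exercise. The sketch below is illustrative only: the per-GPU ingest rate and per-server throughput figures are assumed placeholder values, not measurements of any particular product.

```python
import math

def servers_needed(num_gpus, gpu_ingest_gbps=1.0, server_throughput_gbps=5.0):
    """Estimate how many scale-out storage servers are needed to feed a
    GPU cluster, assuming aggregate throughput grows linearly with each
    server added (the rates here are assumed example numbers).
    """
    required_gbps = num_gpus * gpu_ingest_gbps
    return math.ceil(required_gbps / server_throughput_gbps)

# Growing from a few GPUs to hundreds simply means adding servers:
print(servers_needed(4))    # a small training rig
print(servers_needed(100))  # a larger cluster
```

Because capacity and throughput scale together, the same calculation holds whether the cluster has four GPUs or four hundred - no re-architecting, just more servers.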

Scale-Out Storage Will Play a Crucial Part for the Most-Successful Data-Intense Projects

Having established that owning large quantities of data - and being able to turn it into quality insights by quickly analyzing and learning from it - is becoming a competitive advantage, organizations need to deploy storage solutions that let them manage those data volumes quickly and cost-effectively. With tens of petabytes of data considered normal for an ML setup, and hundreds of petabytes common when working with images and videos, enterprises need to ensure that sufficient storage resources are available to stream data to multiple client machines simultaneously. GPUs need to be fed data at high rates to stay busy. With pipelining, systems pre-fetch multiple instructions up front to make sure they don't have to wait for RAM, which is often a latency bottleneck.  By fetching several files in parallel, systems avoid stalling the processors, keeping CPUs active instead of idle and masking per-request latencies that may be a few microseconds higher.
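The file-level pre-fetching described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `load_fn` is a hypothetical reader you would supply (e.g. one that reads a training sample from the file system), and the worker count is an assumed tuning knob.

```python
from concurrent.futures import ThreadPoolExecutor

def prefetch_files(paths, load_fn, workers=8):
    """Issue many reads in parallel so the consumer (e.g. a GPU training
    loop) is never stalled waiting on a single slow read.  Results are
    yielded in the original order; later reads overlap earlier processing.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit all reads up front, then hand results back in order.
        futures = [pool.submit(load_fn, p) for p in paths]
        for future in futures:
            yield future.result()
```

Used this way - `for sample in prefetch_files(batch_paths, read_sample): train(sample)` - the consumer sees data arrive continuously, which is exactly why aggregate throughput, rather than per-request latency, becomes the figure that matters.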

Regardless of whether it's the next big thing now or in 15 years, the reality is that any computing innovation will require data storage that can handle the performance and capacity requirements of any given project. Organizations looking to take advantage of their data will need a system that delivers the storage throughput they require while taking full advantage of their GPU investment. They'll need a system that leverages both HDD and flash to get the best price-performance ratio without cumbersome tiering. And by scaling out, they'll gain full flexibility as their storage grows, getting the throughput required today with the ability to add capacity when needed.  By taking this common-sense approach to data management, enterprises will be sure to glean the full value of their data without the need to consult an oracle first.


About the Author

Björn Kolbeck

Before taking over the helm at Quobyte, Björn spent time at Google working as tech lead for the hotel finder project (2011-2013). He was the lead developer for the open-source file system XtreemFS (2006-2011). Björn's PhD thesis dealt with fault-tolerant replication.

Published Friday, November 22, 2019 7:33 AM by David Marshall