Virtualization Technology News and Information
DataCore Software 2020 Predictions: Primary/Secondary Storage Will Increasingly Become "Shades of Gray" and More Predictions for the Storage Industry

VMblog Predictions 2020 

Industry executives and experts share their predictions for 2020.  Read them in this 12th annual series exclusive.

By Gerardo A. Dada, CMO, DataCore Software

Primary/Secondary Storage Will Increasingly Become "Shades of Gray"

For decades, the industry has maintained discrete storage systems for frequently accessed data that requires high performance and for infrequently used data that requires lower-cost storage options, often from different vendors. For most organizations, this results in higher management overhead, difficulty migrating data on a regular basis, significant inefficiencies, and higher cost. The separation is really a reflection of a hardware-centric view of storage systems: many simply think of them as tape, AFAs, JBODs, or other terms that describe the type of media the system uses.

Today, software-defined storage (SDS) allows IT to break its thinking away from hardware. An SDS system can abstract multiple types of storage, vendors, protocols, and network interfaces so storage is just storage. As data becomes increasingly dynamic and the relative proportion of hot, warm, and cooler data changes over time, the temperatures/tier classifications will eventually become "shades of gray," negating the need to have separate primary and secondary storage systems.

Additionally, a common belief is that you need to migrate from one storage platform to another such as file to object; many are now starting to realize that these are just artificial constructs for how to interface with the data, and an ideal system should be able to handle both without compromise.

An intelligent software-defined system makes decisions about each piece of data, placing it in the tier where it belongs given its performance and cost requirements, and dynamically optimizes performance, utilization, and cost.

AI-Driven Analytics Will Drive Intelligent Data Placement

Artificial intelligence (AI) will be increasingly used to make intelligent decisions about data placement. Based on information such as telemetry data and metadata, these AI-driven solutions will be able to determine where data should be stored and how much performance to give it. This will also help offload the administrative and manual activities associated with data placement and result in better capacity planning.

The technology will be used to detect data temperature gradients and automatically move data to the right tier. For example, automated storage tiering techniques will migrate data in the background across arrays and even into the cloud, using access frequency not only to choose whether high-performance storage or lower-cost archival storage is appropriate, but also to determine the right time to migrate between them based on urgency, resiliency, aging, and location. This will save enterprises both time and money.
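To make the idea concrete, here is a minimal sketch of temperature-based tier selection. The thresholds, tier names, and fields are illustrative assumptions, not any vendor's actual policy engine; a real system would weigh the additional signals mentioned above (urgency, resiliency, aging, location).

```python
from dataclasses import dataclass

# Hypothetical access-frequency thresholds; real tiering engines use
# many more signals than this simple gradient.
HOT_ACCESSES_PER_DAY = 50
WARM_ACCESSES_PER_DAY = 5

@dataclass
class DataBlock:
    accesses_last_day: int
    age_days: float

def choose_tier(block: DataBlock) -> str:
    """Pick a storage tier from a simple access-frequency gradient."""
    if block.accesses_last_day >= HOT_ACCESSES_PER_DAY:
        return "performance"   # e.g., NVMe / all-flash
    if block.accesses_last_day >= WARM_ACCESSES_PER_DAY:
        return "capacity"      # e.g., hybrid or HDD arrays
    return "archive"           # e.g., cloud object storage

print(choose_tier(DataBlock(accesses_last_day=120, age_days=1.0)))  # performance
```

In practice the "shades of gray" point above means these cutoffs are continuous and recalculated as access patterns shift, rather than fixed constants.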

Increased Exploitation of Metadata

Metadata-driven data management of unstructured content will prove to be a disruptive storage technology in 2020. Metadata is the tool that makes IT smarter about data. Up to now, it has mostly been limited to object storage systems and, in most cases, tied to the data itself.

Metadata can be extremely powerful, especially when it is tightly integrated with the data from a logic perspective yet decoupled from a location perspective, synchronized globally, and available for all types of data, whether stored using an object or a file protocol. These newer capabilities allow organizations to build rich attributes around the data they store.

For example, metadata can be used for data placement and compliance, to detect correlations and analyze relationships between seemingly unrelated documents and multimedia images, or to help catalog, organize, and search data globally, across multiple storage systems and billions of files.

Coupled with AI and machine learning (ML) technologies, telemetry, and content scanning software, metadata can be enriched with searchable tags and other valuable attributes based on location, time, utilization, data characteristics, and more.
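As a rough illustration of the enrichment step, the sketch below attaches searchable tags to a stored object using location, time, and a (deliberately naive) content scan. Every field name and pattern here is an assumption for illustration, not a particular product's metadata schema.

```python
import re
from datetime import datetime, timezone

def enrich_metadata(object_key: str, content: str, region: str) -> dict:
    """Build an enriched metadata record for one stored object."""
    meta = {
        "key": object_key,
        "region": region,
        "indexed_at": datetime.now(timezone.utc).isoformat(),
        "tags": [],
    }
    # A content scanner might flag likely personal data; this SSN-like
    # regex is a toy stand-in for real classification logic.
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", content):
        meta["tags"].append("pii")
    if "invoice" in content.lower():
        meta["tags"].append("finance")
    return meta

m = enrich_metadata("docs/q3.txt", "Invoice for 123-45-6789", "eu-west-1")
print(m["tags"])  # ['pii', 'finance']
```

The key design point is that once tags like these exist, downstream placement and search can operate on the metadata alone, without re-reading the underlying objects.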

This is an important storage technology in part because it helps drive the automated decisions on the value of data and where it should be placed as mentioned above, while adding several dimensions to auto-tiering intelligence. Companies can also enforce data governance policies using metadata-driven data management, which is critical in these times of increasing compliance regulations and policies such as GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act). For example, the technology can determine whether information should remain within a country or can be moved more broadly, or whether data needs to be encrypted, and it even simplifies finding data that must be deleted to comply with privacy laws.
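A hedged sketch of what metadata-driven governance might look like: deriving required actions from metadata alone, without inspecting the data. The tag and field names (`pii`, `subject_region`, `deletion_requested`) are invented for this example.

```python
def governance_actions(meta: dict) -> list:
    """Return governance actions implied by an object's metadata."""
    actions = []
    if "pii" in meta.get("tags", []):
        if not meta.get("encrypted", False):
            actions.append("encrypt")
        # GDPR-style residency: keep EU personal data in EU regions.
        if meta.get("subject_region") == "EU" and not meta.get("region", "").startswith("eu-"):
            actions.append("relocate-to-eu")
    if meta.get("deletion_requested", False):
        actions.append("delete")  # e.g., a right-to-erasure request
    return actions

meta = {"tags": ["pii"], "encrypted": False,
        "subject_region": "EU", "region": "us-east-1"}
print(governance_actions(meta))  # ['encrypt', 'relocate-to-eu']
```

Because the checks run against metadata rather than content, they can be applied globally across storage systems and billions of files, which is exactly the scale argument made above.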

IT should look to take advantage of the next generation of tools that enable metadata-driven data management of distributed files and objects. This is not a plug-in or a separate piece of software but a core part of the architecture of modern file and object systems, one that allows the optimization of storage resources in ways not previously possible.

Increased Need for Data Classification Standards

New privacy regulations such as GDPR, CCPA, and other emerging state legislation that includes mandates on personal information will become the catalyst for new data classification standards.

Right now, there isn't a standard way to implement the controls these regulations require.

Metadata management provides the foundational tools to address part of this issue, as mentioned above. For example, keywords and tags associated with data can be used to define what is privileged, local, or personal information without actually having to inspect the data. However, even when this information is available, standard classifications still need to be built that provide a common way of determining whether data includes relevant information when it is shared across different data formats and systems.

In the past, standards bodies such as NIST have helped define standards or schemas for global classification and data interchange, such as the Electronic Data Interchange (EDI) standards X12 and EDIFACT, or XML schemas for industries such as healthcare and insurance. However, there is not currently a standard schema or taxonomy to uniformly categorize data, especially for privacy and compliance management.

In the absence of such a standard, companies will continue to build their own data classification standards and processes. Until an industry standards body provides one, this will continue to be a painful process.

The Cloud Enters a New Phase of Maturity

It's been about a decade now since the cloud became "a thing." At its inception, the cloud was increasingly touted as a less expensive storage option, a notion that was dispelled as soon as IT started using more cloud resources and saw in their bills that the cloud is not always the most cost-effective infrastructure option. For a while, many IT organizations had a top-down mandate to go "cloud first."

Today, in fact, a number of mature startups and enterprises have moved infrastructure data back on-premises where they have more control and better economics. Yet, the cloud still offers formidable simplicity, agility, and yes, cost efficiency in many cases - one of which can be long-term secure data storage. We are now smarter as an industry about what belongs in the cloud and what does not.

The decision has typically been either to move to the cloud or to keep data on-premises, as it has been very difficult to build truly hybrid systems. This is especially difficult with data. Imagine a file storage system that has been in use for years and holds millions of files, some of which require immediate accessibility and some of which should be archived, but it is almost impossible to determine which is which.

With modern data management tools and software-defined systems that span on-premises and multi-cloud deployments, the industry will reach a level of maturity where companies start to realize the practical use of the cloud and deploy systems that take advantage of cloud efficiencies in a way that optimizes cost, agility, access, etc. This is made possible by smart software that understands the profile of data, has the ability to access data anywhere, and therefore can move it in an automated fashion based on business rules.

With an intelligent hybrid system, data is moved to the cloud or from the cloud, as needed, to optimize cost and performance based on business needs, automatically and effectively. The software-defined system becomes the unifier across storage systems and the intelligent layer that controls data placement in an optimal way.
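The business-rule-driven placement described above can be sketched in a few lines. The rules, thresholds, and location names below are illustrative assumptions, not any particular vendor's policy language.

```python
def place(file_meta: dict) -> str:
    """Decide where a file should live based on simple business rules."""
    if file_meta.get("compliance_hold", False):
        return "on-prem"          # keep regulated data under local control
    days_since_access = file_meta["days_since_access"]
    if days_since_access > 180:
        return "cloud-archive"    # cold data: cheap long-term storage
    if days_since_access > 30:
        return "cloud-standard"   # warm data: cloud economics win
    return "on-prem"              # hot data: lowest latency on-site

print(place({"days_since_access": 400}))  # cloud-archive
```

An intelligent hybrid system would re-evaluate rules like these continuously and move data in both directions as its profile changes, rather than making a one-time migration decision.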

Hyperconverged Disillusionment and Maturity

As hyperconverged infrastructure (HCI) technologies begin to reach a more mature state after a few years of deployment, some users have entered a period of disillusionment. Companies are starting to see that many HCI systems have become an additional silo rather than the panacea they promised to be. HCI is, however, proving valuable for specific use cases like edge computing and VDI, as well as applications that can work in isolation. But it hasn't proved to be the "end all, be all" for every IT need.

HCI systems are certainly easier to manage and deploy. They can make life easier for IT administrators when they are the right tool for the job. To make the right decisions, it is important to understand the different architectures available in HCI and the benefits and tradeoffs of each.

For example, many HCI systems use erasure coding to expand to multiple nodes, which can be effective for large clusters, especially if performance is not the top concern. This architecture can be effective for replacing older hardware for standard applications that don't have specific performance requirements.

A different type of HCI system uses synchronous mirroring, which enables simpler, two-node, highly available deployments and higher performance. These are ideal for workloads such as tier-1 applications that require high-performance databases, as well as remote office/branch office (ROBO) and edge deployments.
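A back-of-envelope way to see the capacity tradeoff between the two architectures: erasure coding stores k data shards plus m parity shards, while synchronous mirroring keeps full copies. The parameters below (4+2 coding, two-way mirroring) are illustrative examples, not recommendations.

```python
def usable_fraction_erasure(data_shards: int, parity_shards: int) -> float:
    """k-of-(k+m) erasure coding: k data shards plus m parity shards."""
    return data_shards / (data_shards + parity_shards)

def usable_fraction_mirror(copies: int) -> float:
    """Synchronous mirroring keeps a full copy on each node."""
    return 1 / copies

print(usable_fraction_erasure(4, 2))  # 4+2 coding -> ~0.67 of raw capacity usable
print(usable_fraction_mirror(2))      # two-node mirror -> 0.5 usable
```

Erasure coding's better usable fraction comes at the cost of cross-node reconstruction work, which is one reason the text above associates it with larger clusters where peak performance is not the top concern.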

HCI systems are here to stay and can be a powerful tool for most IT environments. But, as with most tools, they must be used for the right job.

Transformation From a Hardware-Centric to a Software-Defined Model Will Contribute to Accelerated Adoption of Software-Defined Storage

Technology cycles seem to be accelerating, especially when it comes to storage. AFAs, NVMe, scale-out systems, HCI, hardware refresh cycles, cloud bursting, object storage, and metadata management are just some of the technologies that are driving adoption of newer storage systems.

But we know data has gravity, migrations are a pain, and few organizations have the budget and time to completely replace their storage systems with new ones - just to repeat the cycle in a few quarters.

All this means the industry's hardware-centric mindset is no longer sustainable. IT needs to stop thinking about storage as discrete, media-centric systems from multiple vendors that create islands of storage: difficult to manage, inefficient in capacity utilization, and nearly impossible to move data between efficiently.

It's time for the storage industry to jump on the software-defined infrastructure bandwagon and to move from a hardware-centric model to a software-centric model.  The benefits of software-defined storage are now well understood - and proven, after a period when they sounded too good to be true. The technology is now mature and readily available. SDS makes storage smarter, more efficient, and easier to manage - resulting in significant economic benefits.

The industry is also now increasingly realizing that it's the data that matters and not the actual storage system. Furthermore, the software that controls where data is placed is where the value lies, not in the type of hardware or the media.

At the same time, many companies are seeing the power of consolidating storage under a single software-defined storage platform, providing a unified storage pool and capacity efficiencies similar to the ones we saw with compute resources under virtualization. If economic indicators of a recession prove accurate, this may even become a necessity, as it will be too costly to put all data on premium storage. Furthermore, as it relates to the overall storage lifecycle, as premium storage systems reach end of life, companies will have to evaluate options that lower spending and extend the life of existing assets. Software-defined storage will be another natural fit here.


About the Author

Gerardo A. Dada, CMO at DataCore Software

Gerardo A. Dada 

Dada is an experienced technology marketer who has been at the center of the web, social, mobile, and cloud revolutions at some of the world's leading companies. Prior to DataCore, he most recently served as vice president of product marketing and strategy at SolarWinds. Earlier, Dada was head of product and solutions marketing at Rackspace, where he established the company as the leader in hybrid cloud. He has also held senior marketing roles at Bazaarvoice, Motorola, and Microsoft. Dada received a five-year business degree from UAEM University in Mexico and a general management certificate from the University of Texas at Austin.

Published Wednesday, January 08, 2020 7:24 AM by David Marshall