Industry executives and experts share their predictions for 2020. Read them in this 12th annual VMblog.com series exclusive.
By Gerardo
A. Dada, CMO, DataCore Software
Primary/Secondary Storage Will Increasingly Become "Shades of Gray"
For decades, the industry has had discrete
storage systems for frequently-accessed data that requires high performance and
for infrequently used data that requires lower-cost storage options - often from
different vendors. For most organizations, this
results in higher management overhead, difficulties migrating data on a regular
basis, significant inefficiencies, and higher cost. The separation is really a
reflection of a hardware-centric view of storage systems: many simply think of them in terms of the media they use - tape, AFAs, JBODs, and so on.
Today, software-defined storage (SDS) allows IT to break its thinking away from
hardware. An SDS system can abstract multiple types of storage, vendors,
protocols, and network interfaces so storage is just storage. As data becomes increasingly dynamic and the
relative proportion of hot, warm, and cold data changes over time, these temperature-based tier classifications will eventually become
"shades of gray," negating the need to have separate primary and secondary
storage systems.
Additionally, a common belief is that you need
to migrate from one storage platform to another such as file to object; many
are now starting to realize that these are just artificial constructs for how
to interface with the data, and an ideal system should be able to handle both
without compromise.
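To make the idea concrete, here is a minimal sketch in Python of a single backing store exposed through both a file-style and an object-style interface; the class and method names are hypothetical rather than any particular SDS product's API, but they show how the protocol becomes just a way of addressing the same data.

```python
# Illustrative sketch only: class and method names are hypothetical,
# not taken from any specific SDS product.

class UnifiedStore:
    """One backing pool of data, addressable by path (file-style)
    or by bucket/key (object-style)."""

    def __init__(self):
        self._blobs = {}    # canonical storage: id -> bytes
        self._by_path = {}  # file-style namespace
        self._by_key = {}   # object-style namespace: (bucket, key)

    def write_file(self, path: str, data: bytes) -> None:
        blob_id = path  # a real system would use an internal ID mapping
        self._blobs[blob_id] = data
        self._by_path[path] = blob_id
        # Expose the same data under an object-style address as well.
        self._by_key[("default", path.lstrip("/"))] = blob_id

    def read_file(self, path: str) -> bytes:
        return self._blobs[self._by_path[path]]

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._blobs[self._by_key[(bucket, key)]]


store = UnifiedStore()
store.write_file("/projects/report.pdf", b"...")
# The same bytes are reachable through the object-style interface,
# with no migration between "file storage" and "object storage".
assert store.get_object("default", "projects/report.pdf") == b"..."
```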
An intelligent software-defined system makes decisions about each piece of data, placing it in the tier where it belongs given its performance and cost requirements, and dynamically optimizing performance, utilization, and cost.
AI-Driven Analytics
Will Drive Intelligent Data Placement
Artificial
intelligence (AI) will be increasingly used to make
intelligent decisions about data placement. Based on information such as
telemetry data and metadata, these AI-driven solutions will be able to
determine where data should be stored and how much performance to give it. This
will also help offload the administrative and manual activities associated with
data placement and result in better capacity planning.
The technology will be used to detect data temperature gradients and automatically move data to the right bucket. For example, automated storage tiering techniques will migrate data in the background across arrays and even into the cloud - using access frequency not only to choose whether high-performance or lower-cost archival storage is appropriate, but also to determine the right time to migrate between them based on urgency, resiliency, aging, and location. This will help enterprises save both time and money.
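As a rough illustration of the kind of heuristic involved, the Python sketch below classifies data temperature from access frequency and age and maps it to a tier; the thresholds, tier names, and access-statistics fields are illustrative assumptions, not any vendor's algorithm.

```python
from dataclasses import dataclass

# Illustrative thresholds; a real auto-tiering engine would derive these
# from telemetry rather than hard-code them.
HOT_ACCESSES_PER_DAY = 10
COLD_DAYS_SINCE_ACCESS = 90

@dataclass
class AccessStats:
    accesses_per_day: float
    days_since_last_access: int

def choose_tier(stats: AccessStats) -> str:
    """Classify data temperature and map it to a storage tier."""
    if stats.accesses_per_day >= HOT_ACCESSES_PER_DAY:
        return "nvme-flash"      # hot: high-performance tier
    if stats.days_since_last_access >= COLD_DAYS_SINCE_ACCESS:
        return "cloud-archive"   # cold: lowest-cost tier
    return "capacity-disk"       # warm: middle tier

print(choose_tier(AccessStats(25.0, 0)))     # -> nvme-flash
print(choose_tier(AccessStats(0.01, 200)))   # -> cloud-archive
```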
Increased Exploitation
of Metadata
Metadata-driven data management of unstructured content will prove to
be a disruptive storage technology in 2020.
Metadata is the tool that makes IT smarter about data. Up to now, it has mostly been limited to object storage systems and has typically been tied to the data itself.
Metadata can be extremely powerful,
especially when it is tightly integrated with the data from a logic
perspective, yet decoupled from a location perspective, while synchronized
globally - and when it is available for all types of data, whether it was stored
using an object or a file protocol.
These newer capabilities allow organizations to build rich
attributes around the data they store.
For example, metadata can be used for data placement and compliance, to detect correlations and analyze relationships between seemingly unrelated documents and multimedia images, or to help catalog, organize, and search data - globally, across multiple storage systems and billions of files.
Coupled with AI and machine learning (ML) technologies, telemetry and
content scanning software, metadata can be enriched with searchable tags and
other valuable attributes based on location, time, utilization, data
characteristics, and more.
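One way to picture this enrichment step is the short Python sketch below; the field names, tag vocabulary, and scan results are assumptions made for illustration, not a real product's metadata schema.

```python
from datetime import datetime, timezone

# Hypothetical enrichment step: field names and tag values are illustrative.
def enrich_metadata(base: dict, scan_result: dict, telemetry: dict) -> dict:
    tags = set(base.get("tags", []))
    if scan_result.get("contains_personal_info"):
        tags.add("privacy:personal-data")
    if telemetry.get("reads_last_30d", 0) == 0:
        tags.add("utilization:idle")
    enriched = dict(base)
    enriched.update({
        "tags": sorted(tags),
        "region": telemetry.get("site", "unknown"),
        "enriched_at": datetime.now(timezone.utc).isoformat(),
    })
    return enriched

record = enrich_metadata(
    {"name": "scan-0142.jpg", "tags": ["source:mri"]},
    {"contains_personal_info": True},
    {"reads_last_30d": 0, "site": "eu-west"},
)
# Tags such as "privacy:personal-data" and "utilization:idle" are now
# searchable across storage systems, independent of where the bytes live.
print(record["tags"])
```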
This is an important storage technology in part because it helps drive the automated
decisions on the value of data and where it should be placed as mentioned
above, while adding several dimensions to auto-tiering intelligence. Companies
can also enforce data governance policies using metadata-driven data
management, which is critical in these times of increasing compliance
regulations and policies such as GDPR (General Data Protection Regulation) and CCPA (California
Consumer Privacy Act). For example,
the technology can determine if information should remain within a country or if it can be moved more broadly, or if data needs to be encrypted - and it can even simplify finding data that must be deleted to comply with privacy laws.
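Continuing the illustration, the sketch below shows how such governance decisions could be derived purely from metadata tags; again, the tag names, fields, and policy rules are assumptions, not a specific product's behavior.

```python
# Illustrative policy check driven only by metadata; the tag vocabulary
# and field names are assumptions, not a specific product's schema.
def governance_actions(meta: dict) -> list[str]:
    actions = []
    tags = set(meta.get("tags", []))
    if "privacy:personal-data" in tags:
        if not meta.get("encrypted", False):
            actions.append("encrypt")
        required = meta.get("required_region")
        if required and meta.get("region") != required:
            actions.append(f"relocate-to:{required}")
    if "privacy:erasure-requested" in tags:
        actions.append("delete")  # e.g. right to erasure under GDPR/CCPA
    return actions

print(governance_actions({
    "tags": ["privacy:personal-data"],
    "encrypted": False,
    "region": "us-east",
    "required_region": "de",
}))
# -> ['encrypt', 'relocate-to:de']
```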
IT should look to take advantage of the next generation of tools that enable metadata-driven
data management of distributed files and objects. This is not a plug-in or a
separate piece of software but a core part of the architecture of modern file
and object systems that allow the optimization of storage resources in ways not previously possible.
Increased Need
for Data Classification Standards
New privacy regulations such as GDPR, CCPA, and other emerging state legislation that includes mandates on personal information will become the catalyst for new data classification standards. Right now, there isn't a standard way to implement the controls these regulations require.
Metadata management provides the foundational tools required to address part of this issue, as mentioned above - for example, keywords and tags associated with data can be used to define what is privileged, local, or personal information without actually having to look at the data. However, even when this information is available, standard classifications still need to be established that provide a common way of determining whether data includes relevant information as it is shared across different data formats and systems.
In the past, standards bodies such
as NIST have helped define standards or schemas for global classification and
data interchange, such as the Electronic Data Interchange (EDI) standards X12
and EDIFACT, or other XML schemas for industries such as healthcare and
insurance. However, there is not currently a standard schema or taxonomy to
uniformly categorize data, especially for privacy and compliance management.
In the absence of this, companies
will continue to build their own data classification standards and processes. Until
an industry standards body provides these new data classification standards, it
will continue to be a painful process.
The Cloud
Enters a New Phase of Maturity
It's been about a decade now since the cloud became
"a thing." At its inception, the cloud was increasingly touted as a less
expensive storage option, a concept that was dispelled as soon as IT started
using more cloud resources and saw in their bills that the cloud is not always
the most cost-effective infrastructure option. For a
while, many IT organizations had a top-down mandate to go "cloud first."
Today, in fact, a number of mature startups and enterprises have moved infrastructure data back on-premises where they
have more control and better economics. Yet, the cloud still offers formidable
simplicity, agility, and yes, cost efficiency in many cases - one of which can
be long-term secure data storage. We are now smarter as an industry about what
belongs in the cloud and what does not.
The decision has typically been binary - move to the cloud or keep data on-premises - because it has been very difficult to build truly hybrid systems. This is especially difficult with data. Imagine a file storage system that has been in use for years and holds millions of files, some of which require immediate accessibility and some of which should be archived - but it is almost impossible to determine which is which.
With modern data management tools and
software-defined systems that span on-premises and multi-cloud deployments, the
industry will reach a level of maturity where companies start to realize the
practical use of the cloud and deploy systems that take advantage of cloud
efficiencies in a way that optimizes cost, agility, access, etc. This is made
possible by smart software that understands the profile of data, has the
ability to access data anywhere, and therefore can move it in an automated
fashion based on business rules.
With an intelligent hybrid system, data is
moved to the cloud or from the cloud, as needed, to optimize cost and
performance based on business needs, automatically and effectively. The
software-defined system becomes the unifier across storage systems and the
intelligent layer that controls data placement in an optimal way.
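A minimal sketch of what such business rules might look like appears below; the rule contents, field names, and placement targets are assumptions chosen for illustration, not a recommended policy.

```python
# Illustrative business rules for hybrid placement; the rules and field
# names are assumptions for the sake of the example.
RULES = [
    # (predicate, target) - first match wins
    (lambda f: "compliance:keep-onprem" in f["tags"], "on-prem"),
    (lambda f: f["days_since_access"] > 365,          "cloud-archive"),
    (lambda f: f["latency_sensitive"],                "on-prem"),
    (lambda f: True,                                  "cloud-standard"),
]

def place(file_info: dict) -> str:
    """Return the placement target; the SDS layer would then move the data
    (or leave it in place) without the application noticing."""
    for predicate, target in RULES:
        if predicate(file_info):
            return target
    return "on-prem"

print(place({"tags": [], "days_since_access": 400, "latency_sensitive": False}))
# -> cloud-archive
```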
Hyperconverged
Disillusionment and Maturity
As hyperconverged infrastructure (HCI) technologies begin to reach a more mature state after a few years of deployment, some users have entered a period of disillusionment. Companies are starting to see that many HCI systems have become an additional silo rather than the panacea they promised to be. HCI is, however, proving valuable for specific use cases like edge computing and VDI, as well as for applications that can work in isolation. But it hasn't proved to be the "end all, be all" for every IT need.
HCI systems are certainly easier to manage
and deploy. They can make life easier for IT administrators when they are the
right tool for the job. To make the right decisions, it is important to
understand the different architectures available in HCI and the benefits and
tradeoffs of each.
For example, many HCI systems use erasure
coding to expand to multiple nodes, which can be effective for large clusters,
especially if performance is not the top concern. This architecture can be
effective for replacing older hardware for standard applications that don't
have specific performance requirements.
A different type of HCI system uses
synchronous mirroring, which enables simpler, two-node, highly available deployments and higher performance. These are ideal for workloads such as tier-1
applications that require high-performance databases, as well as remote
office/branch office (ROBO) and edge deployments.
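Back-of-the-envelope capacity math helps frame that tradeoff; in the sketch below, the 4+2 erasure-coding layout and the two-way mirror are illustrative configurations rather than specific product defaults.

```python
# Capacity math for the two HCI data-protection schemes described above
# (configurations chosen purely for illustration).
def usable_fraction_erasure(data_shards: int, parity_shards: int) -> float:
    """e.g. 4+2 erasure coding writes 6 shards for every 4 shards of data."""
    return data_shards / (data_shards + parity_shards)

def usable_fraction_mirror(copies: int) -> float:
    """Synchronous mirroring keeps a full copy on each node."""
    return 1 / copies

print(f"4+2 erasure coding: {usable_fraction_erasure(4, 2):.0%} usable")  # 67%
print(f"2-way sync mirror:  {usable_fraction_mirror(2):.0%} usable")      # 50%
# Erasure coding is more capacity-efficient on larger clusters, but each
# write touches more nodes; a two-node synchronous mirror trades capacity
# for lower write latency and a simpler high-availability setup.
```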
HCI systems are here to stay, and can be a
powerful tool for most IT environments. But, as with any tool, they must be used for the right job.
Transformation
From a Hardware-Centric to a Software-Defined Model Will
Contribute to Accelerated Adoption of Software-Defined Storage
Technology cycles seem to be accelerating,
especially when it comes to storage. AFAs, NVMe, scale-out systems, HCI,
hardware refresh cycles, cloud bursting, object storage, and metadata
management are just some of the technologies that are driving adoption of newer
storage systems.
But we know data has gravity,
migrations are a pain, and few organizations have the budget and time to
completely replace their storage systems with new ones - just to repeat the
cycle in a few quarters.
All this means the
hardware-centric mindset in the industry is not sustainable anymore. IT needs
to stop thinking about storage systems as discrete, media-centric products from multiple vendors that create islands of storage - islands that are difficult to manage, inefficient in terms of capacity utilization, and from which it is nearly impossible to move data efficiently to another system.
It's time for the storage
industry to jump on the software-defined infrastructure bandwagon and to move
from a hardware-centric model to a software-centric model. The benefits of software-defined storage are
now well understood - and proven, after a period when they sounded too good to
be true. The technology is now mature and readily available. SDS makes storage
smarter, more efficient, and easier to manage - resulting in significant
economic benefits.
The industry is also now increasingly realizing that it's the data that matters
and not the actual storage system. Furthermore, the value lies in the software that controls where data is placed, not in the type of hardware or the media.
At the same time, many companies are
seeing the power of consolidating storage under a single software-defined
storage platform, providing a unified storage pool and capacity efficiencies
similar to the ones we saw with compute resources under virtualization. If
economic indicators pointing to a recession prove accurate, this may even become a necessity, as it will be too costly to put all data on premium storage. Furthermore, as it
relates to the overall storage lifecycle, as premium storage systems reach end
of life, companies will have to evaluate options that lower spending and extend
the life of existing assets. Software-defined storage will be another natural
fit here.
##
About the Author
Gerardo A. Dada, CMO at
DataCore Software
Dada is an experienced
technology marketer who has been at the center of the web, social, mobile and
cloud revolutions at some of the world's leading companies. Prior to DataCore,
he most recently served as vice president of product marketing and strategy at
SolarWinds. Earlier, Dada was head of product and solutions marketing at
Rackspace, where he established the company as the leader in hybrid cloud. He
has also held senior marketing roles at Bazaarvoice, Motorola, and Microsoft.
Dada received a five-year business degree from UAEM University in Mexico and a general management certificate from the University of Texas at Austin.