Industry executives and experts share their predictions for 2024. Read them in this 16th annual VMblog.com series exclusive.
Unstructured Data Sets Prove to be Missing Link to Successful AI Data Pipelines
By Molly Presley, CMO, Hammerspace
Data storage technologies that create data silos will see a massive loss of momentum in 2024. One of the biggest challenges facing organizations is putting distributed unstructured data sets to work in their AI strategies while simultaneously delivering the performance and scale not found in traditional enterprise solutions. It is critical that a data pipeline be designed to use all available compute power and to make data available to cloud-based models such as those hosted in Databricks and Snowflake. In 2024, high-performance local read/write access to data that is orchestrated in real time within a global data environment will become indispensable and ubiquitous. Here's what to expect in 2024 to support these trends.
Data Orchestration Takes Center Stage
Organizations will start moving away from "store and copy" to a world of data
orchestration. Driven by AI advancements, robust tools now exist to analyze
data and tease out actionable insights. However, file storage infrastructure
has not kept pace with these advancements. Unlike solutions that try to manage storage silos and distributed environments by moving file copies from one place to another, data orchestration integrates data from different silos and locations into a single namespace and automates the placement of data when and where it is most valuable, making it easier to analyze and derive insights. IT organizations need the flexibility to use all of their data - structured, semi-structured, and unstructured - for iteration, and may need to move different data sets to different models. The data orchestration model lets organizations eliminate copying data into new files and repositories, reducing the time to inference from weeks to hours in large data environments.
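To make the placement idea concrete, here is a minimal sketch of metadata-driven placement under a single namespace: hot data is staged next to compute while cold data stays put, and the logical path never changes. The class, field, and site names are hypothetical illustrations, not any vendor's actual API.

```python
# A minimal sketch of metadata-driven data placement under one namespace.
# All names here are hypothetical illustrations, not a vendor API.
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str   # single logical path, regardless of physical location
    site: str   # where the file's bytes currently live
    hot: bool   # access-pattern metadata gathered by the system

def place(record: FileRecord, gpu_site: str = "cloud-gpu") -> str:
    """Decide where a file should live; the data moves, the path does not."""
    if record.hot and record.site != gpu_site:
        return gpu_site   # stage hot training data next to compute
    return record.site    # cold data stays put but remains visible

catalog = [
    FileRecord("/project/train/a.parquet", site="datacenter-1", hot=True),
    FileRecord("/project/archive/b.parquet", site="datacenter-1", hot=False),
]

for rec in catalog:
    target = place(rec)
    if target != rec.site:
        print(f"orchestrate {rec.path}: {rec.site} -> {target}")
        rec.site = target   # placement changes; the namespace does not
```

In a real orchestration system, the "hot" flag would come from machine-generated access telemetry rather than a hand-set field.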
Data Teams Embrace the Value of Metadata to Automate Data Management
In 2024, data teams will increasingly use rich, actionable metadata to derive
value from data. With the continued growth and business value of unstructured
data across all industries, IT organizations must cope with increasing
operational complexity when they manage digital assets that span multiple
storage types, locations, and clouds. Wrangling data services across silos in a
hybrid environment can be an extremely manual and risk-prone process, made more
difficult by incompatibilities between different storage types. Metadata can help organizations solve these problems: machine-generated metadata, combined with data orchestration, turns manual data wrangling into automated, policy-driven workflows and unlocks data insights.
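As a rough illustration of what that automation can look like, the sketch below selects files by metadata tags rather than by storage location and applies a simple placement policy. The tag store, tags, and policy are hypothetical, not a specific product's interface.

```python
# A minimal sketch of metadata-driven data management automation.
# The tag store and policy below are hypothetical illustrations.
files = {
    "/ml/raw/scan001.dcm": {"project": "imaging", "phase": "raw", "pii": True},
    "/ml/curated/set1.parquet": {"project": "imaging", "phase": "curated", "pii": False},
}

def select(tags: dict, **wanted) -> list[str]:
    """Query the namespace by metadata instead of by storage location."""
    return [p for p, t in tags.items()
            if all(t.get(k) == v for k, v in wanted.items())]

# Policy: PII-tagged data stays on-premises; curated data may move to cloud.
for path in select(files, pii=True):
    print(f"pin on-prem: {path}")
for path in select(files, phase="curated", pii=False):
    print(f"eligible for cloud tier: {path}")
```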
The "Law" of Data Gravity will Be Overcome,
Superpowering Hybrid Cloud Workflows
Keeping data in motion through data orchestration will allow organizations to reap more value from their data than ever before. Many platforms can orchestrate structured data to data analytics applications and data scientists, and a few others are reasonably proficient at orchestrating semi-structured data, but unstructured data orchestration has until now been considered too difficult because of long-held notions of data gravity. Now that data orchestration systems have overcome the "law" of data gravity, an entirely new universe of data-driven insights is available, and hybrid cloud workflows can treat data placement as a choice rather than a constraint.
We Finally Overcome the Data Silo Problem
In 2024, organizations will increasingly adopt parallel global file systems to
truly realize digital transformation. File systems are traditionally buried in a proprietary storage layer, which typically locks them - and an organization's data - into a storage vendor's platform. Moving data from one vendor's storage type to another, or to a different location or cloud, involves creating a new copy of both the file system metadata and the actual file essence. This proliferation of file copies, and the complexity of managing copies across silos, interrupts user access and is a key problem inhibiting IT modernization and consolidation. The traditional paradigm of a file system trapped in a vendor's storage platform is inconvenient even within the silos of a single data center, but the accelerating migration to the cloud has dramatically compounded the problem, since enterprises with large volumes of unstructured data typically cannot move all of their files to the cloud. Unlike solutions that try to manage storage silos and distributed environments by shuffling file copies from one place to another, a high-performance parallel global file system that spans all storage types, from any vendor, across one or more locations and clouds is far more effective.
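The sketch below illustrates the separation such a file system depends on, assuming (consistent with the description above) that the namespace of file system metadata is kept apart from the file essence: migrating the essence then rewrites one mapping instead of creating a second file copy that users must track down. All names and paths are hypothetical.

```python
# A minimal sketch: namespace (file system metadata) kept apart from the
# file essence (bytes on some vendor's storage). Names are hypothetical.
namespace = {
    # logical path        -> current physical location of the file essence
    "/corp/models/v1.bin": ("vendorA-nas", "vol3/obj/9f2c"),
    "/corp/logs/jan.log":  ("cloud-object", "bucket-x/logs/jan.log"),
}

def read_logical(path: str) -> None:
    """Users resolve one logical path; the backend can change underneath."""
    backend, location = namespace[path]
    print(f"reading {path} from {backend}:{location}")

def migrate(path: str, backend: str, location: str) -> None:
    """Moving the essence rewrites one mapping - no second file copy to manage."""
    namespace[path] = (backend, location)

read_logical("/corp/models/v1.bin")
migrate("/corp/models/v1.bin", "cloud-object", "bucket-x/models/v1.bin")
read_logical("/corp/models/v1.bin")  # same path, new storage, no interruption
```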
Data Scientists Become More Efficient as We Move Beyond ETL
Organizations are moving to a unified global data environment that includes all desktop, data center, and cloud data. Data scientists in these companies will no longer require complex ETL processes to ensure data quality, consistency, and compatibility across different systems and formats; eliminating ETL does, however, require robust underlying systems to keep those guarantees intact. Freed from ETL, data scientists can allocate more time and resources to analysis and modeling rather than data preparation, and data pipelines become less error-prone because many of the intricate steps needed to extract, clean, and load data simply disappear.
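A minimal, self-contained before-and-after sketch of that shift follows; the data and pipeline steps are stand-ins for illustration, not a prescribed workflow.

```python
# A minimal sketch contrasting an ETL pipeline with reading data in place.
# The data and steps are hypothetical illustrations.
import csv
import io

source = "sensor,value\nA,1.0\nB,2.5\n"  # stands in for a file in the global namespace

def etl_pipeline(text: str) -> list[dict]:
    """Before: extract to staging, transform, load a copy, then query."""
    staged = io.StringIO(text)                 # extract: pull raw data into staging
    parsed = list(csv.DictReader(staged))      # transform: parse and clean
    warehouse = [dict(row) for row in parsed]  # load: write yet another copy
    return warehouse

def analyze_in_place(text: str) -> list[dict]:
    """After: read the data where it lives through the unified namespace."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

assert etl_pipeline(source) == analyze_in_place(source)  # same answer, fewer steps
```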
ABOUT THE AUTHOR
Molly Presley is CMO of Hammerspace. She brings extensive product and growth marketing leadership experience to
the Hammerspace team. Molly has led the marketing organization and strategy at
fast-growth innovators such as Pantheon Platform, Qumulo, Quantum Corporation,
DataDirect Networks (DDN), and Spectra Logic.