Industry executives and experts share their predictions for 2024. Read them in this 16th annual VMblog.com series exclusive.
Archiving Large Language Model Data on Object-Based Tape
By Matt Ninesling, Senior Director Tape Portfolio, Spectra Logic
Artificial intelligence (AI) is at the forefront of technological innovation today, and its power is driven by the machine learning algorithms of large language models (LLMs). LLMs consume massive datasets to mine, process, and continually refine their capabilities. As we step into 2024, a noteworthy evolution in the AI landscape is on the horizon: the emergence of LLM data archives as a compelling use case for object-based tape technology.
AI innovation lies in its ability to learn, adapt, and process, and those functions are fueled by huge amounts of data. However, AI's massive training data requirements bring with them a set of challenges, particularly in the realm of potential legal ramifications stemming from AI-driven decision-making.
As an example, in July the Clarkson law firm filed a case against CIGNA Healthcare alleging the insurer employed AI to automate claim rejections. Clarkson has also pursued cases against OpenAI and Google over privacy rights for data sourced from the web to train their respective AI technologies, ChatGPT and Bard. Other recent legal precedents point to the need to store this data, with cases arising from the use of copyrighted materials for AI model training. These suits and others like them mark some of the initial instances where AI decisions may create unintended financial consequences, underscoring the need to preserve the input data that trains AI.
The demand for storing the massive datasets required to train LLMs is only part of the story. Also critical is retaining the output of AI processes. This is now a strategic necessity, especially as decisions made by AI algorithms become the subject of litigation, and as defamation claims triggered by false or misleading information generated by AI arise. In response to this new need for comprehensive data preservation, organizations are turning to LLM data archives. These archives provide an ideal long-term repository to not only safeguard the voluminous training data but also capture the output generated by AI models. Object-based tape technology emerges as a game-changer in this landscape, offering a robust solution that capitalizes on the density, cost-effectiveness, and reliability inherent in tape storage.
Why object-based tape storage? The answer lies in the sheer scale of LLM data archives, which often reach multi-petabyte proportions and present a costly storage challenge. While other storage options might struggle to accommodate datasets of that size, object-based tape provides a viable and scalable solution. The cost-effectiveness of tape technology ensures organizations can manage and preserve their growing volumes of AI-related data without incurring exorbitant storage costs. Combined with the searchability of an object storage interface, object-based tape provides access to data within minutes, making it an efficient approach to long-term retention of LLM data archives at scale.
The inherent strengths of tape technology are central to how object-based tape addresses the storage demands of AI-related data. Reliability is the foundation of any archival strategy, and tape has a proven track record, making it ideal for the long-term preservation of AI training and output data, where integrity is imperative. Tape is also well suited to meeting long-term data retention requirements, allowing organizations to securely preserve data for extended periods. Additionally, storing LLM archives on tape provides air-gap protection: offline copies cannot be remotely accessed, deleted, or encrypted by an attacker, ensuring a reliable means of data recovery in the event of a ransomware attack.
Object storage also brings its own benefits to the mix. It is designed to store and retrieve large amounts of unstructured data, and its scalability and durability make it ideal for big data workloads. Object storage decouples descriptive information about the data from the data itself, typically keeping that information as metadata in a separate, faster tier. This makes it far easier (and faster) to search through enormous volumes of data in an object storage system than in a file-based system. Objects are typically accessed via a unique identifier, such as a URL, so object storage can be reached from anywhere by any device that can communicate with the system over the internet, making it easy to integrate into a wide variety of applications and services.
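The metadata decoupling described above can be illustrated with a minimal in-memory sketch. The class and field names here are purely illustrative (not any vendor's API); the point is that searches touch only a small, fast metadata index, while the bulk payloads, which would live on tape in an object-based tape system, are read only on explicit retrieval:

```python
from dataclasses import dataclass, field

@dataclass
class ObjectStore:
    """Toy object store: metadata in a fast index, payloads in bulk storage."""
    _metadata: dict = field(default_factory=dict)  # fast tier: key -> metadata
    _payloads: dict = field(default_factory=dict)  # bulk tier: key -> bytes (stand-in for tape)

    def put(self, key: str, data: bytes, **meta):
        # Metadata goes to the fast index; the payload goes to bulk storage.
        self._metadata[key] = {"size": len(data), **meta}
        self._payloads[key] = data

    def search(self, **criteria):
        # Searching consults only the metadata index -- no payload (tape) access.
        return [k for k, m in self._metadata.items()
                if all(m.get(f) == v for f, v in criteria.items())]

    def get(self, key: str) -> bytes:
        # Retrieval by unique identifier; on real object-based tape, this is
        # the step that takes minutes rather than milliseconds.
        return self._payloads[key]

store = ObjectStore()
store.put("training/corpus-2023.txt", b"...training text...", model="llm-a", kind="input")
store.put("outputs/run-0142.json", b'{"answer": 42}', model="llm-a", kind="output")

print(store.search(kind="output"))  # -> ['outputs/run-0142.json']
```

In a production system the metadata index would sit on disk or flash while payloads reside on tape, so compliance queries ("find all training inputs for model X") never trigger a tape mount.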
The benefits offered by object-based tape technology position it as an important tool in the evolving landscape of AI data management. It addresses not only the immediate storage challenges posed by LLM data archives but also the emerging need for long-term data preservation. As organizations continue to rely on AI to aid decision-making, the need for a reliable, scalable, and cost-effective storage solution is clear.
The convergence of AI's capabilities and the emerging challenge of legal accountability is driving the significance of LLM data archiving. Utilizing object-based tape technology to preserve that data signals a new era in which organizations can confidently navigate AI data management. As we stand at the intersection of technological innovation and legal scrutiny, LLM data archive storage on tape will become an increasingly popular solution in this evolving landscape.
## ABOUT THE AUTHOR
Matt Ninesling is Spectra Logic's Senior Director of Tape Portfolio Management. He has been with Spectra Logic for over 23 years, where he has helped develop advancements for Spectra's enterprise tape library family. Matt led the development of the TFinity tape library, the world's largest storage system, and the High Performance Transporter. He has worked in manufacturing management, new product integration, and engineering project management, in addition to his current engineering and product management roles. Matt received a Bachelor of Science degree in Mechanical Engineering from the University of Colorado Boulder.