Spectra Logic 2024 Predictions: Archiving Large Language Model Data on Object-Based Tape


Industry executives and experts share their predictions for 2024.  Read them in this 16th annual VMblog.com series exclusive.

Archiving Large Language Model Data on Object-Based Tape

By Matt Ninesling, Senior Director Tape Portfolio, Spectra Logic

Artificial intelligence (AI) is at the forefront of technological innovation today, and its power is driven by the machine learning algorithms of large language models (LLMs). LLMs consume massive datasets to mine, process, and continually refine their capabilities. As we step into 2024, a noteworthy evolution in the AI landscape is on the horizon: the emergence of LLM data archives as a compelling use case for object-based tape technology.

AI's power lies in its ability to learn, adapt, and process, and those functions are fueled by huge amounts of data. However, AI's massive training data requirements bring with them a set of challenges, particularly in the realm of potential legal ramifications stemming from AI-driven decision-making.

As an example, in July, the Clarkson law firm filed a case against CIGNA Healthcare alleging the insurer employed AI to automate claim rejections. Clarkson also has pursued cases against OpenAI and Google over privacy rights for data sourced from the web to train their AI technologies, ChatGPT and Bard respectively. Other recent legal precedents point to the need to store this data, with cases arising from the use of copyrighted materials for AI model training. These suits and others like them mark some of the initial instances where AI decisions may create unintended financial consequences, underscoring the need for the preservation of input data that trains AI.

The demand for storing the massive datasets required to train LLMs is only part of the story. Also critical is retaining the output of AI processes. This is now a strategic necessity, especially as decisions made by AI algorithms become the subject of litigation, and defamation claims triggered by false or misleading information generated by AI arise. In response to this new need for comprehensive data preservation, organizations are turning to LLM data archives. These archives provide an ideal long-term repository to not only safeguard the voluminous training data but also capture the output generated by AI models. The introduction of object-based tape technology emerges as a game-changer in this landscape, offering a robust solution that capitalizes on the density, cost-effectiveness, and reliability inherent in tape storage.

Why object-based tape storage? The answer lies in the sheer scale of LLM data archives. While other storage options might struggle to accommodate these datasets, object-based tape excels in providing a viable and scalable solution. These archives often reach multi-petabyte proportions, presenting a costly storage challenge. The cost-effectiveness of tape technology ensures organizations can manage and preserve their growing volumes of AI-related data without incurring exorbitant storage costs. Combined with the searchability of an object storage interface, object-based tape provides access to data within minutes. The solution is ideal for long-term retention of data at scale, offering an efficient approach to storing LLM data archives.
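In practice, object interfaces to tape typically follow a request-restore-then-read pattern, much like cloud archive tiers: the library stages the object from cartridge to a disk cache, and reads are served from that staged copy. The sketch below is a minimal illustration of that workflow; the class and method names are hypothetical, not any specific product's API.

```python
# Minimal sketch of a restore-then-read workflow against an
# object-based tape tier. All names here are hypothetical.

class TapeObjectStore:
    """Simulates an object interface in front of a tape library."""

    def __init__(self):
        self._on_tape = {}      # object key -> bytes archived on tape
        self._disk_cache = {}   # objects staged back to disk

    def put(self, key, data):
        # Writes land on tape via the object interface.
        self._on_tape[key] = data

    def restore(self, key):
        # Ask the library to mount the cartridge and stage the object
        # to a disk cache; real systems complete this in minutes.
        self._disk_cache[key] = self._on_tape[key]

    def get(self, key):
        # Reads are served from the staged copy, not directly from tape.
        if key not in self._disk_cache:
            raise KeyError(f"{key} not restored yet; call restore() first")
        return self._disk_cache[key]

store = TapeObjectStore()
store.put("training/corpus-2023.tar", b"...training data...")
store.restore("training/corpus-2023.tar")
data = store.get("training/corpus-2023.tar")
```

The two-step access is the key design point: the bulk data stays on dense, low-cost media until it is explicitly requested, which is what keeps multi-petabyte archives affordable.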

The inherent strengths of tape technology are a key piece of how object-based tape can address the arising issues posed by AI-related data storage demands. Reliability is the foundation of any archival strategy, and tape has a proven track record. This makes it ideal for the long-term preservation of AI training and output data, where integrity is imperative. Tape is also the ideal choice for ensuring compliance with long-term data retention requirements, as it allows organizations to securely preserve data for extended periods. Additionally, having LLM archives stored on tape provides air-gap protection: because the data sits offline, it cannot be remotely accessed, deleted, or encrypted by attackers. This ensures a reliable means of data recovery in the event of a ransomware attack.

Object storage also brings its unique benefits to the mix. It is designed to store and retrieve large amounts of unstructured data, and its scalability and durability also make it ideal for big data workloads. Object storage decouples the valuable information about the data from the actual data itself, typically storing this information as metadata in a separate, faster tier. This means it is much easier (and faster) to search through a vast amount of data within an object storage system than a file-based system. The objects are typically accessed via a unique identifier, such as a URL. As such, object storage can be accessed from anywhere, by any device that can communicate with the object storage system over the internet, making it easy to integrate it into a wide variety of applications and services.
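The metadata split described above can be made concrete with a small sketch: a metadata index held on a fast tier is scanned to find matching objects, and only the winning keys need to be fetched from the bulk tier. This is an illustration of the pattern, not any particular product's implementation; all keys and field names are hypothetical.

```python
# Sketch of metadata decoupled from bulk data: searches scan only the
# small metadata index, never the (potentially petabyte-scale) objects.

bulk_store = {   # object key -> raw data (slow, dense tier, e.g. tape)
    "runs/2023-11/output-001.json": b"...model output...",
    "runs/2023-12/output-002.json": b"...model output...",
}

metadata_index = {   # object key -> metadata (fast tier)
    "runs/2023-11/output-001.json": {"model": "llm-v1", "month": "2023-11"},
    "runs/2023-12/output-002.json": {"model": "llm-v2", "month": "2023-12"},
}

def search(**criteria):
    """Return keys whose metadata matches every criterion."""
    return [
        key for key, meta in metadata_index.items()
        if all(meta.get(k) == v for k, v in criteria.items())
    ]

# Find December outputs without touching the bulk tier at all.
keys = search(month="2023-12")
print(keys)  # ['runs/2023-12/output-002.json']
```

Because only the compact index is scanned, the query cost is independent of object size, which is why object interfaces make even tape-resident archives practical to search.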

The benefits offered by object-based tape technology position it as an important tool in the evolving landscape of AI data management. It addresses not only the immediate storage challenges posed by LLM data archives but also the emerging need for long-term data preservation. As organizations continue to utilize AI to aid in decision-making, it is easy to see the need for a reliable, scalable and cost-effective storage solution.

The convergence of AI's capabilities and the emerging challenge of legal accountability is driving the significance of LLM data archiving. Utilizing object-based tape technology to preserve that data signals a new era in which organizations can confidently navigate the demands of AI data management. As we stand at the intersection of technological innovation and legal scrutiny, LLM data archive storage on tape will become an increasingly popular solution in this evolving landscape.

##

ABOUT THE AUTHOR

Matt Ninesling 

Matt Ninesling is Spectra Logic's Senior Director of Tape Portfolio Management. He has been with Spectra Logic for over 23 years, where he has helped develop advancements for Spectra's enterprise tape library family. Matt led the development of the TFinity tape library, the world's largest storage system, and the High Performance Transporter.  He has worked in manufacturing management, new product integration, engineering project management, as well as his current engineering and product management positions.  Matt received a Bachelor of Science degree in Mechanical Engineering from the University of Colorado Boulder.
Published Wednesday, December 27, 2023 7:33 AM by David Marshall