During the 56th IT Press Tour in California this month,
Hammerspace, a leader in data orchestration and global file systems, unveiled
its latest innovation: GPU Data Orchestration for S3 Applications. This
announcement marks a significant step forward in addressing the growing demands
of AI and GPU-intensive workloads.
The Evolution of Hammerspace
Hammerspace has been on a trajectory of continuous
innovation, building on its strong business momentum from 2023. David Flynn,
Founder and CEO of Hammerspace, highlighted the company's rapid growth:
"We have done in the first half of this year, 10x all of last year. It's
already been 1,000% growth over last year."
Key advancements throughout the year include:
- January 2024: Added support for data on tape
- February 2024: Unveiled Hyperscale NAS Architecture
- March 2024: Meta published details on their use of
Hammerspace in their Gen AI architecture
- April 2024: Introduced erasure coding
That brings us to June 2024, when the company announced
the addition of an S3 interface to its Global Data Platform, advancing the
orchestration of existing data sets to available compute resources.
Addressing AI's Data Challenges
The AI landscape is filled with data-related challenges that
hinder progress and efficiency. Molly Presley, EVP of Global Marketing at
Hammerspace, elaborated on these challenges: "Very few companies, even
Apple, have enough of their own data accessible to train an LLM, so access to
data which already exists in getting it to the AI environment is a really big
challenge."
Hammerspace has identified several key pain points:
- Data Issues: Organizations struggle with data siloed in disparate locations,
difficulty assembling large data sets, and data governance requirements.
These issues hinder the effective use of data for AI projects.
- GPUs Not Close to Data: A significant barrier to AI success is the lack of
proximity between GPUs (Graphics Processing Units) and data. With 49% of
companies expecting to run AI projects both in-cloud and on-premises by 2025,
data must be moved efficiently to wherever the GPUs are located.
- Poor Tech Infrastructure: Existing file and object
storage systems lack the performance required to feed GPUs. This limitation in
tech infrastructure hampers the ability to efficiently process and analyze data
for AI projects.
The Hammerspace Global Data Platform
At the core of Hammerspace's offering is its Global Data
Platform. The platform provides
a unified solution for managing and accessing data across multiple locations,
including edge, core, and cloud environments. With a single namespace and a
global parallel file system, organizations can achieve data consistency and
accessibility, regardless of where the data resides. This global data platform
eliminates data silos and enables seamless data movement and collaboration.
David Flynn explained the platform's unique approach:
"This is about introducing a level of indirection - metadata separate from
data, introducing the card catalog, and putting the card catalog in charge of
how the books are actually laid out on the shelf."
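The card-catalog idea can be illustrated with a small sketch. This is a toy model and not Hammerspace's implementation: the Catalog class, its method names, and the location strings are all hypothetical, chosen only to show how a metadata layer can keep a file's name stable while the data underneath it moves.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy metadata layer (hypothetical): logical path -> physical location."""

    entries: dict[str, str] = field(default_factory=dict)

    def register(self, logical_path: str, location: str) -> None:
        # Record where the data for this logical path currently lives.
        self.entries[logical_path] = location

    def relocate(self, logical_path: str, new_location: str) -> None:
        # Only the catalog entry changes; the name clients use stays the same.
        if logical_path not in self.entries:
            raise KeyError(logical_path)
        self.entries[logical_path] = new_location

    def resolve(self, logical_path: str) -> str:
        # Clients ask the catalog where to read, never a fixed storage system.
        return self.entries[logical_path]


# Illustrative names only: no real systems are implied.
catalog = Catalog()
shard = "/projects/llm/shard-0001.parquet"
catalog.register(shard, "nas-site-a:/vol7/0001")
before = catalog.resolve(shard)
catalog.relocate(shard, "nvme-gpu-cluster:/cache/0001")
after = catalog.resolve(shard)
print(before, "->", after)
```

In this sketch the application keeps opening the same logical path before and after the move, which is the practical effect of putting the catalog in charge of the shelf layout.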
The platform provides a number of key elements:
- Hyperscale NAS Architecture: At the core of Hammerspace is its hyperscale NAS
architecture. This architecture combines the scalability and performance of a
parallel file system with the simplicity of a NAS system. It ensures extreme
parallel performance to feed GPUs, enabling efficient data processing and
analysis. Additionally, it facilitates the movement of data to GPU resources,
optimizing AI workloads.
- Data-in-Place Assimilation: Hammerspace introduces a unique approach called
data-in-place assimilation. With this feature, data remains in its original
location while becoming visible and accessible across the platform. Users can
access and process data in real time without disruption. Data-in-place
assimilation eliminates the need for time-consuming data migrations, ensuring
instant access to files and accelerating workflows.
- Data Orchestration: Data orchestration is a critical aspect of the Hammerspace
Global Data Platform. It automates the placement and management of data across
different silos, sites, and clouds. By simplifying data governance and security,
Hammerspace streamlines pipelines and workflows, enhancing overall data
management efficiency. With programmable metadata, users can define and
customize metadata attributes, enabling advanced data services and automation
based on metadata-driven rules.
- Advanced Data Services: Hammerspace offers a range of
advanced data services to further optimize data management. Automated data
profiling allows organizations to gain insights into their data, facilitating
better decision-making. Real-time monitoring ensures that data and storage are
aligned with policy objectives, providing visibility into data mobility.
Visualization of data mobility helps IT professionals understand data
movement patterns and optimize data placement accordingly.
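To make "metadata-driven rules" more concrete, here is a minimal sketch of how custom metadata attributes could drive placement decisions. It does not use Hammerspace's objective syntax, which this article does not describe. FileRecord, Objective, plan_moves, the metadata keys, and the location names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileRecord:
    path: str
    location: str
    metadata: dict[str, str] = field(default_factory=dict)  # user-defined tags


@dataclass
class Objective:
    name: str
    matches: Callable[[FileRecord], bool]  # rule evaluated against metadata
    target_location: str


def plan_moves(files: list[FileRecord], objectives: list[Objective]):
    """Return (path, source, destination) for each file that is not yet
    where its first matching objective wants it to be."""
    moves = []
    for record in files:
        for objective in objectives:
            if objective.matches(record):
                if record.location != objective.target_location:
                    moves.append(
                        (record.path, record.location, objective.target_location)
                    )
                break  # first matching objective wins in this sketch
    return moves


# Illustrative data: tags and locations are assumptions, not product terms.
files = [
    FileRecord("/data/images/cat.jpg", "cloud-archive", {"stage": "training"}),
    FileRecord("/data/images/dog.jpg", "gpu-cluster", {"stage": "training"}),
    FileRecord("/data/logs/2023.log", "gpu-cluster", {"stage": "archived"}),
]
objectives = [
    Objective("training-data-near-gpus",
              lambda f: f.metadata.get("stage") == "training", "gpu-cluster"),
    Objective("archive-cold-data",
              lambda f: f.metadata.get("stage") == "archived", "cloud-archive"),
]
moves = plan_moves(files, objectives)
for path, source, destination in moves:
    print(f"{path}: {source} -> {destination}")
```

The sketch only plans moves. A real orchestration layer would also have to carry them out without disrupting readers, which is the part the platform claims to automate.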
S3 Interface: Enabling New AI Pipelines and Workflows
One of the key announcements made during the IT
Press Tour was Hammerspace's introduction of S3 interface support. This new
feature enables organizations to create more efficient AI pipelines and
workflows by integrating S3-compatible data sources into the Hammerspace
ecosystem, and provides a
standardized and widely adopted interface for ingesting and accessing data.
The S3 interface allows organizations to ingest data from S3 endpoints, edge sites,
and other sources into the Hammerspace Global Data Platform. It sits alongside
the file protocols the platform already supports, such as SMB and NFS (including
NFS 4.2), enabling integration with existing storage systems and cloud storage.
With the S3 interface, organizations can integrate their data sources, process
data using local GPU resources, and apply the platform's data orchestration,
automation, and advanced data services. In effect, it acts as a bridge between
different data sources and the Hammerspace platform for AI and other
data-intensive workloads.
The S3 interface is available as part of
Hammerspace's standard license at no additional cost. It's currently in early
access, with general availability expected in late 2024.
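One way to picture file and object access sharing a namespace is a simple translation between an S3 URI and a file path. The article does not say how Hammerspace maps buckets and keys onto its file system, so the mapping below (bucket as a top-level directory under a mount point) is purely an assumption, as are the function names and the /mnt/global mount root.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def s3_uri_to_path(uri: str, mount_root: str = "/mnt/global") -> PurePosixPath:
    """Map s3://bucket/key to <mount_root>/bucket/key (hypothetical layout).

    Keys containing '?' or '#' are not handled by this simple sketch.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return PurePosixPath(mount_root) / parsed.netloc / parsed.path.lstrip("/")


def path_to_s3_uri(path: str, mount_root: str = "/mnt/global") -> str:
    """Inverse of s3_uri_to_path under the same assumed layout."""
    relative = PurePosixPath(path).relative_to(mount_root)
    bucket, *key_parts = relative.parts
    return "s3://" + bucket + "/" + "/".join(key_parts)


file_path = s3_uri_to_path("s3://training-data/images/cat.jpg")
object_uri = path_to_s3_uri("/mnt/global/training-data/images/cat.jpg")
print(file_path)
print(object_uri)
```

Under this assumed layout, an application written against S3 and a training job reading over NFS would be addressing the same data by two different names.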
Hyperscale NAS Architecture
Hammerspace's Hyperscale NAS Architecture, launched in
February, has been a game-changer. David Flynn explained its significance:
"Meta adopting Hammerspace for their LLM the Llama training allowed us to
prove the technology out." He further added, "We now have several
other large scale clusters, 1,000 node DGX clusters with 8,000 GPUs."
Key features include:
- Linear scalability from a few nodes to thousands
- Extreme performance for mixed I/O workloads
- Ability to span multiple locations (edge-core-cloud)
- Automated, non-disruptive data movement
- Standards-based enterprise NAS features
- Software-defined and storage-agnostic design
Data Orchestration and Metadata-Driven Objectives
In addition to its
comprehensive data management capabilities, the Hammerspace Global Data Platform
supports specific use cases that demonstrate its effectiveness in orchestrating
data for edge computing and AI factory environments.
- Orchestrating Data for Edge Computing:
Edge computing has gained significant traction in recent years, enabling
organizations to process data closer to the source, reducing latency and
improving real-time decision-making. Hammerspace plays a crucial role in
orchestrating data for edge computing scenarios. By leveraging its
metadata-driven AI workflow and parallel global file system, Hammerspace
enables data from multiple sources to be seamlessly integrated and processed at
the edge. This ensures that edge devices have access to the most up-to-date and
relevant data, enabling faster and more accurate AI inference and analysis.
With objective-based data placement and automated data orchestration,
Hammerspace optimizes data movement and ensures data consistency across edge
sites, enabling organizations to harness the full potential of edge computing.
- Orchestrating Data for AI Factory:
AI factories are environments where organizations develop, train, and deploy AI
models at scale. The Hammerspace Global Data Platform provides a metadata-driven AI
workflow that streamlines the data management process in AI factories. By
combining its parallel global file system with objective-based data
placement, Hammerspace enables efficient data processing and consumption for AI
workloads. It allows data from disparate sources to be easily integrated,
ensuring a single, unified data set for training and inference. With automated
data orchestration, Hammerspace simplifies the movement of data across silos,
sites, and clouds, enabling seamless collaboration and accelerating AI model
development. The platform's advanced data services, such as automated data
profiling and real-time monitoring, further enhance the efficiency and
effectiveness of AI factories, enabling organizations to achieve faster
time-to-market and improved AI model performance.
By using the
Hammerspace Global Data Platform to orchestrate data in edge computing and
AI factory environments, organizations can overcome data management
challenges, ensure data consistency, and optimize AI workflows. Its
metadata-driven approach, parallel file system performance, and automated data
orchestration help organizations get more out of edge computing and AI model
development.
During our meeting, Molly Presley highlighted a significant
use case with GigaIO and SourceCode: "They have made supercomputers in a
suitcase. That's what they have brought to the table. GigaIO and SourceCode are
able to take GPUs to the edge, primarily for government and warfare reasons,
where you're capturing field data and you want to process it in the field right
now."
Conclusion: Positioning for the Future of AI and Data Management
Hammerspace's latest announcements and innovations position
the company as a key player in the evolving landscape of AI infrastructure and
data management. By addressing critical pain points such as data silos,
performance bottlenecks, and complex data orchestration requirements,
Hammerspace is enabling organizations to build more efficient and scalable AI
pipelines and workflows. As David Flynn stated, "This is really the
opportunity, defining the framework, the language within which people will
describe and systems will automate, like the integration we've done with ShotGrid,
for example, to have an orchestration platform that can work on behalf of very
diverse use cases under different software."
The introduction of S3 interface support, coupled
with the company's existing strengths in global file systems and data
orchestration, provides a compelling solution for organizations looking to
accelerate their AI initiatives and maximize the utilization of their GPU
resources.
As the demand for AI and GPU-intensive workloads
continues to grow, Hammerspace's focus on data orchestration, performance, and
scalability puts it in a strong position to meet the needs of enterprises and
research organizations alike. With its comprehensive Global Data Platform and
innovative features like data-in-place assimilation and advanced erasure
coding, Hammerspace is well-equipped to tackle the data challenges of today's and
tomorrow's AI-driven world.
As Molly Presley concluded, "Blue Origin's
quote was 'they forever changed the way we use unstructured data',"
highlighting the transformative impact of Hammerspace's technology.