Virtualization Technology News and Information
Hammerspace Unveils GPU Data Orchestration for S3 Applications: Accelerating AI and GPU Workloads


During the 56th IT Press Tour in California this month, Hammerspace, a leader in data orchestration and global file systems, unveiled its latest innovation: GPU Data Orchestration for S3 Applications. This announcement marks a significant step forward in addressing the growing demands of AI and GPU-intensive workloads.

The Evolution of Hammerspace

Hammerspace has been on a trajectory of continuous innovation, building on its strong business momentum from 2023. David Flynn, Founder and CEO of Hammerspace, highlighted the company's rapid growth: "We have done in the first half of this year, 10x all of last year. It's already been 1,000% growth over last year."

Key advancements throughout the year include:

  • January 2024: Added support for data on tape
  • February 2024: Unveiled Hyperscale NAS Architecture
  • March 2024: Meta published details on their use of Hammerspace in their Gen AI architecture
  • April 2024: Introduced erasure coding

These releases led up to June 2024, when the company announced the addition of an S3 interface to its Global Data Platform, extending its ability to orchestrate existing data sets to available compute resources.

Addressing AI's Data Challenges

The AI landscape is filled with data-related challenges that hinder progress and efficiency. Molly Presley, EVP of Global Marketing at Hammerspace, elaborated on these challenges: "Very few companies, even Apple, have enough of their own data accessible to train an LLM, so access to data which already exists in getting it to the AI environment is a really big challenge."

Hammerspace has identified several key pain points:

  1. Data Issues: Organizations face challenges with data that is siloed in disparate locations, difficulties in assembling large data sets, and data governance challenges. These issues hinder the effective utilization of data for AI projects.
  2. GPUs Not Close to Data: A significant barrier to AI success is the lack of proximity between GPUs (Graphics Processing Units) and data. According to Hammerspace, 49% of companies expect to run AI projects both in-cloud and on-premises by 2025, which points to the need for efficient data movement to wherever the GPUs are located.
  3. Poor Tech Infrastructure: Existing file and object storage systems lack the performance required to feed GPUs. This limitation in tech infrastructure hampers the ability to efficiently process and analyze data for AI projects.

The Hammerspace Global Data Platform

At the core of Hammerspace's offering is its Global Data Platform. The platform provides a unified solution for managing and accessing data across multiple locations, including edge, core, and cloud environments. With a single namespace and a global parallel file system, organizations can achieve data consistency and accessibility, regardless of where the data resides. This global data platform eliminates data silos and enables seamless data movement and collaboration.

David Flynn explained the platform's unique approach: "This is about introducing a level of indirection - metadata separate from data, introducing the card catalog, and putting the card catalog in charge of how the books are actually laid out on the shelf."
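To make the card catalog analogy concrete, here is a minimal Python sketch of metadata acting as a level of indirection. It is a toy model, not Hammerspace's implementation: the `Catalog` class, its methods, and the path and location names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy 'card catalog': maps a stable logical path to a physical location.

    Hypothetical illustration only; not a Hammerspace API.
    """
    locations: dict[str, str] = field(default_factory=dict)

    def register(self, path: str, location: str) -> None:
        # Record where the data for a logical path currently lives.
        self.locations[path] = location

    def resolve(self, path: str) -> str:
        # Clients ask the catalog where a path lives instead of hard-coding it.
        return self.locations[path]

    def relocate(self, path: str, new_location: str) -> None:
        # Only the catalog entry changes; the logical path stays the same.
        self.locations[path] = new_location


catalog = Catalog()
catalog.register("/projects/train/shard-0001", "nas-site-a")
catalog.relocate("/projects/train/shard-0001", "nvme-gpu-cluster")
print(catalog.resolve("/projects/train/shard-0001"))  # nvme-gpu-cluster
```

The point of the sketch is that the logical path never changes when the data moves, which is what lets the "catalog" decide how the "books" are laid out on the shelf.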

The platform provides a number of key elements:

  1. Hyperscale NAS Architecture: At the core of Hammerspace is its hyperscale NAS architecture. This architecture combines the scalability and performance of a parallel file system with the simplicity of a NAS system. It ensures extreme parallel performance to feed GPUs, enabling efficient data processing and analysis. Additionally, it facilitates the movement of data to GPU resources, optimizing AI workloads.
  2. Data-in-Place Assimilation: Hammerspace introduces a unique approach called data-in-place assimilation. With this feature, data remains in its original location while becoming visible and accessible across the platform. This means that users can access and process data in real-time without disruption. Data-in-place assimilation eliminates the need for time-consuming data migrations, ensuring instant access to files and accelerating workflows.
  3. Data Orchestration: Data orchestration is a critical aspect of the Hammerspace Global Data Platform. It automates the placement and management of data across different silos, sites, and clouds. By simplifying data governance and security, Hammerspace streamlines pipelines and workflows. With programmable metadata, users can define and customize metadata attributes, enabling advanced data services and automation based on metadata-driven rules.
  4. Advanced Data Services: Hammerspace offers a range of advanced data services to further optimize data management. Automated data profiling allows organizations to gain insights into their data, facilitating better decision-making. Real-time monitoring ensures that data and storage are aligned with policy objectives. Visualization of data mobility helps IT professionals understand data movement patterns and optimize data placement accordingly.
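The idea of metadata-driven rules from the list above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Hammerspace's objective syntax: the `FileMeta` and `Rule` types, the tag names, the 180-day threshold, and the target names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata record attached to a file."""
    path: str
    tags: frozenset[str]
    days_since_access: int


@dataclass(frozen=True)
class Rule:
    """A placement rule: if `matches` is true, data should live on `target`."""
    name: str
    matches: Callable[[FileMeta], bool]
    target: str


# Rules are evaluated in order; the first match wins (an assumption of this sketch).
RULES = [
    Rule("feed-gpus", lambda f: "training" in f.tags, "gpu-local-nvme"),
    Rule("archive-cold", lambda f: f.days_since_access > 180, "tape"),
]
DEFAULT_TARGET = "capacity-nas"


def place(f: FileMeta) -> str:
    """Return the storage target chosen by the first matching rule."""
    for rule in RULES:
        if rule.matches(f):
            return rule.target
    return DEFAULT_TARGET


hot = FileMeta("/data/shard-01", frozenset({"training"}), 400)
cold = FileMeta("/data/old-logs", frozenset(), 400)
print(place(hot))   # gpu-local-nvme
print(place(cold))  # tape
```

Keeping the rules as data rather than hard-coded branches is what makes placement "programmable": changing a tag or a rule changes where data should live without touching the files themselves.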

S3 Interface: Enabling New AI Pipelines and Workflows

One of the key announcements made during the IT Press Tour was Hammerspace's introduction of S3 interface support. This new feature enables organizations to create more efficient AI pipelines and workflows by integrating S3-compatible data sources into the Hammerspace ecosystem, and provides a standardized and widely adopted interface for ingesting and accessing data.

The S3 interface allows organizations to ingest data from S3 endpoints, edge sites, and other sources into the Hammerspace Global Data Platform. It sits alongside the file protocols the platform already supports, such as SMB and NFS (including NFSv4.2), enabling integration with existing storage systems and cloud storage.

By leveraging the S3 interface, organizations can easily integrate their data sources, process data using local GPU resources, and leverage the capabilities of the Hammerspace platform for data orchestration, automation, and advanced data services.

The S3 interface acts as a bridge between different data sources and the Hammerspace platform, facilitating the efficient and scalable management of data for AI and data-intensive workloads.
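One way to picture the S3 interface as a bridge is that the same data can be addressed either as an S3 bucket and key or as a file path. The Python sketch below shows one such mapping. The article does not describe how Hammerspace actually maps objects to files, so the mount point `/mnt/global`, the bucket-as-top-level-directory convention, and both function names are assumptions made purely for illustration. Keys are assumed not to start with a slash.

```python
from pathlib import PurePosixPath

# Hypothetical mount point of the shared namespace on a file client.
MOUNT_ROOT = PurePosixPath("/mnt/global")


def s3_to_path(bucket: str, key: str) -> PurePosixPath:
    """Map an S3-style (bucket, key) pair to a file path under the mount."""
    return MOUNT_ROOT / bucket / key


def path_to_s3(path: PurePosixPath) -> tuple[str, str]:
    """Map a file path under the mount back to an S3-style (bucket, key) pair."""
    relative = path.relative_to(MOUNT_ROOT)
    bucket, *rest = relative.parts
    return bucket, "/".join(rest)


p = s3_to_path("datasets", "images/cat-001.jpg")
print(p)              # /mnt/global/datasets/images/cat-001.jpg
print(path_to_s3(p))  # ('datasets', 'images/cat-001.jpg')
```

Under this assumed convention, an application written against S3 and a training job reading over NFS would both see the same data, just through different names.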

The S3 interface is available as part of Hammerspace's standard license at no additional cost. It's currently in early access, with general availability expected in late 2024.

Hyperscale NAS Architecture

Hammerspace's Hyperscale NAS Architecture, launched in February, has been a game-changer. David Flynn explained its significance: "Meta adopting Hammerspace for their LLM the Llama training allowed us to prove the technology out." He further added, "We now have several other large scale clusters, 1,000 node DGX clusters with 8,000 GPUs."

Key features include:

  • Linear scalability from a few nodes to thousands
  • Extreme performance for mixed I/O workloads
  • Ability to span multiple locations (edge-core-cloud)
  • Automated, non-disruptive data movement
  • Standards-based enterprise NAS features
  • Software-defined and storage-agnostic design

Data Orchestration and Metadata-Driven Objectives

In addition to its comprehensive data management capabilities, Hammerspace Global Data Platform offers specific use cases that demonstrate its effectiveness in orchestrating data for edge computing and AI factory environments.

  • Orchestrating Data for Edge Computing: Edge computing has gained significant traction in recent years, enabling organizations to process data closer to the source, reducing latency and improving real-time decision-making. Hammerspace plays a crucial role in orchestrating data for edge computing scenarios. By leveraging its metadata-driven AI workflow and parallel global file system, Hammerspace enables data from multiple sources to be seamlessly integrated and processed at the edge. This ensures that edge devices have access to the most up-to-date and relevant data, enabling faster and more accurate AI inference and analysis. With objective-based data placement and automated data orchestration, Hammerspace optimizes data movement and ensures data consistency across edge sites, enabling organizations to harness the full potential of edge computing.
  • Orchestrating Data for AI Factory: AI factories are environments where organizations develop, train, and deploy AI models at scale. Hammerspace Global Data Platform provides a metadata-driven AI workflow that streamlines the data management process in AI factories. By combining its parallel global file system with objective-based data placement, Hammerspace enables efficient data processing and consumption for AI workloads. It allows data from disparate sources to be easily integrated, ensuring a single, unified data set for training and inference. With automated data orchestration, Hammerspace simplifies the movement of data across silos, sites, and clouds, enabling collaboration and accelerating AI model development. The platform's advanced data services, such as automated data profiling and real-time monitoring, further improve the efficiency of AI factories, helping organizations achieve faster time-to-market and improved AI model performance.

By leveraging Hammerspace Global Data Platform for orchestrating data in edge computing and AI factory environments, organizations can overcome the challenges of data management, ensure data consistency, and optimize AI workflows. With its metadata-driven approach, parallel file system performance, and automated data orchestration capabilities, Hammerspace empowers organizations to unlock the full potential of edge computing and AI model development, driving innovation and business success.

During our meeting, Molly Presley highlighted a significant use case with GigaIO and SourceCode: "They have made supercomputers in a suitcase. That's what they have brought to the table. GigaIO and SourceCode are able to take GPUs to the edge, primarily for government and warfare reasons, where you're capturing field data and you want to process it in the field right now."

Conclusion: Positioning for the Future of AI and Data Management

Hammerspace's latest announcements and innovations position the company as a key player in the evolving landscape of AI infrastructure and data management. By addressing critical pain points such as data silos, performance bottlenecks, and complex data orchestration requirements, Hammerspace is enabling organizations to build more efficient and scalable AI pipelines and workflows. As David Flynn stated, "This is really the opportunity, defining the framework, the language within which people will describe and systems will automate, like the integration we've done with ShotGrid, for example, to have an orchestration platform that can work on behalf of very diverse use cases under different software."

The introduction of S3 interface support, coupled with the company's existing strengths in global file systems and data orchestration, provides a compelling solution for organizations looking to accelerate their AI initiatives and maximize the utilization of their GPU resources.

As the demand for AI and GPU-intensive workloads continues to grow, Hammerspace's focus on data orchestration, performance, and scalability puts it in a strong position to meet the needs of enterprises and research organizations alike. With its comprehensive Global Data Platform and features like data-in-place assimilation and erasure coding, Hammerspace is well-equipped to tackle the data challenges of an AI-driven world, both today and tomorrow.

As Molly Presley concluded, "Blue Origin's quote was 'they forever changed the way we use unstructured data'," highlighting the transformative impact of Hammerspace's technology.


Published Monday, June 24, 2024 2:45 PM by David Marshall