By Mingming Zhou, Senior Software Engineer, DaoCloud
and Simon YN Zhao, Senior Solution Architect, DaoCloud
HwameiStor is a software-defined, cloud-native local storage
system that combines the advantages of local disks and commercial storage
products: it is cost-effective and high-performance, and it offers rich
enterprise-level data management functions. Designed and developed for
Kubernetes, HwameiStor has the following features:
- Simple O&M: One-click containerized deployment with
minimal server resources, and automated O&M
- Elasticity: Storage resources
(volumes, disks, nodes) scale dynamically on demand to
support large-scale applications
- Data management: Multiple volume types (including HA volumes), snapshots,
cloning, restoration, failover, and one-click data migration by eviction
Use case for LLM pre-training
HwameiStor provides a simple, efficient way to accelerate
dataset access (reads) and quick checkpoint (CKPT) saving (writes) by
combining object storage with local storage.
HwameiStor first loads the remote dataset into object
storage nodes within the cluster, then moves the data to the local storage of
the training nodes. When the same dataset is loaded again later in training,
HwameiStor serves it from local storage, either by scheduling training tasks
onto the nodes that already hold the data or by reloading the dataset onto
the nodes' local storage.
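To make the read path concrete, here is a minimal Python sketch that models it as a three-tier lookup: node-local storage first, then the shared in-cluster object storage, then the remote source. It is an illustration under simplifying assumptions, not HwameiStor's implementation or API: `DatasetReader` and `fetch_remote` are hypothetical names, each tier is an in-memory dict, and task scheduling is not modeled.

```python
from typing import Callable, Dict, Tuple


class DatasetReader:
    """Hypothetical model of one training node reading dataset shards.

    Lookup order: node-local storage, then the shared in-cluster object
    storage (the second-level cache), then the remote dataset source.
    """

    def __init__(self, object_store: Dict[str, bytes],
                 fetch_remote: Callable[[str], bytes]) -> None:
        self.object_store = object_store   # shared by all nodes in the cluster
        self.fetch_remote = fetch_remote   # slow path to the remote dataset
        self.local: Dict[str, bytes] = {}  # this node's local storage

    def read(self, key: str) -> Tuple[bytes, str]:
        """Return the shard and the name of the tier that served it."""
        if key in self.local:
            return self.local[key], "local"
        if key in self.object_store:
            source = "object_store"
        else:
            # First load: pull the shard from the remote source into the cluster.
            self.object_store[key] = self.fetch_remote(key)
            source = "remote"
        # Move the shard to this node's local storage for later reads.
        self.local[key] = self.object_store[key]
        return self.local[key], source


remote_calls = []


def fetch_remote(key: str) -> bytes:
    """Hypothetical stand-in for downloading a shard from the remote dataset."""
    remote_calls.append(key)
    return f"contents of {key}".encode()


shared_store: Dict[str, bytes] = {}
node_a = DatasetReader(shared_store, fetch_remote)
node_b = DatasetReader(shared_store, fetch_remote)
print(node_a.read("shard-0")[1])  # remote
print(node_a.read("shard-0")[1])  # local
print(node_b.read("shard-0")[1])  # object_store
print(len(remote_calls))          # 1
```

In the usage lines, node B never touches the remote source: its first read is served by the in-cluster object storage, which is the point of keeping a second-level cache inside the cluster.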
In the CKPT quick-save scenario, HwameiStor uses memory and NVMe as
the underlying storage to accelerate writes, and it keeps CKPT data shareable
and safe by asynchronously writing it to the object storage within the
cluster.
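The quick-save path is essentially a write-back pattern. The sketch below assumes that two plain directories stand in for the fast tier (memory/NVMe) and the in-cluster object storage; `AsyncCheckpointWriter` is a hypothetical name, and the code shows the general pattern rather than how HwameiStor performs the upload.

```python
import queue
import shutil
import tempfile
import threading
from pathlib import Path


class AsyncCheckpointWriter:
    """Hypothetical write-back model for checkpoint (CKPT) saves.

    save() returns once the checkpoint is on fast local storage; a background
    thread then copies it to the slower, shared object storage.
    """

    def __init__(self, fast_dir: Path, object_store_dir: Path) -> None:
        self.fast_dir = fast_dir                  # stands in for memory/NVMe storage
        self.object_store_dir = object_store_dir  # stands in for in-cluster object storage
        self._pending: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._upload_loop, daemon=True)
        self._worker.start()

    def save(self, name: str, data: bytes) -> Path:
        path = self.fast_dir / name
        path.write_bytes(data)   # the only write the training loop waits for
        self._pending.put(path)  # the upload happens asynchronously
        return path

    def _upload_loop(self) -> None:
        while True:
            path = self._pending.get()
            if path is None:     # sentinel queued by close()
                return
            shutil.copy2(path, self.object_store_dir / path.name)

    def close(self) -> None:
        """Let queued uploads drain (FIFO order), then stop the worker."""
        self._pending.put(None)
        self._worker.join()


with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as shared:
    writer = AsyncCheckpointWriter(Path(fast), Path(shared))
    writer.save("step-100.ckpt", b"model weights")
    writer.close()
    print((Path(shared) / "step-100.ckpt").read_bytes())  # b'model weights'
```

The training loop waits only for the local write; the copy in shared storage arrives slightly later, so a real deployment would also have to decide what happens to checkpoints still queued when a node fails.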
In this solution, the in-cluster object storage serves as a second-level
cache for datasets, while HwameiStor provides local storage for the training
phase. The advantages are: simple, easy deployment; minimal data-loading
overhead; the best performance, thanks to local data access; and high
stability, with no additional complexity or points of failure.
Use case for infrastructure workloads
Middleware (message queues, Kafka, MySQL) and cloud-native
virtualization (KubeVirt) scenarios require underlying storage that provides
data durability, performance, HA, and security, along with key capabilities
for virtual machine migration, snapshots, cloning, recovery, and so on.
In edge computing environments such as K3s and KubeEdge, data and
services are distributed to the edge, where server resources are extremely
limited and the network is unreliable. The underlying storage must therefore
provide stable storage with a small resource footprint.
To meet these storage requirements, HwameiStor pools disks
of multiple types (NVMe, SSD, HDD) to provide high-performance local storage
volumes for upper-layer applications. HwameiStor also provides an HA data
volume type and services such as snapshots, cloning, and recovery, which help
users achieve business continuity and meet their requirements for data
security, reliability, and high availability.
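Pooling by disk type can be pictured as one capacity pool per media type on each node. The Python sketch below is a hypothetical model (`Disk` and `NodeStoragePools` are illustrative names, not HwameiStor objects): it only tracks capacity and ignores how a volume is actually laid out across the disks.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Disk:
    name: str
    media: str      # "NVMe", "SSD", or "HDD"
    size_gib: int


class NodeStoragePools:
    """Hypothetical model: one pool per disk type on a node.

    A pool's capacity is the sum of its disks, so a volume may be larger
    than any single disk in the pool. Only capacity is tracked here.
    """

    def __init__(self, disks: List[Disk]) -> None:
        self.capacity: Dict[str, int] = defaultdict(int)
        self.allocated: Dict[str, int] = defaultdict(int)
        for disk in disks:
            self.capacity[disk.media] += disk.size_gib

    def free(self, media: str) -> int:
        return self.capacity[media] - self.allocated[media]

    def create_volume(self, media: str, size_gib: int) -> bool:
        """Carve a volume out of the pool for the requested disk type, if it fits."""
        if size_gib <= 0 or size_gib > self.free(media):
            return False
        self.allocated[media] += size_gib
        return True


pools = NodeStoragePools([
    Disk("nvme0n1", "NVMe", 1000),
    Disk("sdb", "HDD", 4000),
    Disk("sdc", "HDD", 4000),
])
print(pools.free("HDD"))                  # 8000
print(pools.create_volume("HDD", 6000))   # True
print(pools.create_volume("NVMe", 1500))  # False
print(pools.free("HDD"))                  # 2000
```

Because capacity is pooled, the 6000 GiB HDD volume fits even though no single HDD in the example is that large, while the NVMe request fails because it exceeds that pool.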
Advanced features for production:
- QoS: Volume-based I/O rate limiting for performance
stability
- Data migration: Manual or automatic migration of data volumes
away from faulty components
- Audit logs: Usage and operation logs at the cluster, node,
and data volume levels
- Dynamic expansion: Manual or automatic expansion of data volumes,
disks, and nodes
- Management tools: UI and command-line management and
O&M tools
- Data services: Business continuity services such as snapshots
and cloning
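The source does not say how the QoS limit is enforced. A token bucket is one common way to cap I/O per volume, so the sketch below uses it as a stand-in: each I/O consumes one token, tokens refill at the configured IOPS rate, and the bucket size bounds bursts. `VolumeIOLimiter` is a hypothetical name, and the injectable clock exists only to make the example deterministic.

```python
import time
from typing import Callable


class VolumeIOLimiter:
    """Hypothetical per-volume I/O limiter.

    Uses a token bucket as a stand-in technique; this is not confirmed as
    HwameiStor's QoS mechanism. One token is consumed per I/O request.
    """

    def __init__(self, iops_limit: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = iops_limit  # tokens added per second
        self.burst = burst      # bucket capacity, i.e. the largest allowed burst
        self.tokens = burst     # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        """Admit n I/Os if enough tokens are available, otherwise reject."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Simulated clock so the example is deterministic.
t = [0.0]
limiter = VolumeIOLimiter(iops_limit=20, burst=5, clock=lambda: t[0])
print(sum(limiter.try_acquire() for _ in range(50)))  # 5: only the burst at t=0
t[0] = 0.125  # 0.125 s at 20 IOPS refills 2.5 tokens
print(sum(limiter.try_acquire() for _ in range(50)))  # 2
```

Limiting each volume separately keeps one noisy workload from consuming the I/O budget of its neighbors on the same disks, which is the performance-stability goal the QoS feature names.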
HwameiStor is a CNCF Sandbox project: https://github.com/hwameistor. To learn more about HwameiStor, stay tuned
for KubeCon Europe 2024.
ABOUT THE AUTHORS
Mingming Zhou is a HwameiStor developer and maintainer.
Simon YN Zhao is a HwameiStor maintainer with 20 years of
storage experience and 5 years of cloud-native experience, and previously
worked for HDS, Sun, and EMC.