As part of the 56th edition of the IT Press Tour event held in
California this week, VMblog had the opportunity to attend a briefing by
Rui Su, co-founder of Juicedata, the company behind the innovative
JuiceFS distributed file system. Su provided an in-depth look at how
JuiceFS addresses the limitations of traditional storage solutions when
dealing with demanding data-intensive workloads.
The company was founded in 2017 by Davies Liu and Rui Su. Liu is a seasoned developer
with contributions to projects like MooseFS, BeansDB, and DPark, and
experience at companies like Meta and Databricks. Su brings a diverse
background, having been a startup founder, tech lead, and engineer at
companies like Douban Movie and Maxthon Browser.
The Need for an Elastic, High-Performance File System
Su began by highlighting the persistent challenges organizations face
with existing file systems and object stores like S3. While convenient
for web services, these solutions often lack the requisite POSIX
compatibility, elasticity, and high throughput to efficiently handle
AI/ML, big data, and other data-intensive use cases.
Key pain points include the inability to scale performance
independently of capacity, inefficient handling of the small files that
are so prevalent in AI workloads, and restricted data access across
multi-cloud and hybrid environments. According to Su, these bottlenecks
have kept enterprises from fully realizing the potential of their
data-driven initiatives.
The JuiceFS Architecture: Designed for Massive Data and High Performance
To tackle these challenges head-on, Juicedata has developed JuiceFS, a
POSIX-compliant distributed file system with strong consistency
guarantees and support for thousands of concurrent clients and mixed
workloads. At its core is a novel architecture that decouples storage
performance from capacity, allowing throughput to scale independently
of the amount of data stored.
"With JuiceFS, you can leverage a multi-layer caching strategy,
utilizing local storage on compute nodes, as well as dedicated cache
pools, to accelerate performance," Su explained. "Meanwhile, an object
store backend ensures data durability, allowing you to decouple these
critical aspects cost-effectively."
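The caching strategy Su describes can be illustrated with a minimal sketch of a tiered read path. It assumes the three layers in the order given in the quote: a local cache on the compute node, a shared cache pool, and the object store as the durable backend. The `TieredReader` class, the tier names, and the promote-on-hit policy are hypothetical illustrations, not JuiceFS code; a real cache would also need eviction, capacity limits, and consistency handling.

```python
from typing import Callable, Dict, List, Tuple

# A tier is a (name, cache) pair; dicts stand in for local disks or cache-pool nodes.
Tier = Tuple[str, Dict[str, bytes]]


class TieredReader:
    """Hypothetical multi-layer read path: fastest cache first, object store last."""

    def __init__(self, tiers: List[Tier], fetch_from_store: Callable[[str], bytes]) -> None:
        self.tiers = tiers                        # ordered fastest to slowest
        self.fetch_from_store = fetch_from_store  # durable backend that holds every block

    def read(self, block_id: str) -> Tuple[bytes, str]:
        """Return the block's data and the name of the layer that served it."""
        for i, (name, cache) in enumerate(self.tiers):
            if block_id in cache:
                data = cache[block_id]
                # Promote into every faster tier so later reads stay close to compute.
                for _, faster in self.tiers[:i]:
                    faster[block_id] = data
                return data, name
        # Miss in every cache: read from the object store and populate all tiers.
        data = self.fetch_from_store(block_id)
        for _, cache in self.tiers:
            cache[block_id] = data
        return data, "object-store"


if __name__ == "__main__":
    store = {"blk-1": b"training shard"}
    local_cache: Dict[str, bytes] = {}
    cache_pool: Dict[str, bytes] = {}
    reader = TieredReader([("local", local_cache), ("pool", cache_pool)], store.__getitem__)

    print(reader.read("blk-1")[1])  # object-store (cold read)
    print(reader.read("blk-1")[1])  # local (warm read on the same node)
    local_cache.clear()             # e.g. a fresh compute node with an empty disk cache
    print(reader.read("blk-1")[1])  # pool (the shared cache still has the block)
```

The point of the layering is that only the object store must be durable: losing a local cache costs a trip to the pool, not data.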
Another standout feature is JuiceFS' native support for multi-cloud
and hybrid cloud environments through transparent data replication
across regions and on-premises deployments. This flexibility, Su
emphasized, is crucial in an era where data and compute resources are
increasingly distributed.
Community and Enterprise Flavors for Diverse Needs
During the briefing, Su discussed JuiceFS' availability in both
Community (open source) and Enterprise editions, tailored to different
user requirements.
The Community Edition, with over 10,000 GitHub stars, targets
general-purpose distributed file system needs, emphasizing ease of use,
ease of maintenance, and customizability. In contrast, the Enterprise Edition is
optimized for the most demanding data-intensive workloads, offering
advanced capabilities like intelligent caching and granular performance
tuning.
Proven Success Across Data-Intensive Industries
Su shared several compelling case studies that demonstrate JuiceFS'
real-world impact across sectors like generative AI, autonomous driving,
quantitative trading, and biotechnology.
One large language model (LLM) startup achieved more than 300 GB/s of
throughput for model pre-training on a hybrid cloud setup using JuiceFS.
An autonomous driving company relies on JuiceFS to manage 20 billion
files, sustaining 450,000 metadata operations per second and 70 GiB/s
peak throughput with sub-millisecond latency.
A biotech leader using JuiceFS for genomic pipelines reported a 75%
performance improvement and a 60% cost reduction compared with its
previous solution.
Performance and Cost Advantages
Benchmarks and customer deployments position JuiceFS as a
high-performance, cost-effective alternative to offerings such as
S3FS/Goofys, AWS EFS, and AWS FSx for Lustre. According to Su, AWS'
managed Lustre service can cost upwards of $0.60/GB-month before
metadata and data transfer fees, whereas JuiceFS pricing starts at
$0.04/GB-month, a figure that covers both the Juicedata service fee and
the underlying cloud storage costs.
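The gap between the two quoted rates is easy to quantify. This sketch multiplies a hypothetical 100 TB (100,000 GB) dataset by each per-GB-month price. It uses only the list prices Su cited, so it understates the FSx for Lustre bill (metadata and transfer fees are excluded) and treats the JuiceFS figure, which is a starting price, as flat. The function name is illustrative.

```python
from decimal import Decimal

# Per-GB-month prices quoted in the briefing. The FSx for Lustre figure excludes
# metadata and data transfer fees; the JuiceFS figure is a starting price.
FSX_LUSTRE_USD = Decimal("0.60")
JUICEFS_USD = Decimal("0.04")


def monthly_storage_cost(capacity_gb: int, price_per_gb_month: Decimal) -> Decimal:
    """Capacity-based monthly cost: stored gigabytes times the per-GB-month rate."""
    return capacity_gb * price_per_gb_month


if __name__ == "__main__":
    capacity_gb = 100_000  # hypothetical 100 TB (decimal units) dataset
    lustre = monthly_storage_cost(capacity_gb, FSX_LUSTRE_USD)
    juicefs = monthly_storage_cost(capacity_gb, JUICEFS_USD)
    print(f"FSx for Lustre: ${lustre:,.2f}/month")   # $60,000.00/month
    print(f"JuiceFS:        ${juicefs:,.2f}/month")  # $4,000.00/month
    print(f"Ratio: {lustre / juicefs}x")            # Ratio: 15x
```

Because both prices scale linearly with stored gigabytes, the 15x ratio holds at any capacity; `Decimal` is used so the dollar amounts come out exact.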
Perhaps more importantly, Su said JuiceFS throughput can be scaled
without a fixed ceiling simply by adding caching resources, whereas the
aggregate throughput of FSx for Lustre is capped at 1,000 MB/s per
tebibyte of provisioned storage.
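Because that cap ties throughput to capacity, a throughput target translates directly into a minimum amount of provisioned storage. The sketch below performs that division, using the 300 GB/s figure from the LLM case study purely as an illustrative target, not as a description of that customer's setup. The function name is hypothetical, and it assumes storage can be provisioned in whole-TiB steps, which ignores the service's real deployment increments and minimums.

```python
import math

# Per-TiB throughput cap for AWS FSx for Lustre, as quoted in the briefing.
MB_PER_S_PER_TIB = 1000


def min_provisioned_tib(target_mb_per_s: float, mb_per_s_per_tib: float = MB_PER_S_PER_TIB) -> int:
    """Smallest whole number of TiB whose per-TiB allowance covers the target throughput.

    Simplification: ignores the service's actual capacity increments and minimums.
    """
    if target_mb_per_s <= 0:
        return 0
    return math.ceil(target_mb_per_s / mb_per_s_per_tib)


if __name__ == "__main__":
    target = 300 * 1000  # 300 GB/s expressed in MB/s (decimal units)
    print(min_provisioned_tib(target))  # 300
```

Under this simplification, sustaining 300 GB/s would mean provisioning roughly 300 TiB whether or not the dataset needs that much space, which is the coupling of capacity and throughput that Su contrasted with JuiceFS.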
The Road Ahead
As the briefing concluded, Su outlined Juicedata's roadmap, including
continuous investment in R&D, growing the JuiceFS community presence
in North America, and building out dedicated marketing and sales
capabilities to better serve enterprises globally.
With its unique architecture, proven performance benefits, and a
flourishing open-source community, JuiceFS from Juicedata is emerging as
a compelling solution for organizations seeking to unlock the full
potential of their data-intensive workloads across AI, big data, and
beyond.
##