DDN, a premier provider of Artificial Intelligence (AI) and Data Management
software and hardware solutions enabling Intelligent Infrastructure, today
announced that its high-performance storage solutions have helped to support
broader, faster and more efficient drug discovery research and operations at
Recursion, a digital biology company
industrializing drug discovery through the combination of automation, AI and
machine learning (ML) capabilities to discover novel medicines. Read the
Recursion case study that describes the
challenges, solution and benefits of the infrastructure, and reveals details of
the global-impact work that's taking place in Salt Lake City.
With most of the industry fighting mounting drug-discovery costs and
time-to-market challenges, Recursion takes a new approach that combines
scientific and technological methods. This requires high-performance drug
discovery processing that is
fully optimized for AI and ML and designed to unlock the maximum value of data
from the world's largest repository of biological images.
"Our data is our
company, so we needed a robust storage architecture to support our AI-driven
models," said Kris Howard, principal systems engineer at Recursion. "Managing
our at-scale data needs requires fast ingest, optimized processing and reduced
application run times."
In collaboration with DDN's domain experts, Recursion initially created a proof
of concept encompassing DDN's EXAScaler ES400NV and ES7990X parallel filesystem
appliances, which were later scaled to 2PB of capacity for staging ML models. An
all-flash layer serves as a front end to the file system, which is backed by
ample spinning disk. The first 64K of each file is stubbed to this layer, which
accelerates access to the first part of the data before streaming the rest to
spinning disk.
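For readers who want a concrete picture of the 64K stub, the layout can be modeled in a few lines of Python. This is only an illustrative sketch, not DDN's or EXAScaler's actual interface: the function names are hypothetical, and "64K" is assumed to mean 64 KiB (65,536 bytes).

```python
# Toy model of the tiering described above: the first 64K of each file sits
# on the all-flash layer and the remainder on spinning disk.
# Assumption: "64K" means 64 KiB. All names here are hypothetical.

STUB_BYTES = 64 * 1024


def split_by_tier(file_size: int, stub_bytes: int = STUB_BYTES) -> dict:
    """Return how many bytes of a file fall on each tier."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    flash = min(file_size, stub_bytes)
    return {"flash": flash, "disk": file_size - flash}


def tier_for_offset(offset: int, stub_bytes: int = STUB_BYTES) -> str:
    """Return the tier that holds the byte at the given file offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return "flash" if offset < stub_bytes else "disk"


if __name__ == "__main__":
    # A 10 KiB file fits entirely in the flash stub.
    print(split_by_tier(10 * 1024))
    # A 1 MiB file keeps its first 64 KiB on flash and the rest on disk.
    print(split_by_tier(1024 * 1024))
```

Under this model, any file of 64 KiB or less lives entirely on flash, while larger files get a fast first read followed by streaming from the disk tier.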
With DDN,
Recursion executes about 350,000 experiments weekly and screens thousands of
compounds against hundreds of disease models, now at a fraction of the cost and
time of traditional drug discovery methods. DDN's hybrid high-performance
scalable storage solutions, fully optimized for AI and ML, have helped to
decrease costs and increase the efficiency of biological research.
"DDN's reputation
as a storage leader is reinforced by our mature solutions and increasing focus
on AI data storage," said Paul Bloch, president and co-founder at DDN.
"Leveraging Intelligent Infrastructure to deliver the most comprehensive set of
data-centric AI-enabled solutions, DDN's flexibility in sizing Recursion's
configuration to meet specific workloads has resulted in robust storage that
seamlessly supports 18 nodes and 136 GPUs. Being trusted as the storage
infrastructure provider for Recursion is a true honor as they work to disrupt
traditional drug discovery methods and identify treatments for disease with
precision and efficiency."
While traditional
storage architectures would not meet Recursion's stringent high-performance
file processing demands, DDN's 2PB high-performance, multi-tier data management
infrastructure has helped to maximize GPU compute resources for accelerated AI
workflows. Not only did this approach deliver extremely fast performance for
Recursion's demanding workloads, it also helped to alleviate file-access
bottlenecks while enabling efficient streaming to the GPUs.
"Our DDN storage is wicked fast," said Howard. "The Flash layer resulted
in a 40% reduction in file access time, and we can get our GPUs to 100%
utilization, and keep them pegged there. It's highly unusual to train data off
a PFS, but it's a perfect solution for our use case."