Cerebras and Hugging Face announced a new partnership to bring
Cerebras Inference to the Hugging Face platform. Hugging Face has
integrated Cerebras into the Hugging Face Hub, bringing the world's
fastest inference to its more than five million developers. Cerebras
Inference runs the industry's most popular models at more than 2,000
tokens/s, 70x faster than leading GPU solutions. Cerebras Inference
models, including Llama 3.3 70B, will be available to Hugging Face
developers, enabling seamless API access to AI models powered by the
Cerebras CS-3.
Cerebras recently announced industry-leading speeds for Llama 3.3 70B,
achieving over 2,200 tokens per second, 70 times faster than GPU-based
solutions. Where leading reasoning models such as OpenAI o3-mini take
minutes to generate an answer, Cerebras Inference completes the same
tasks at comparable accuracy in mere seconds.
"We're excited to partner with Hugging Face to bring our
industry-leading inference speeds to the global developer community,"
said Andrew Feldman, CEO, Cerebras. "By making Cerebras Inference
available through Hugging Face, we're empowering developers to work
faster and more efficiently with open-source AI models, unleashing the
potential for even greater innovation across industries."
For the five million Hugging Face developers already using the Inference
API, this integration makes it easier than ever to switch to a faster
provider for these popular open-source models: developers simply select
"Cerebras" as their Inference Provider of choice on the Hugging Face
platform.
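For developers who prefer to call the API directly rather than use the platform UI, the provider switch can be sketched as follows. This is a minimal, stdlib-only illustration, not official documentation: the router URL, the ":cerebras" model suffix, and the model id are assumptions, and a valid Hugging Face access token is required for a real call.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint of the Hugging Face router.
API_URL = "https://router.huggingface.co/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Build a chat-completion request routed to the Cerebras provider.

    The ":cerebras" suffix on the model id is an assumption about how
    the router selects an inference provider.
    """
    return {
        "model": "meta-llama/Llama-3.3-70B-Instruct:cerebras",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def ask(prompt: str, token: str) -> str:
    """POST the request and return the generated text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Only attempt a live call when a token is configured.
    token = os.environ.get("HF_TOKEN")
    if token:
        print(ask("In one sentence, what is wafer-scale inference?", token))
```

Because the endpoint is OpenAI-compatible, switching providers amounts to changing the model suffix; no other client code changes are needed.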
Why Fast and Accurate Open-Source AI Inference Matters
Fast, accurate AI inference is essential for a growing range of
applications, particularly as test-time compute and agentic AI drive up
the number of tokens generated per request. Because these models are
open source, Cerebras can optimize them for the CS-3, delivering
inference 10 to 70 times faster than GPUs.
"Cerebras has been a leader in inference speed and performance, and
we're thrilled to partner to bring this industry-leading inference on
open-source models to our developer community," said Julien Chaumond,
CTO of Hugging Face.