Cloudflare, Inc. announced
powerful new capabilities for Workers AI, its serverless AI platform,
and its suite of AI application building blocks, to help developers
build faster, more performant AI applications.
Applications built on Workers AI can now benefit from faster inference,
bigger models, improved performance analytics, and more. Workers AI is
the easiest platform for building global AI applications and running AI
inference close to users, no matter where in the world they are.
As large language models (LLMs) become smaller and more performant,
network speeds will become the bottleneck to customer adoption and
seamless AI interactions. Cloudflare's globally distributed network
helps minimize network latency, setting it apart from networks whose
resources are typically concentrated in a limited number of data
centers. Cloudflare's serverless inference platform, Workers AI, now has
GPUs in more than 180 cities around the world, built for global
accessibility and low inference latency for end users everywhere. With
this network of GPUs, Workers AI has one of the largest global
footprints of any AI platform, and has been designed to run AI
inference as close to the user as possible and help keep customer data
closer to home.
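In practice, running inference on this network is a single call from a Worker. Below is a minimal sketch, assuming a Workers AI binding named `AI` has been configured in wrangler.toml; the binding name and model ID are illustrative and must match your own project setup:

```ts
// Sketch: Workers AI inference from a Worker. The `AI` binding name is
// illustrative and must be declared in wrangler.toml ([ai] binding = "AI").
export interface Env {
  AI: Ai; // type provided by @cloudflare/workers-types
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // The request is served, and inference runs, in a nearby city with GPUs.
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [{ role: "user", content: "Say hello in one sentence." }],
    });
    return Response.json(result);
  },
} satisfies ExportedHandler<Env>;
```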
"As AI took off last year, no one was thinking about network speeds as a
reason for AI latency, because it was still a novel, experimental
interaction. But as we get closer to AI becoming a part of our daily
lives, the network, and milliseconds, will matter," said Matthew Prince,
co-founder and CEO, Cloudflare. "As AI workloads shift from training to
inference, performance and regional availability are going to be
critical to supporting the next phase of AI. Cloudflare is the most
global AI platform on the market, and having GPUs in cities around the
world is going to be what takes AI from a novel toy to a part of our
everyday life, just like faster Internet did for smartphones."
Cloudflare is also introducing new capabilities that make it the easiest platform on which to build AI applications:
- Upgraded performance and support for larger models: Cloudflare is
enhancing its global network with more powerful GPUs for Workers AI to
upgrade AI inference performance and run inference on significantly
larger models such as Llama 3.1 70B, as well as the collection of Llama
3.2 models at 1B, 3B, and 11B (with 90B coming soon). By supporting
larger models, faster response times, and larger context windows, AI
applications built on Cloudflare's Workers AI can handle more complex
tasks with greater efficiency, creating natural, seamless end-user
experiences (see the streaming sketch after this list).
- Improved monitoring and optimization of AI usage with persistent logs:
New persistent logs in AI Gateway, available in open beta, allow
developers to store users' prompts and model responses for extended
periods to better analyze and understand how their application performs.
With persistent logs, developers can gain more detailed insights from
users' experiences, including the cost and duration of requests, to help
refine their application (see the gateway sketch after this list). More
than two billion requests have traveled through AI Gateway since its
launch last year.
- Faster and more affordable queries: Vector databases make it easier
for models to remember previous inputs, allowing machine learning to
power search, recommendation, and text-generation use cases.
Cloudflare's vector database, Vectorize, is now generally available,
and as of August 2024 supports indexes of up to five million vectors
each, up from 200,000 previously. Median query latency is down to 31
milliseconds (ms), compared to 549 ms previously. These improvements
allow AI applications to find relevant information quickly with less
data processing, which also means more affordable AI applications (see
the Vectorize sketch after this list).
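To illustrate the larger models, here is a minimal streaming sketch calling Llama 3.1 70B from a Worker so long responses begin arriving immediately; it reuses the `Env` interface from the first sketch, and the input read from the request body stands in for whatever large document an application would supply:

```ts
// Sketch: streaming a response from the larger Llama 3.1 70B model named
// in this announcement. Assumes the same `AI` binding as the first sketch.
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const longDocument = await request.text(); // placeholder for a large input

    // Stream tokens from the model as they are generated, rather than
    // waiting for the full completion.
    const stream = await env.AI.run("@cf/meta/llama-3.1-70b-instruct", {
      messages: [
        { role: "user", content: `${longDocument}\n\nSummarize the above.` },
      ],
      stream: true, // returns a ReadableStream of server-sent events
    });
    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
```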
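For the persistent logs, a sketch of routing the same kind of inference call through AI Gateway, so each prompt, response, cost, and duration is captured. The snippet drops into the fetch handler of the earlier sketches; "my-gateway" is a placeholder ID for a gateway created in the Cloudflare dashboard with logging enabled:

```ts
// Sketch: the optional third argument to env.AI.run() routes the request
// through an AI Gateway ("my-gateway" is a placeholder), so it appears in
// that gateway's persistent logs and analytics.
const result = await env.AI.run(
  "@cf/meta/llama-3.1-8b-instruct",
  { messages: [{ role: "user", content: "Hello!" }] },
  { gateway: { id: "my-gateway" } },
);
```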
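And for Vectorize, a sketch of a semantic-search lookup: embed the query text with a Workers AI embedding model, then find the nearest stored vectors. The `VECTORIZE` binding name is illustrative and assumes an index already created and populated, e.g. with `npx wrangler vectorize create my-index --dimensions=768 --metric=cosine`:

```ts
// Sketch: query a Vectorize index from a Worker (binding names are
// illustrative and must match wrangler.toml).
export interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const query = await request.text();

    // Embed the query text; bge-base-en-v1.5 produces 768-dim vectors.
    const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [query],
    });

    // Nearest-neighbor lookup over up to five million stored vectors.
    const matches = await env.VECTORIZE.query(embedding.data[0], { topK: 5 });
    return Response.json(matches);
  },
} satisfies ExportedHandler<Env>;
```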