Bloomberg and Tetrate have
delivered on their intentions, first announced in October
2024, to develop an innovative, community-led set of core AI gateway features
for enterprise AI integration. Today, the community partners announced that the
first stable version (v0.1) of the open source Envoy AI Gateway project is
now available for download on GitHub, so it can be used by developers as part
of their enterprise AI application infrastructure.
The first open source AI gateway project to be backed by the Cloud
Native Computing Foundation (CNCF), Envoy AI Gateway democratizes AI
infrastructure for organizations of all sizes and provides an API interface for
developers who are integrating applications with generative AI services. The
gateway meets enterprise needs for robustness, scalability, and adaptability in
an ever-changing generative AI (GenAI) landscape.
This open source initiative is a response to the challenges enterprises
face in adopting and integrating AI into their applications at scale. By
laying the groundwork for scalable AI platforms,
Tetrate and Bloomberg engineers are addressing the immediate needs of today's
enterprises and setting the stage for the future of AI applications within
cloud-native environments.
Envoy AI Gateway builds on the capabilities of the CNCF's Envoy Gateway
project, one of the Kubernetes Gateway API implementations. It taps
Envoy's robust, scalable foundation and enables organizations to integrate
modern GenAI functionality into their workflows and applications. The Envoy AI
Gateway routes requests to multiple AI service providers and models through a
single reverse proxy layer and provides a single, unified API layer with which
developers interact.
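From a developer's point of view, that unified layer means one request shape regardless of which provider serves the model. The sketch below illustrates the idea with an OpenAI-style chat payload; the gateway URL and model identifiers are illustrative assumptions, not the project's actual defaults:

```python
import json

# A client talks to one gateway endpoint regardless of which backend
# (OpenAI, AWS Bedrock, ...) ultimately serves the request.
# NOTE: the URL and model names below are illustrative assumptions.
GATEWAY_URL = "http://envoy-ai-gateway.internal/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload; the gateway routes it to the
    provider that serves the named model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The same request shape works whichever provider backs the model:
openai_req = build_chat_request("gpt-4o", "Summarize today's market moves.")
bedrock_req = build_chat_request("anthropic.claude-v2", "Summarize today's market moves.")
body = json.dumps(openai_req)  # what an HTTP client would POST to GATEWAY_URL
```

Because the payload is identical either way, applications can switch or mix providers without changing client code.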
The initial release of the Envoy AI Gateway provides the following
functionality:
- Unified API to simplify client integration with multiple LLM providers
through a seamless interface. Version 0.1 includes integrations with AWS
Bedrock and OpenAI.
- Upstream Authorization to simplify sign-in with multiple LLM service
providers through credentials that are easy to configure and manage.
- Usage Rate Limiting based on word tokens, ensuring cost-effectiveness and
operational control. Token rates can be limited by LLM provider, customized
per model, or tailored to each client for a defined time period.
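The token-based rate limiting described above can be sketched as a small fixed-window limiter keyed by client and model. This is a conceptual sketch only; the class, limits, and client names are assumptions, not the gateway's actual configuration schema:

```python
class TokenUsageLimiter:
    """Grant each (client, model) pair a budget of LLM tokens per time window.
    Illustrative sketch; a production gateway enforces this at the proxy layer."""

    def __init__(self, limit_tokens: int, window_seconds: float):
        self.limit = limit_tokens
        self.window = window_seconds
        self.usage: dict[tuple[str, str], tuple[int, float]] = {}  # key -> (used, window_start)

    def allow(self, client: str, model: str, tokens: int, now: float) -> bool:
        # In a real deployment `now` would be the current time, e.g. time.monotonic().
        key = (client, model)
        used, start = self.usage.get(key, (0, now))
        if now - start >= self.window:       # window expired: reset the budget
            used, start = 0, now
        if used + tokens > self.limit:       # over budget: reject the request
            self.usage[key] = (used, start)
            return False
        self.usage[key] = (used + tokens, start)
        return True

limiter = TokenUsageLimiter(limit_tokens=1000, window_seconds=60)
ok1 = limiter.allow("trading-app", "gpt-4o", tokens=600, now=0.0)   # within budget
ok2 = limiter.allow("trading-app", "gpt-4o", tokens=600, now=1.0)   # would exceed 1000
ok3 = limiter.allow("trading-app", "gpt-4o", tokens=600, now=61.0)  # new window, budget reset
```

Because the budget is keyed by both client and model, limits can be tuned per provider, per model, or per client, as the feature description notes.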
"Envoy is rapidly becoming the community of choice for AI
innovation," said Varun Talwar, Founder of Tetrate. "Tetrate is actively
working not only to contribute to Envoy, but also to build on top of the
project and help organizations deliver GenAI projects faster and more reliably,
and to maximize their ROI in the process. The availability of Envoy AI Gateway
as an alternative to Python gateways is a big step forward for the industry."
Envoy AI Gateway to help Bloomberg scale its GenAI application development
Envoy AI Gateway is being used by Bloomberg to build generative AI
applications that interact with GenAI services - whether on-prem or in the
cloud - at scale. The Gateway gives Bloomberg a central place to manage the
usage of GenAI services through a consistent, unified API - regardless of
provider - by setting limits and quotas, and consistently enforcing access
control to GenAI services across the company's AI infrastructure. This approach
simplifies generative AI application development, and will help the company's
engineers create innovative AI services faster for Bloomberg users.
"Contributing to and building with open source, open standard
solutions is something we value and invest in at Bloomberg. Envoy AI Gateway
will enable Bloomberg to equip its engineers with the infrastructure needed to
deliver generative AI applications quickly and at scale," said Steven Bower,
Manager of Bloomberg's Cloud Native Compute Services Engineering group. "We've
collaborated with Tetrate to bring this project to its first stable version,
and we're excited to share this innovative enterprise AI solution with the CNCF
community."
What's next for the Envoy AI Gateway project
Community organizers have already identified several new features
that are now on the project roadmap, including:
- Google Gemini 2.0 Integration out-of-the-box
- Provider and Model Fallback Logic to ensure continuation of services
should an AI service become temporarily unavailable
- Prompt Templating to provide consistent context to the LLM service across
requests
- Semantic Caching to lower LLM usage costs by reusing responses from
semantically similar requests, thereby minimizing expensive LLM interactions
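To illustrate the semantic caching idea on the roadmap, here is a minimal sketch that reuses a cached response when a new prompt is similar enough to an earlier one. The toy bag-of-words similarity stands in for a real embedding model, and all names and thresholds are assumptions:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words token counts. A real deployment would use
    # a sentence-embedding model; this keeps the sketch self-contained.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Reuse a cached LLM response when a new prompt is semantically close."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def lookup(self, prompt: str):
        vec = embed(prompt)
        for cached_vec, response in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return response  # cache hit: skip the expensive LLM call
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

cache = SemanticCache(threshold=0.8)
cache.store("what is the weather in new york today", "Sunny, 70F.")
hit = cache.lookup("what is the weather in new york today")  # similar prompt: reuse
miss = cache.lookup("explain quantum computing")             # unrelated: call the LLM
```

On a hit the application returns the cached response directly, which is how such a cache lowers LLM usage costs.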
Project Origins
The initial idea for the Envoy AI Gateway project arose when Dan
Sun, Engineering Team Lead for Bloomberg's Cloud Native Compute Services - AI
Inference team and co-founder/maintainer of the KServe project,
described to the Envoy community the enterprise need for an internal AI
platform built on open source technologies, namely Envoy and Kubernetes.
Tetrate, a major upstream contributor to the Envoy project, stepped forward to
help turn the vision for the Envoy AI Gateway API into reality. Read more about the origins of the Envoy AI
Gateway project.