Envoy AI Gateway

Envoy AI Gateway Reaches 1.0: A Stable Foundation for Enterprise AI Traffic

Envoy AI Gateway 1.0 is generally available: the first stable, production-ready release of the open source AI gateway built on CNCF's Envoy Gateway.

Erica Hughberg

June 23, 2026

Envoy AI Gateway Reaches 1.0: A Stable Foundation for Enterprise AI Traffic

In February 2025, I wrote the announcement for the first release of Envoy AI Gateway. I called it the opening chapter of a journey: a foundation for organizations to adopt GenAI while keeping control, security, and cost in their own hands. Sixteen months and many releases later, that journey has reached the milestone the whole community has been building toward.

Today, Envoy AI Gateway 1.0 is generally available: the first stable, production-ready release of the open source AI gateway built on CNCF’s Envoy Gateway. It arrives a year to the day after v0.2, and it represents something larger than any single feature: an API we are committing to keep stable, running on the same battle-tested Envoy foundation that already moves production traffic at the world’s largest companies.

For the full technical breakdown, see the Envoy AI Gateway release notes.

What 1.0 actually means: a stable foundation

The headline of 1.0 isn’t a feature. It’s a promise.

Envoy AI Gateway’s release policy has always said the project would cut v1.0.0 once it had a first stable control-plane API. That moment is here, and the commitment behind it is deliberately strict:

We will never break the APIs unless there is a critical security issue, and we will always provide a documented migration path if we ever must.

In practice, that means three things for the teams who build on it:

Stable CRDs. The resources you author (AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, MCPRoute, and MCPRouteSecurityPolicy) graduate to v1 and won’t break under you.
Predictable upgrades. Upgrading the controller won’t break a valid, migrated configuration.
Documented migrations. Any future change that requires action ships with a clear upgrade path in the release notes.

For enterprises, this is the part that has been missing from the AI gateway conversation. You can finally standardize on a single, provider-agnostic AI gateway without betting your roadmap on a moving target. That is what Safe looks like at the infrastructure layer: a foundation that doesn’t shift under you while you’re trying to ship.

How far we’ve come: from 0.1 to 1.0

v0.1 put a unified API in front of two providers, with upstream authorization and token-based rate limiting. 1.0 is a different animal. The table below is the clearest way to see the distance the community has covered:

Capability	v0.1 (Feb 2025)	1.0
AI providers	2 (OpenAI, AWS Bedrock)	16, with cross-provider request/response translation
API surface	Chat completions	Chat, completions, embeddings, image generation, audio (transcription / translation / speech), and the OpenAI Responses API
MCP (Model Context Protocol)	None	A full MCP gateway: server multiplexing, tool routing and filtering, and fine-grained authorization
Multimodal	None	Image, audio, and video inputs across supported providers
Observability	Basic metrics	OpenTelemetry tracing, OpenInference, GenAI token metrics, separate reasoning-token accounting
Multi-tenancy & routing	Token rate limiting	Hostname-based routing, model virtualization, and quota-aware rate limiting
Control-plane API	v1alpha1 (experimental)	Stable v1

Those 16 providers all sit behind a single OpenAI-compatible interface: OpenAI, Azure OpenAI, Google Gemini, Google Vertex AI, AWS Bedrock, Anthropic, Mistral, Cohere, Groq, Together AI, DeepInfra, DeepSeek, Hunyuan, SambaNova, Grok, and the Tetrate Agent Router Service. Your application talks to one endpoint while the gateway handles the rest.

What’s in 1.0

One API, every provider

Point your application at a single OpenAI-compatible endpoint and let the gateway handle provider-specific translation, authentication, and routing. Switch or mix providers without touching application code, and use model virtualization to keep that code stable while routing changes underneath:

backendRefs:
  - name: openai-backend
    modelNameOverride: "gpt-4o"
  - name: anthropic-backend
    modelNameOverride: "claude-opus-4"

This is the mechanism behind A/B testing, gradual migrations, multi-provider strategies, and, bluntly, not being locked to a single vendor’s pricing or availability.

Provider authentication, handled at the gateway

BackendSecurityPolicy keeps provider credentials out of your applications and centralizes upstream auth: API keys plus AWS, Azure, and GCP cloud-native identity, including Workload Identity, all managed in one place instead of scattered across every service that calls a model.

An MCP gateway for the agentic era

Agents are only as governable as the tools they can reach. 1.0 ships a production-grade Model Context Protocol gateway: aggregate multiple MCP servers behind one endpoint, filter which tools clients can see with include/exclude rules, forward OAuth 2.0 JWT claims to backends, and enforce CEL-based, fine-grained authorization, so tools/list only ever returns what a caller is actually allowed to use.

Token-aware traffic management

AI traffic doesn’t behave like API traffic, and rate limits measured in requests miss the point. 1.0 attributes cost separately for input, output, cached, and reasoning tokens, scopes those costs per route with fleet-wide defaults, and adds quota-aware routing primitives (the QuotaPolicy API) to steer around rate-limited upstreams. This is where governance and cost control stop being a spreadsheet exercise and become part of the data path.

AI-native observability, built in

Every request emits OpenTelemetry traces using the GenAI semantic conventions, with OpenInference compatibility for evaluation tools like Arize Phoenix, across chat, embeddings, image generation, audio, MCP, and reasoning endpoints. Reasoning tokens are accounted for separately, because on modern models that’s often where the cost actually goes.

Standards all the way down

Envoy AI Gateway is built on the Kubernetes Gateway API and the Gateway API Inference Extension. It’s an additive layer on Envoy Gateway: it expands what Envoy can do for GenAI traffic without changing how you already deploy and operate it.

Built in the open, by a community

1.0 is the work of a genuinely cross-industry community. Maintainers come from Tetrate, Bloomberg, Tencent, and Nutanix, alongside a growing roster of independent contributors who join the weekly community meetings, file issues, and ship code. Bloomberg has been part of the project from the start, and its engineering contributions and influence on the project’s direction run throughout 1.0. Just as importantly, the project has been hardened by real-world use, testing, and feedback. Our thanks to LY Corporation, Alan by Comma Soft, and NRP for the testing and insight that shaped it.

Tetrate has been a primary upstream contributor and a driving force on the project since the collaboration with Bloomberg that started it. What matters most to us is that this is open in the way enterprises actually need:

The code in the public repo is the same code we run ourselves at Tetrate. That’s the transparency enterprises need as they scale AI.

Varun Talwar Co-founder and CTO, Tetrate

No enterprise-only fork, no critical features held back behind a license. The gateway you evaluate is the same gateway its maintainers build on and use themselves.

Where we go from here

A stable API is a starting line, not a finish line. The community roadmap beyond 1.0 includes:

A dedicated MCPBackend CRD, decoupling MCP backend configuration from MCPRoute.
Deeper MCP authorization and identity: backend security policy for MCP, OIDC token exchange to MCP backends, and finer-grained policy across tools, resources, and prompts.
Fuller quota-aware routing that automatically steers around rate-limited upstreams.
Dollar-based control, not just tokens: cost governance that moves beyond token counts to actual spend.
More provider translation paths and expanded multimodal support.

The roadmap is community-driven, and we’d love your help shaping it.

Running 1.0 with full governance

Envoy AI Gateway gives you a stable, open foundation, and 1.0 is built to be run on your own terms. Some teams want exactly that. Others want the same foundation with governance, failover, and evaluation already wired in and operated for them.

That’s what Tetrate builds on top. Agent Router Enterprise runs on Envoy AI Gateway and adds the guardrails, behavioral metrics, and continuous supervision that move agents from prototype to production with confidence, managed by Tetrate as the project’s co-creator. For teams that need runtime visibility and cost governance across an existing fleet, Agent Operations Director does the same at the platform layer. Either way, the engine underneath is the same open source code: no lock-in, no surprises.

Get involved

1.0 belongs to everyone who got us here: the maintainers and contributors who wrote the code and the reviews, the early adopters who tested pre-releases and told us what broke, and the broader Gateway API, Envoy, and CNCF communities whose standards we build on. The best way to thank them is to join in:

Try 1.0 today: download the release and follow the getting started guide.
Explore the examples: real-world configurations to build from.
Join the community: weekly meetings, GitHub Discussions, and the #envoy-ai-gateway channel on Envoy Slack.
Star the repo: github.com/envoyproxy/ai-gateway, a small thing that helps more teams find the project.

The future of AI infrastructure is open, stable, and community-driven. Sixteen months ago this was an opening chapter. 1.0 is the foundation, and I can’t wait to see what you build on it.

Erica Hughberg

June 23, 2026

Building AI agents

Agent Router Enterprise provides a managed AI Gateway, MCP Gateway, and AI Guardrails in your dedicated instance. Graduate agents from prototype to production with consistent model access, governed tool use, and runtime supervision — built on Envoy AI Gateway by its creators.

AI Gateway – Unified model catalog with automatic fallback across providers

MCP Gateway – Curated tool access with per-profile authentication and filtering

AI Guardrails – Enforce policies, prevent data loss, and supervise agent behavior

Learn more

Replacing NGINX Ingress

Tetrate Enterprise Gateway for Envoy (TEG) is the enterprise-ready replacement for NGINX Ingress Controller. Built on Envoy Gateway and the Kubernetes Gateway API, TEG delivers advanced traffic management, security, and observability without vendor lock-in.

100% upstream Envoy Gateway – CVE-protected builds

Kubernetes Gateway API native – Modern, portable, and extensible ingress

Enterprise-grade support – 24/7 production support from Envoy experts

Learn more

Announcing token brokering for cost control in Tetrate Agent Router Enterprise

Envoy AI Gateway Reaches 1.0: A Stable Foundation for Enterprise AI Traffic

What 1.0 actually means: a stable foundation

How far we’ve come: from 0.1 to 1.0