Announcing Built On Envoy: Making Envoy Extensions Accessible to Everyone

Learn more

Envoy AI Gateway Reaches 1.0: A Stable Foundation for Enterprise AI Traffic

Envoy AI Gateway 1.0 is generally available: the first stable, production-ready release of the open source AI gateway built on CNCF's Envoy Gateway.

Envoy AI Gateway Reaches 1.0: A Stable Foundation for Enterprise AI Traffic

In February 2025, I wrote the announcement for the first release of Envoy AI Gateway. I called it the opening chapter of a journey: a foundation for organizations to adopt GenAI while keeping control, security, and cost in their own hands. Sixteen months and many releases later, that journey has reached the milestone the whole community has been building toward.

Today, Envoy AI Gateway 1.0 is generally available: the first stable, production-ready release of the open source AI gateway built on CNCF’s Envoy Gateway. It arrives a year to the day after v0.2, and it represents something larger than any single feature: an API we are committing to keep stable, running on the same battle-tested Envoy foundation that already moves production traffic at the world’s largest companies.

For the full technical breakdown, see the Envoy AI Gateway release notes.

What 1.0 actually means: a stable foundation

The headline of 1.0 isn’t a feature. It’s a promise.

Envoy AI Gateway’s release policy has always said the project would cut v1.0.0 once it had a first stable control-plane API. That moment is here, and the commitment behind it is deliberately strict:

We will never break the APIs unless there is a critical security issue, and we will always provide a documented migration path if we ever must.

In practice, that means three things for the teams who build on it:

  • Stable CRDs. The resources you author (AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, MCPRoute, and MCPRouteSecurityPolicy) graduate to v1 and won’t break under you.
  • Predictable upgrades. Upgrading the controller won’t break a valid, migrated configuration.
  • Documented migrations. Any future change that requires action ships with a clear upgrade path in the release notes.

For enterprises, this is the part that has been missing from the AI gateway conversation. You can finally standardize on a single, provider-agnostic AI gateway without betting your roadmap on a moving target. That is what Safe looks like at the infrastructure layer: a foundation that doesn’t shift under you while you’re trying to ship.

How far we’ve come: from 0.1 to 1.0

v0.1 put a unified API in front of two providers, with upstream authorization and token-based rate limiting. 1.0 is a different animal. The table below is the clearest way to see the distance the community has covered:

Capabilityv0.1 (Feb 2025)1.0
AI providers2 (OpenAI, AWS Bedrock)16, with cross-provider request/response translation
API surfaceChat completionsChat, completions, embeddings, image generation, audio (transcription / translation / speech), and the OpenAI Responses API
MCP (Model Context Protocol)NoneA full MCP gateway: server multiplexing, tool routing and filtering, and fine-grained authorization
MultimodalNoneImage, audio, and video inputs across supported providers
ObservabilityBasic metricsOpenTelemetry tracing, OpenInference, GenAI token metrics, separate reasoning-token accounting
Multi-tenancy & routingToken rate limitingHostname-based routing, model virtualization, and quota-aware rate limiting
Control-plane APIv1alpha1 (experimental)Stable v1

Those 16 providers all sit behind a single OpenAI-compatible interface: OpenAI, Azure OpenAI, Google Gemini, Google Vertex AI, AWS Bedrock, Anthropic, Mistral, Cohere, Groq, Together AI, DeepInfra, DeepSeek, Hunyuan, SambaNova, Grok, and the Tetrate Agent Router Service. Your application talks to one endpoint while the gateway handles the rest.

What’s in 1.0

One API, every provider

Point your application at a single OpenAI-compatible endpoint and let the gateway handle provider-specific translation, authentication, and routing. Switch or mix providers without touching application code, and use model virtualization to keep that code stable while routing changes underneath:

backendRefs:
  - name: openai-backend
    modelNameOverride: "gpt-4o"
  - name: anthropic-backend
    modelNameOverride: "claude-opus-4"

This is the mechanism behind A/B testing, gradual migrations, multi-provider strategies, and, bluntly, not being locked to a single vendor’s pricing or availability.

Provider authentication, handled at the gateway

BackendSecurityPolicy keeps provider credentials out of your applications and centralizes upstream auth: API keys plus AWS, Azure, and GCP cloud-native identity, including Workload Identity, all managed in one place instead of scattered across every service that calls a model.

An MCP gateway for the agentic era

Agents are only as governable as the tools they can reach. 1.0 ships a production-grade Model Context Protocol gateway: aggregate multiple MCP servers behind one endpoint, filter which tools clients can see with include/exclude rules, forward OAuth 2.0 JWT claims to backends, and enforce CEL-based, fine-grained authorization, so tools/list only ever returns what a caller is actually allowed to use.

Token-aware traffic management

AI traffic doesn’t behave like API traffic, and rate limits measured in requests miss the point. 1.0 attributes cost separately for input, output, cached, and reasoning tokens, scopes those costs per route with fleet-wide defaults, and adds quota-aware routing primitives (the QuotaPolicy API) to steer around rate-limited upstreams. This is where governance and cost control stop being a spreadsheet exercise and become part of the data path.

AI-native observability, built in

Every request emits OpenTelemetry traces using the GenAI semantic conventions, with OpenInference compatibility for evaluation tools like Arize Phoenix, across chat, embeddings, image generation, audio, MCP, and reasoning endpoints. Reasoning tokens are accounted for separately, because on modern models that’s often where the cost actually goes.

Standards all the way down

Envoy AI Gateway is built on the Kubernetes Gateway API and the Gateway API Inference Extension. It’s an additive layer on Envoy Gateway: it expands what Envoy can do for GenAI traffic without changing how you already deploy and operate it.

Built in the open, by a community

1.0 is the work of a genuinely cross-industry community. Maintainers come from Tetrate, Bloomberg, Tencent, and Nutanix, alongside a growing roster of independent contributors who join the weekly community meetings, file issues, and ship code. Just as importantly, the project has been hardened by real production use. Our thanks to Bloomberg, LY Corporation, Alan by Comma Soft, and NRP for sharing the feedback and production insight that shaped it.

Tetrate has been a primary upstream contributor and a driving force on the project since the collaboration with Bloomberg that started it. What matters most to us is that this is open in the way enterprises actually need:

The code in the public repo is the same code running in production at Bloomberg and Tetrate. That level of transparency is rare in open source, and it’s what enterprises need as they scale AI.

Varun Talwar Co-founder and CTO, Tetrate

No enterprise-only fork, no critical features held back behind a license. The gateway you evaluate is the gateway that runs in production at the companies building it.

Where we go from here

A stable API is a starting line, not a finish line. The community roadmap beyond 1.0 includes:

  • A dedicated MCPBackend CRD, decoupling MCP backend configuration from MCPRoute.
  • Deeper MCP authorization and identity: backend security policy for MCP, OIDC token exchange to MCP backends, and finer-grained policy across tools, resources, and prompts.
  • Fuller quota-aware routing that automatically steers around rate-limited upstreams.
  • Dollar-based control, not just tokens: cost governance that moves beyond token counts to actual spend.
  • More provider translation paths and expanded multimodal support.

The roadmap is community-driven, and we’d love your help shaping it.

Running 1.0 with full governance

Envoy AI Gateway gives you a stable, open foundation, and 1.0 is built to be run on your own terms. Some teams want exactly that. Others want the same foundation with governance, failover, and evaluation already wired in and operated for them.

That’s what Tetrate builds on top. Agent Router Enterprise runs on Envoy AI Gateway and adds the guardrails, behavioral metrics, and continuous supervision that move agents from prototype to production with confidence, managed by Tetrate as the project’s co-creator. For teams that need runtime visibility and cost governance across an existing fleet, Agent Operations Director does the same at the platform layer. Either way, the engine underneath is the same open source code: no lock-in, no surprises.

Tetrate Agent Router Enterprise provides continuous runtime governance for GenAI systems. Enforce policies, control costs, and maintain compliance at the infrastructure layer — without touching application code.

Learn more

Get involved

1.0 belongs to everyone who got us here: the maintainers and contributors who wrote the code and the reviews, the early adopters who ran pre-releases in production and told us what broke, and the broader Gateway API, Envoy, and CNCF communities whose standards we build on. The best way to thank them is to join in:

The future of AI infrastructure is open, stable, and community-driven. Sixteen months ago this was an opening chapter. 1.0 is the foundation, and I can’t wait to see what you build on it.

Product background Product background for tablets
Building AI agents

Agent Router Enterprise provides a managed AI Gateway, MCP Gateway, and AI Guardrails in your dedicated instance. Graduate agents from prototype to production with consistent model access, governed tool use, and runtime supervision — built on Envoy AI Gateway by its creators.

  • AI Gateway – Unified model catalog with automatic fallback across providers
  • MCP Gateway – Curated tool access with per-profile authentication and filtering
  • AI Guardrails – Enforce policies, prevent data loss, and supervise agent behavior
  • Learn more
    Replacing NGINX Ingress

    Tetrate Enterprise Gateway for Envoy (TEG) is the enterprise-ready replacement for NGINX Ingress Controller. Built on Envoy Gateway and the Kubernetes Gateway API, TEG delivers advanced traffic management, security, and observability without vendor lock-in.

  • 100% upstream Envoy Gateway – CVE-protected builds
  • Kubernetes Gateway API native – Modern, portable, and extensible ingress
  • Enterprise-grade support – 24/7 production support from Envoy experts
  • Learn more
    Decorative CTA background pattern background background
    Tetrate logo in the CTA section Tetrate logo in the CTA section for mobile

    Ready to enhance your
    network

    with more
    intelligence?