Announcing Built On Envoy: Making Envoy Extensions Accessible to Everyone

Learn more

What Is an LLM Gateway?

Last updated: June 2026

Definition

An LLM gateway is a routing and management layer that sits between an organization’s applications and the large language model (LLM) providers they use. It gives applications a single endpoint to send LLM requests to, then handles model routing, failover, access control, and cost management on their behalf.

“LLM gateway” and “AI gateway” are often used interchangeably. The term LLM gateway emphasizes routing to large language models specifically, while AI gateway is the broader term that also covers other AI services, agent tool use, and governance.

How an LLM gateway works

Applications send their LLM requests — typically using an OpenAI-compatible API — to the gateway rather than directly to each provider. The gateway then routes the request to the right model, applies access and rate-limiting policy, records usage and cost, and fails over to an alternate provider if the primary is unavailable. Because the application targets one stable endpoint, teams can switch models or providers, enforce limits, and track spend without changing application code.

What an LLM gateway typically provides

  • Unified model access — one OpenAI-compatible interface across many providers and models.
  • Automatic failover — rerouting to a backup model or provider when one fails or rate-limits.
  • Cost management — tracking token usage and enforcing budgets per key, team, or application.
  • Access control — restricting which models, teams, or applications can be used.
  • Observability — logging requests, latency, and spend for analysis.

Why teams use an LLM gateway

Calling LLM providers directly from every application works at small scale but breaks down as usage grows: spend becomes hard to attribute, each app hardcodes its own provider, outages cascade, and there’s no central place to enforce policy. An LLM gateway centralizes all of that, which is why teams adopt one as they move LLM workloads into production.

  • An AI gateway is the broader category — it includes LLM routing plus agent tool governance (such as MCP) and enterprise controls.
  • An inference gateway emphasizes routing to self-hosted or in-cluster model inference.
  • An MCP overview explains how agents access external tools, complementing model routing.

Frequently asked questions

Is an LLM gateway the same as an AI gateway? They’re closely related and often used interchangeably. “LLM gateway” emphasizes routing to large language models; “AI gateway” is the broader term covering LLM routing plus agent tool use and governance. See What is an AI gateway?.

Does an LLM gateway add latency? A well-designed gateway adds minimal overhead relative to the seconds an LLM takes to respond. Performance varies by implementation and the policies enabled.

Can I self-host an LLM gateway? Yes — some are open source and self-hosted, while others are managed services. Many enterprises choose a managed control plane with the data plane deployed in their own environment.


Tetrate’s AI Gateway, part of Tetrate Agent Router, provides unified model access, automatic failover, and cost management built on Envoy AI Gateway. Learn more about the AI Gateway or browse the AI gateway glossary.

Decorative CTA background pattern background background
Tetrate logo in the CTA section Tetrate logo in the CTA section for mobile

Ready to enhance your
network

with more
intelligence?