We Built an AI Agent to Cut Our Cloud Bill in Half

Our cloud bill was attracting board-level attention. Instead of hiring a FinOps team, we built AI agents that scan AWS, GCP, and Azure weekly. Here's what we learned.

Our cloud bill had grown to a point where it was attracting board-level attention. The target: cut it in half within a year. That’s not a vague aspiration — it’s a line item in a spreadsheet that someone is accountable for.

We didn’t hire a FinOps team. We built an AI agent.

This is the first in a series of posts about what we learned building cost optimization agents that operate across AWS, GCP, and Azure. Not the polished conference version — the actual messy reality of building agents that need to find real money in production infrastructure.

The Problem

Cloud cost optimization sounds straightforward until you try to do it at scale. We have dozens of AWS accounts, multiple GCP projects, and Azure subscriptions. Resources spin up for demos, experiments, and customer engagements, and some of them never spin down. People move teams. Projects get deprioritized. The infrastructure stays.

The manual approach — logging into each cloud console, checking utilization dashboards, filing tickets to delete things — doesn’t work when you have this many accounts. An engineer might spend a day auditing one account, find a few hundred dollars in savings, and then never do it again because they have actual work to do.

We needed something that could scan every account, every week, automatically. Something that could reason about whether a resource was actually idle versus just quiet over the weekend. Something that could prioritize findings by impact and track whether humans had already reviewed and dismissed them.

So we built agents.

What We Built

The system is three cloud-specific agents — one each for AWS, GCP, and Azure — plus a shared dashboard and persistence layer. Each agent runs weekly on a schedule, scans its cloud environment, and produces findings: specific resources that are idle, over-provisioned, or misconfigured, with severity ratings and estimated monthly savings.
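A finding in this sense is just a structured record. As a minimal sketch (the field names here are illustrative, not the actual schema), it might look like:

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Finding:
    """One optimization opportunity surfaced by a weekly scan."""
    cloud: str                      # "aws", "gcp", or "azure"
    account: str                    # account / project / subscription ID
    resource_id: str                # e.g. an EBS volume or NAT gateway ID
    category: str                   # "idle", "over-provisioned", "misconfigured"
    severity: Severity
    est_monthly_savings_usd: float
    recommendation: str = ""        # natural-language fix, written by the LLM


f = Finding("aws", "123456789012", "vol-0abc", "idle",
            Severity.MEDIUM, 12.80, "Delete unattached EBS volume")
```

Severity and estimated savings are what make findings sortable by impact rather than just a flat list of resources.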

The agents run on Modal (a serverless compute platform), persist findings to Firestore, and route all LLM calls through a centralized gateway that handles model selection, cost tracking, and observability.

For AWS, the agent has about 16 tools: it can list accounts, pull billing data, analyze EC2 utilization and rightsizing opportunities, and check for unattached EBS volumes, idle RDS databases, unused Elastic IPs, idle NAT gateways (including data transfer analysis), load balancers with no healthy targets, orphaned ENIs, and over-provisioned EKS clusters. It also reads human-provided context about each account — what it’s for, which team owns it, what’s expected to be running there — so it doesn’t flag intentional infrastructure as waste.
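To make one of those tools concrete, here is a sketch of the unattached-EBS-volume check. In production this would wrap boto3's `describe_volumes`; the scan logic is factored out as a pure function over that response shape, and the price constant is a rough gp3 figure, not real pricing:

```python
# Rough gp3 storage price; actual pricing varies by region and volume type.
GP3_PRICE_PER_GB_MONTH = 0.08


def find_unattached_ebs_volumes(volumes: list[dict]) -> list[dict]:
    """Return findings for EBS volumes with no attachments."""
    findings = []
    for vol in volumes:
        if vol.get("Attachments"):  # attached volumes are in use; skip them
            continue
        size_gb = vol.get("Size", 0)
        findings.append({
            "resource_id": vol["VolumeId"],
            "category": "idle",
            "est_monthly_savings_usd": round(size_gb * GP3_PRICE_PER_GB_MONTH, 2),
        })
    return findings


# With real data: volumes = boto3.client("ec2").describe_volumes()["Volumes"]
sample = [
    {"VolumeId": "vol-attached", "Size": 100,
     "Attachments": [{"InstanceId": "i-1"}]},
    {"VolumeId": "vol-orphan", "Size": 500, "Attachments": []},
]
```

Most of the tools follow this shape: fetch raw resource data, apply a cheap deterministic filter, and hand only the candidates to the LLM for judgment.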

GCP and Azure have equivalent capabilities, adapted for each cloud’s APIs and resource types.

The Four Design Challenges That Shaped Everything

Building the agents was the easy part. Making them reliable, accurate, and useful enough that people actually trusted the output — that was harder. Four problems dominated our thinking:

1. How much should the LLM control?

Our first instinct was to let the LLM orchestrate everything. Give it tools, give it a system prompt, let it figure out which accounts to check, which resources to analyze, and what to report. This worked, but it had problems: the LLM would sometimes skip accounts, over-focus on one area, or make inconsistent prioritization decisions across runs.

We ended up with two different architectural patterns across our agents, and the tension between them taught us a lot about where LLMs add value versus where deterministic code is more reliable. (We’ll cover this in a future post.)

2. Context windows are finite, but cloud environments aren’t

An agent analyzing 20+ AWS accounts generates a lot of data. Tool call results pile up. The context window fills. We needed findings to survive even if the agent hit its token limit or timed out mid-run.

This led us to an auto-save pattern where findings are persisted to the database as they’re discovered, not collected and saved at the end. It also meant using stable, deterministic IDs for findings (based on cloud provider + account + resource) so that the same finding doesn’t get duplicated across weekly runs, and human decisions to dismiss a finding are preserved. These small design choices turned out to be critical for making agents work in production. (More on this in a future post.)
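The pattern is simple to sketch. Using a plain dict as a stand-in for Firestore (the real persistence layer), a deterministic ID plus upsert semantics gives you both deduplication across runs and preservation of human dismissals:

```python
import hashlib


def finding_id(cloud: str, account: str, resource_id: str) -> str:
    """Deterministic ID: the same resource maps to the same finding
    across weekly runs, so re-discovery updates rather than duplicates."""
    key = f"{cloud}:{account}:{resource_id}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]


def save_finding(store: dict, finding: dict) -> None:
    """Auto-save one finding the moment it is discovered (upsert).
    A human 'dismissed' flag on an existing finding is preserved."""
    fid = finding_id(finding["cloud"], finding["account"], finding["resource_id"])
    existing = store.get(fid)
    if existing and existing.get("dismissed"):
        finding = {**finding, "dismissed": True}  # keep the human decision
    store[fid] = {**finding, "id": fid}
```

Because saves happen per-finding rather than at the end of the run, a timeout or context overflow loses only the unfinished tail of the scan, not everything.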

3. Not everything needs an LLM

We built another agent for compliance monitoring that syncs with a governance platform. The first version used an LLM to orchestrate the sync. Then we built a direct sync path that bypasses the LLM entirely — and it runs faster, costs less, and produces identical results.

The lesson: LLMs are great at judgment calls (is this resource really idle, or just quiet?), prioritization (which findings matter most?), and natural language (writing recommendations). They’re wasteful for data fetching, transformation, and CRUD operations. Knowing when to remove the AI from your AI agent is an underrated skill. (Future post.)

4. Where should intelligence live?

Some capabilities belong in the agent: domain logic, tool orchestration, context management. Others belong in the infrastructure layer: LLM routing, API key management, cost tracking, rate limiting, PII detection. We learned this the hard way by initially building capabilities in the wrong layer and then migrating them.

We now route all LLM calls through a centralized gateway with per-agent API keys, which gives us cost attribution across agents without any agent-level code for tracking spend. The agent just calls the model; the infrastructure handles the rest. (We’ll cover the full agent-vs-middleware framework in a future post.)
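On the gateway side, attribution then reduces to an aggregation over logged usage records. This sketch uses illustrative record fields and key names, not the gateway's actual log format:

```python
from collections import defaultdict

# Each agent calls the LLM with its own API key; the gateway logs usage,
# and per-agent spend falls out of a simple group-by. No agent contains
# any cost-tracking code.
KEY_TO_AGENT = {"key-aws": "aws-agent", "key-gcp": "gcp-agent"}


def attribute_spend(usage_records: list[dict]) -> dict[str, float]:
    """Sum gateway-logged cost per agent, keyed by per-agent API key."""
    spend: dict[str, float] = defaultdict(float)
    for rec in usage_records:
        agent = KEY_TO_AGENT.get(rec["api_key"], "unknown")
        spend[agent] += rec["cost_usd"]
    return dict(spend)


records = [
    {"api_key": "key-aws", "cost_usd": 0.42},
    {"api_key": "key-aws", "cost_usd": 0.08},
    {"api_key": "key-gcp", "cost_usd": 0.10},
]
```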

The Humbling Part

After building all of this, we ran the agent, reviewed the findings, and celebrated. It was finding idle resources, flagging waste, generating actionable recommendations.

Then we did the math.

The agent was identifying a few thousand dollars per month in savings. Against a six-figure monthly AWS bill, that’s a 2.4% catch rate.

The agent worked. It just wasn’t working hard enough. The gap between “agent runs successfully” and “agent finds meaningful savings” turned out to be enormous. What followed was a systematic gap analysis that reshaped the entire system — which is the subject of the next post.

What’s Coming

This series will cover the design decisions, architectural trade-offs, and lessons learned from building these agents. Each post focuses on one specific challenge and how we solved it (or didn’t):

  • Next up: The gap analysis — what the agent was missing and what we changed
  • Two architectural patterns: LLM-orchestrated vs. two-phase discovery, and why we use both
  • Reliability in production: Auto-save, stable IDs, and respecting human decisions
  • Not everything needs an LLM: When to remove AI from your AI agent
  • Agent vs. middleware: A framework for deciding where intelligence should live

If you’re building agents for operational automation — cost optimization, compliance, security, infrastructure management — the problems we hit are the same ones you’ll hit. Hopefully our mistakes save you some time.


Agent Router Enterprise provides the infrastructure layer we use to manage these agents in production: centralized LLM routing with per-agent cost attribution through the LLM Gateway, governed tool connectivity through the MCP Gateway, and continuous supervision through AI Guardrails. When your agent portfolio grows beyond one or two experiments, the infrastructure matters. Learn more here ›
