Your Agent Found 2.4 Percent of the Savings. Now What?

We built a cost optimization agent. It worked. Then we did the math: it was catching 2.4 percent of the savings. Here's what was missing and what we changed.

We built a cost optimization agent, pointed it at our AWS environment, and it found a few thousand dollars per month in savings. We were pleased with ourselves for about a day.

Then someone divided that number by our actual monthly AWS spend and the celebration ended.

2.4%.


The agent was running. The findings were real. Idle NAT gateways, unused RDS instances, unattached EBS volumes — all legitimate waste. But as a percentage of total spend, we were barely scratching the surface. The question wasn’t whether the agent worked. The question was why it wasn’t finding the other 97.6%.

The Audit

We did what you should do with any agent in production: we audited it. Not “does it run without errors” — that’s the easy bar. We asked: “For every dollar of cloud spend, can the agent see it, analyze it, and make a recommendation about it?”

The answer was sobering.

The Biggest Blind Spot Was a One-Line Filter

Over half of our AWS spend lived in accounts prefixed with “fs” (our DoIt FlexSave billing accounts). The agent was configured to skip these entirely on the assumption that they were billing artifacts with no real resources.

That assumption was wrong. FlexSave is a flexible savings program — no commitment, no reservation. The underlying EC2 instances exist in regular accounts, and reducing their size directly reduces FlexSave costs. By excluding these accounts from analysis, we’d made the single largest category of spend invisible to the agent.

This wasn’t an LLM problem. It was a filter in our tooling code that nobody had questioned: line 166 of tools.py, an innocent-looking condition that was silently discarding 55% of our AWS spend from analysis.
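The shape of the bug is worth sketching. This is a hypothetical reconstruction, not the actual code from tools.py — the function name, flag, and account data are made up — but the toy numbers are chosen to mirror the 55% figure:

```python
# Hypothetical reconstruction of the filter bug; the real condition lived
# on line 166 of tools.py, and the account names and spends here are made up.

def accounts_to_analyze(accounts, skip_flexsave=True):
    """Return the accounts the agent is allowed to see.

    The bug was the moral equivalent of skip_flexsave=True: any account
    whose name starts with "fs" was assumed to be a billing artifact with
    no real resources, and silently dropped.
    """
    if not skip_flexsave:
        return list(accounts)
    return [a for a in accounts if not a["name"].startswith("fs")]


accounts = [
    {"name": "prod-main", "monthly_spend": 9_000},
    {"name": "fs-flexsave-1", "monthly_spend": 11_000},  # real EC2 spend lives here too
]
visible = accounts_to_analyze(accounts)  # the buggy default drops 55% of spend
```

Nothing in the agent's logs flagged this: the filter ran "successfully" every week.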

No Rightsizing Analysis

The agent could find idle EC2 instances — machines running at less than 5% CPU. But the much bigger opportunity was rightsizing: instances running at 10-30% CPU that could be downsized by one or two instance sizes. The difference between “idle” and “over-provisioned” was the difference between pocket change in Elastic IP savings and potentially 20-40% of your EC2 bill.

We were looking for the wrong thing. Finding completely idle resources is the easy, low-value version of cost optimization. The high-value version is analyzing utilization patterns and recommending smaller instances — which requires pulling CloudWatch metrics, understanding usage patterns across time, and making judgment calls about what constitutes “over-provisioned.”
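The core of that judgment call reduces to a classification over utilization stats. The thresholds below are illustrative, not our production rules — “idle” mirrors the under-5% test the agent already had, and “over-provisioned” is the 10-30% band where rightsizing pays off:

```python
def classify_utilization(avg_cpu, p95_cpu):
    """Bucket an instance by its CloudWatch CPU stats (percent).

    Illustrative thresholds only. Using p95 alongside the average avoids
    recommending a downsize for instances with spiky load.
    """
    if p95_cpu < 5:
        return "idle"              # candidate for termination
    if avg_cpu < 30:
        return "over-provisioned"  # candidate for downsizing 1-2 sizes
    return "right-sized"
```

In production this runs over metrics pulled from CloudWatch across a multi-week window; the hard part isn't the thresholds, it's collecting enough history to trust them.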

No GCP. At All.

About 25% of our total cloud spend was on GCP. The agent didn’t analyze any of it. We’d built an AWS agent and called it a cost agent.

Missing the Expensive Stuff

Even within AWS, the agent was focused on finding idle resources when the real costs were in categories it couldn’t see:

NAT gateway data transfer. The agent could find idle NAT gateways (no traffic), but the actual cost driver is data transfer at $0.045 per gigabyte through active gateways. One production account was spending over a thousand dollars a month on VPC costs — primarily NAT data transfer — and the agent had nothing to say about it, because the gateways were technically “in use.”
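The arithmetic explains why an "in use" gateway can quietly dominate VPC costs. A rough model (the $0.045/GB data processing rate is the figure cited above; the hourly rate and 730-hour month are assumptions, and rates vary by region):

```python
def monthly_nat_cost(gb_processed, hours=730, hourly_rate=0.045, per_gb=0.045):
    """Approximate monthly NAT gateway cost: hourly charge plus data
    processing. Rates are illustrative; check your region's pricing."""
    return hours * hourly_rate + gb_processed * per_gb

# ~20 TB of processed data a month is enough to pass $900 in data charges
# while the gateway still counts as "in use" to an idle-resource check.
cost = monthly_nat_cost(20_000)
```

An idle-gateway check sees only the hourly line; the per-GB line is where the money goes.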

EKS cluster efficiency. No visibility into Kubernetes node utilization. Were the node pools right-sized? Were we running three m5.4xlarge nodes when two m5.2xlarge nodes would do? The agent couldn’t tell us because it had no EKS analysis capability.

“EC2-Other” costs. In most accounts, 40-87% of spend was categorized as “EC2-Other” — a catch-all that includes EBS snapshots, data transfer, NAT gateway charges, and other items. The agent wasn’t decomposing this category, which meant the largest line item in most accounts was a black box.
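Decomposing that black box is mechanically simple once you have the line items. In practice the pairs come from Cost Explorer's get_cost_and_usage grouped by usage type; the data shape below is simplified for illustration:

```python
def decompose_ec2_other(line_items):
    """Aggregate (usage_type, cost) pairs, largest first.

    Illustrative input shape; real line items come from Cost Explorer
    grouped by USAGE_TYPE within the EC2-Other category.
    """
    totals = {}
    for usage_type, cost in line_items:
        totals[usage_type] = totals.get(usage_type, 0.0) + cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


items = [
    ("EBS:SnapshotUsage", 420.0),
    ("NatGateway-Bytes", 910.0),
    ("DataTransfer-Out-Bytes", 310.0),
    ("EBS:SnapshotUsage", 180.0),
]
breakdown = decompose_ec2_other(items)  # NAT bytes top this example's list
```

Until the agent had this breakdown as a tool, "EC2-Other is your biggest line item" was the most it could say.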

What We Changed

The gap analysis became the roadmap. Each gap became a work item, prioritized by estimated savings impact.

We fixed the FlexSave exclusion — the biggest single improvement was a one-line code change. Over half of our AWS spend that had been invisible became visible overnight.

We added EC2 rightsizing with auto-save per finding. The agent now pulls CloudWatch CPU metrics, identifies instances running at 10-30% utilization, and recommends specific downsizing steps with estimated savings. This is where the largest per-finding dollar amounts come from.

We built a GCP agent from scratch — and this is where things got architecturally interesting. Instead of copying the AWS agent’s design (LLM orchestrates everything), we built a two-phase system: deterministic Python code discovers resources in parallel across all projects, then a single LLM call assesses and prioritizes the findings. The GCP agent is 10-20x faster and cheaper than the AWS pattern. (More on why in a future post.)
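The two-phase shape looks roughly like this. Everything here is a stub for illustration — the function names, findings format, and worker count are ours, and a real phase 1 calls the GCP APIs while a real phase 2 is a single LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

def discover_project(project_id):
    """Phase 1: deterministic discovery. A real version queries the GCP
    APIs (Compute, Cloud SQL, etc.); this stub returns a canned finding."""
    return [{"project": project_id, "resource": "idle-disk", "monthly_cost": 12.0}]

def assess_findings(findings):
    """Phase 2: in production, one LLM call ranks and annotates all
    findings at once; here we just sort by cost as a stand-in."""
    return sorted(findings, key=lambda f: f["monthly_cost"], reverse=True)

def run_gcp_agent(project_ids):
    # Phase 1 fans out in parallel across all projects — no LLM in the loop.
    with ThreadPoolExecutor(max_workers=16) as pool:
        findings = [f for fs in pool.map(discover_project, project_ids) for f in fs]
    # Phase 2 is a single assessment pass over the combined findings.
    return assess_findings(findings)
```

The speed and cost win comes from the structure: N projects cost N cheap API sweeps plus one LLM call, instead of an LLM orchestrating tool calls per project.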

We added NAT data transfer analysis. The agent now looks at traffic patterns through active NAT gateways and recommends VPC Gateway Endpoints for S3 and DynamoDB traffic — which is free and often accounts for 30-50% of NAT data transfer.
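The savings estimate behind that recommendation is a one-liner. The 30-50% share is the range cited above; treating it as a single input parameter is our simplification:

```python
def gateway_endpoint_savings(nat_gb_per_month, s3_ddb_share, per_gb=0.045):
    """Monthly savings from routing S3/DynamoDB traffic through free VPC
    Gateway Endpoints instead of NAT. per_gb is the NAT data processing
    rate; s3_ddb_share is measured per account in practice."""
    return nat_gb_per_month * s3_ddb_share * per_gb

# For 20 TB/month of NAT traffic, the cited 30-50% share brackets the win:
low = gateway_endpoint_savings(20_000, 0.30)
high = gateway_endpoint_savings(20_000, 0.50)
```

The actual share has to be measured from VPC flow logs per account, which is why this needed agent tooling rather than a rule of thumb.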

We added EKS cluster analysis, including node pool utilization and over-provisioning detection for accounts with Container Insights enabled.
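The node-pool check reduces to comparing requested capacity against what the pool provides. A minimal sketch, with illustrative numbers — real allocatable capacity sits a bit below raw vCPU after system reservations, which we ignore here:

```python
def node_pool_utilization(requested_vcpu, node_vcpu, node_count):
    """Fraction of the pool's vCPU actually requested by pods.
    Simplified: ignores system-reserved capacity and memory pressure."""
    return requested_vcpu / (node_vcpu * node_count)

# Three m5.4xlarge nodes (16 vCPU each) serving 24 requested vCPU:
util = node_pool_utilization(24, 16, 3)  # half the pool is headroom
```

A pool sitting well below ~50% on both CPU and memory over a sustained window is the signal for consolidation onto fewer or smaller nodes.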

The Meta-Lesson

The uncomfortable truth about AI agents is that “it works” is a dangerously low bar.

Our agent worked from day one. It found real waste. It produced accurate recommendations. If we’d evaluated it on standard agent metrics — task completion, tool use accuracy, hallucination rate — it would have scored well. But it was addressing 2.4% of the problem space.

The gap analysis taught us three things:

Your agent’s blind spots are your blind spots. The agent couldn’t analyze what we hadn’t given it tools to see. The FlexSave bug, the missing GCP coverage, the absent rightsizing analysis — these weren’t LLM failures. They were failures of scope and tooling. The LLM can only be as good as the tools and data you give it.

“Runs successfully” is not “finds meaningful value.” We had comprehensive error handling, structured logging, and weekly automated runs. The agent was reliable. It just wasn’t useful enough. Reliability and utility are different things, and most agent evaluation frameworks focus on the former.

Honest evaluation requires doing the math. Not “did the agent find things?” but “what percentage of the findable things did the agent find?” That requires knowing the denominator — total spend, total resources, total opportunity — which means doing the boring work of understanding the problem space independently of the agent.
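The math itself is trivial — the work is in measuring the denominator. With purely illustrative numbers (the post's actual figures were "a few thousand dollars" against total spend):

```python
def coverage_pct(found_savings, total_spend):
    """Share of total spend the agent's findings touch. The honest metric
    requires a denominator measured independently of the agent."""
    return 100.0 * found_savings / total_spend

# Illustrative numbers only:
pct = round(coverage_pct(3_600, 150_000), 1)
```

The point is that neither input comes from the agent's own logs: findings are valued by hand, and total spend comes from the billing console.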

We’re now running the same gap analysis quarterly. Every time, we find new blind spots. The catch rate keeps going up. It’s still not 100%. But the process of honestly evaluating what the agent misses is more valuable than any individual finding the agent produces.


Agent Router Enterprise helps teams move agents from prototype to production with confidence. Behavioral metrics and guardrail scoring measure agent readiness objectively — so you know whether your agent is actually solving the problem, not just running without errors. Built on the battle-hardened Envoy AI Gateway with continuous supervision for drift detection as your environment changes. Learn more here ›
