Tokens Are the New Unit of Spend. Can You Prove the Return?

Tokens are the fastest-growing line in enterprise tech budgets. The Linux Foundation named the problem; measuring token ROI is still on you. Here's how.

Tokens — the atomic unit of AI, the thing a model produces and a provider bills you for — are now the fastest-growing line on your technology budget, and they behave like nothing else on it. Sometime soon — a budget review, a board deck, a CFO who has started reading about AI spend — someone is going to ask you to prove that line produced a return. Most teams can’t answer that cleanly yet.

The industry just said the same thing out loud. On June 3, the Linux Foundation announced its intent to launch the Tokenomics Foundation — a neutral home for open standards, benchmarks, and best practices for the economics of AI infrastructure, built in partnership with the FinOps Foundation. The list of organizations backing it reads like a budget meeting that got out of hand: Accenture, Booking.com, Google Cloud, IBM, JPMorganChase, Microsoft, Oracle, Salesforce, SAP, ServiceNow. Tokenomics is the practice behind the name: managing how tokens are produced, consumed, and turned into business value.

When that many enterprises agree a problem needs a standards body, the problem is real and it is expensive. J.R. Storment of the FinOps Foundation put it in one line:

“Token costs and efficiency have become a CEO-level concern, not an engineering footnote. But naming the problem isn’t solving it.”

J.R. Storment, FinOps Foundation

That second sentence is the one worth sitting with. The Foundation gives the industry a shared vocabulary and, eventually, shared benchmarks. It does not give you a number for whether your AI spend produced anything last week. That number you still have to capture yourself — and the place most teams try to capture it, the monthly invoice, is the one place it isn’t.

Key takeaways

  • Tokens are now the fastest-growing line item on enterprise technology budgets. Goldman Sachs projects global token usage will multiply 24x between 2026 and 2030. Per-token prices have stopped falling and new-model prices are rising.
  • Not all tokens cost the same. Input, output, cached, and reasoning tokens can differ in price by an order of magnitude or more, so a token count is not a cost. The correct cost analysis slices spend by token type, model, and workload, and that’s the work that makes a token ROI number trustworthy.
  • The Linux Foundation’s Tokenomics Foundation is the industry naming the problem: a neutral home for standards on token production, consumption (FinOps for AI), and value.
  • Standards tell you whether you paid a fair price. They don’t tell you whether the spend produced anything. Both questions need answering, and they need different instruments.
  • Token ROI is hard to measure for a structural reason: tokens are abstract, attribution decays, and the invoice arrives too late and too coarse to act on.
  • The routing layer is the one point in the stack that sees who, which agent, which workload, and how many tokens — at request time. That’s where measurement and capture belong.

Why tokens broke the FinOps playbook

FinOps — the discipline of managing variable technology spend — spent a decade maturing on cloud. Tag your resources, allocate the bill, right-size your instances, repeat. It worked because cloud resources have stable identifiers that map back to teams and projects.

Tokens don’t behave like instances, and the Foundation’s own backers are blunt about why. Here’s Salesforce’s Nishant Gupta in the announcement:

“Token economics is fundamentally more abstract and more opaque than anything we’ve managed at this scale before. Input versus output tokens, cached versus non-cached, pricing structures that don’t behave like compute or storage.”

Nishant Gupta, Salesforce

The data backs him up. Flexera’s 2026 State of the Cloud research found cloud waste rising for the first time in five years, with AI workloads a named cause. Most teams, it noted, “still lack the benchmarks to know whether they are paying a fair price for the value they receive.”

The economics also stopped cooperating. Through 2023–2025, per-token prices fell so reliably that you could overspend and let the market bail you out next quarter. That tailwind is gone. Prices have leveled off, new-model prices are rising, and Goldman Sachs projects usage will multiply 24x between 2026 and 2030. When the unit price stops falling and consumption goes vertical, the discipline to govern that spend stops being optional.

So the FinOps muscle is the right muscle. It just doesn’t stretch to cover an abstract, opaque, ad-hoc unit of spend without new instrumentation underneath it. That’s the gap the Tokenomics Foundation is forming to standardize. It’s also the gap a routing layer was already built to close.

What token ROI actually measures: two questions, not one

“What’s our ROI on tokens?” is really two questions wearing one coat. They have different answers and they need different instruments.

Did we pay a fair price? This is the efficiency question — token cost and efficiency benchmarked across models and vendors. It’s exactly what the Tokenomics Foundation exists to standardize, and where an industry benchmark genuinely helps: without a neutral reference, “is $0.40 per million tokens good?” has no answer.

Did the spend produce anything? This is the effectiveness question, and no standards body can answer it for you, because the answer lives in your business. Whether a coding agent’s tokens turned into merged PRs, whether a support agent’s tokens closed tickets, whether a research agent’s tokens produced a decision someone acted on — that mapping is specific to your workloads and your outcomes.

Most teams can’t answer either cleanly today. Not because they lack discipline, but because the data needed to answer them is structurally missing at the point where it would be useful.

Why token ROI is hard to measure: three honest gaps

Capturing token ROI is hard, and it’s worth being precise about why, because the failure modes are consistent across every team we talk to. In short: tokens are abstract, attribution decays, and the invoice arrives too late to act on. Here’s each one.

Tokens are abstract, so the cost signal is hard to read. Input versus output, cached versus uncached, reasoning tokens versus completion tokens — these carry wildly different costs and the invoice blurs them into one figure. Across about thirty engineers running agents through Tetrate Agent Router last month, we measured a roughly 20x spread in tokens-per-dollar between our most efficient workload and our least. Our heaviest CI job got around 3.5 million tokens per dollar on a cached, well-matched model; a reasoning-heavy agent got around 200,000. Same gateway, same week. Some of that gap is genuine — reasoning doesn’t cache. Some of it is “we picked the expensive model six months ago and never went back.” You cannot tell which from the bill. That’s the whole point: when a token isn’t a fixed unit of cost, counting tokens tells you nothing about spend — only cost analysis that separates the expensive tokens from the cheap ones does, and that is the work a real token ROI answer is built on.
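To make the "a token count is not a cost" point concrete, here is a minimal sketch of pricing usage by token type. The prices and the two usage mixes are hypothetical, chosen only for illustration. They are not any provider's real rates and not the measurements quoted above.

```python
# Hypothetical prices in USD per million tokens, by token type.
# Illustrative assumptions only, not any provider's actual rate card.
PRICE_PER_MILLION = {
    "input": 3.00,
    "cached_input": 0.30,
    "output": 15.00,
    "reasoning": 15.00,
}


def cost_usd(usage: dict[str, int]) -> float:
    """Dollar cost of one usage record, priced per token type."""
    return sum(
        PRICE_PER_MILLION[kind] * count / 1_000_000
        for kind, count in usage.items()
    )


def tokens_per_dollar(usage: dict[str, int]) -> float:
    """Total tokens of every type divided by total cost."""
    return sum(usage.values()) / cost_usd(usage)


# Two made-up workloads with the same total (1,000,000 tokens) but a different mix.
ci_job = {"input": 50_000, "cached_input": 900_000, "output": 50_000, "reasoning": 0}
reasoning_agent = {"input": 200_000, "cached_input": 0, "output": 100_000, "reasoning": 700_000}

for name, usage in [("ci_job", ci_job), ("reasoning_agent", reasoning_agent)]:
    print(f"{name}: ${cost_usd(usage):.2f}, {tokens_per_dollar(usage):,.0f} tokens per dollar")
```

Both workloads report the same token count, yet under these assumed prices the cost differs by more than 10x. That difference is invisible unless usage is recorded per token type.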

Attribution decays the moment you defer it. The instinct is to reconstruct ROI at month-end from the invoice. It doesn’t work, because an API key named code-agent-key could belong to anyone. By the time the bill arrives, the engineer who created it is on a different team, the agent’s been renamed twice, and the workload moved repos. Reconstructing who-spent-what-on-which-workload after the fact is archaeology. Last month, about half our own token spend was on keys with no team, no workload, and no owner beyond an email address — not a discipline failure, a product failure. The tag column was right there. Nobody fills in metadata whose value accrues to finance six months later.

The invoice is the wrong instrument. It arrives too late to change a decision and too coarse to locate one. It’s a record of what happened, not a control surface. You can’t right-size a model selection, catch a runaway agent, or settle a chargeback dispute with a number that shows up thirty days after the tokens were spent and aggregates four teams into one line.

None of these are solved by trying harder. They’re solved by capturing the right data at the right point in the stack.

The point in the stack that actually sees the tokens

There’s exactly one place that has the full picture at the moment the tokens are spent: the routing layer between your agents and your model providers. It sees who issued the credential, who’s using it, which agent it belongs to, which workload it serves, and how many input and output tokens — cached or not — each request consumed. The invoice sees a total. The provider sees an anonymous key. The router sees all of it, at request time, before the data has had a chance to decay.

That’s the architectural bet behind Tetrate Agent Router Enterprise, and it maps directly onto the two ROI questions:

For the fair-price question, routing many agents through a single governed boundary makes tokens-per-dollar by agent a first-class metric instead of a forensic exercise. You can see which workloads are well-matched to their model and which are quietly running on something far more expensive than they need. We had an alert-triage agent on a top-tier reasoning model by default. Against a real quality bar — does it catch every severity-1? — we moved it to a model roughly 15x cheaper with no measurable drop. The audit took an afternoon. It’s the AI equivalent of right-sizing an instance, and as the Tokenomics Foundation publishes standard benchmarks, the comparison gets sharper, not redundant.

For the did-it-produce-anything question, the router captures attribution as a property of key issuance, not of usage. Every key inherits an owner, a team, an agent, and a workload at the moment it’s created, and those labels ride every request to the cost ledger automatically. No tagging review. No untagged state. That’s what makes effectiveness measurable at all: you can finally put spend-per-workload next to output-per-workload, look at the variance across engineers doing similar work, and ask the question the bill could never surface — is this $800 producing five PRs, or two PRs and a lot of “let me try that again”?
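As a rough illustration of attribution captured at issuance rather than at usage, the sketch below refuses to create a key without all four labels and lets every recorded request inherit them. The class and method names (`KeyRegistry`, `CostLedger`, `spend_by`) are hypothetical and are not the Tetrate Agent Router Enterprise API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRecord:
    key_id: str
    owner: str
    team: str
    agent: str
    workload: str


class KeyRegistry:
    """Hypothetical issuer: a key cannot exist without its four labels."""

    def __init__(self):
        self._keys = {}

    def issue(self, key_id, *, owner, team, agent, workload):
        labels = {"owner": owner, "team": team, "agent": agent, "workload": workload}
        missing = [name for name, value in labels.items() if not value]
        if missing:
            raise ValueError(f"cannot issue key without: {', '.join(missing)}")
        record = KeyRecord(key_id, **labels)
        self._keys[key_id] = record
        return record

    def lookup(self, key_id):
        return self._keys[key_id]


class CostLedger:
    """Each request's cost is stored with the labels its key was issued with."""

    def __init__(self, registry):
        self._registry = registry
        self._entries = []

    def record(self, key_id, cost_usd):
        self._entries.append((self._registry.lookup(key_id), cost_usd))

    def spend_by(self, dimension):
        """Total spend grouped by 'owner', 'team', 'agent', or 'workload'."""
        totals = defaultdict(float)
        for key, cost in self._entries:
            totals[getattr(key, dimension)] += cost
        return dict(totals)


registry = KeyRegistry()
registry.issue("k1", owner="ana@example.com", team="platform", agent="code-agent", workload="ci")
registry.issue("k2", owner="raj@example.com", team="support", agent="triage-agent", workload="tickets")

ledger = CostLedger(registry)
ledger.record("k1", 12.5)
ledger.record("k1", 7.5)
ledger.record("k2", 4.0)

print(ledger.spend_by("team"))
print(ledger.spend_by("workload"))
```

The design choice worth noting is that the caller of `record` never supplies labels. Spend per workload falls out of the ledger, ready to be set beside whatever outcome count (merged PRs, closed tickets) your own systems provide.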

Built on the battle-tested Envoy AI Gateway, the routing layer becomes the same product as the cost dashboard, because it’s the only point in the stack with the full picture when it matters.

Standards and instrumentation are the same project

It would be tempting to read the Tokenomics Foundation news as “wait for the standards.” That’s the wrong read. The Foundation is standardizing the vocabulary and the benchmarks — token factory effectiveness, FinOps for AI, AI value — so the industry can compare notes honestly. Salesforce framed the contribution model well: the muscle “should evolve through broad experimentation across the industry, with the best ideas and practices contributed back.”

Standards make the numbers comparable. Instrumentation makes the numbers exist. You need both, and you need the instrumentation in place before the benchmarks land, because the team that already captures token spend by agent and by workload is the team that can actually use a benchmark the day it ships. The team still reconstructing attribution from invoices will be standardizing a number they can’t yet measure.

You don’t have to wait for the benchmarks to start. The token-ROI audit that pays off this week:

  1. Rank your top keys by spend. A handful almost always account for half the bill. That’s where the attention pays off.
  2. Put tokens-per-dollar next to each one. Anything well below the band you’d expect for that kind of work is either reasoning that can’t cache (fine) or a model picked once and never revisited (fixable).
  3. Re-test the worst offender against a real quality bar. Not “good output” — something specific, like “catches every severity-1.” If a cheaper model clears the bar, move the workload.
  4. Check that every key carries an owner, team, agent, and workload. The keys that don’t are the spend you won’t be able to explain in the budget review. Fix issuance so the next key can’t skip them.

The first three answer the fair-price question. The fourth is what makes the did-it-produce-anything question answerable at all.
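Steps 1, 2, and 4 of the audit can be sketched over a per-key usage export. The record shape, the sample figures, and the 500,000 tokens-per-dollar floor are all assumptions for illustration. Step 3, re-testing against a quality bar, depends on your own evaluation and is not shown.

```python
from dataclasses import dataclass

REQUIRED_LABELS = ("owner", "team", "agent", "workload")


@dataclass
class KeySpend:
    key_id: str
    cost_usd: float
    tokens: int
    labels: dict


def top_keys_by_spend(keys, share=0.5):
    """Step 1: the highest-spend keys that together cover `share` of the bill."""
    total = sum(k.cost_usd for k in keys)
    picked, running = [], 0.0
    for k in sorted(keys, key=lambda k: k.cost_usd, reverse=True):
        if running >= share * total:
            break
        picked.append(k)
        running += k.cost_usd
    return picked


def tokens_per_dollar(k):
    return k.tokens / k.cost_usd if k.cost_usd else float("inf")


def below_band(keys, floor):
    """Step 2: keys whose tokens-per-dollar falls under the expected floor."""
    return [k for k in keys if tokens_per_dollar(k) < floor]


def missing_labels(k):
    """Step 4: which required labels a key lacks."""
    return [name for name in REQUIRED_LABELS if not k.labels.get(name)]


full = {"owner": "ana@example.com", "team": "platform", "agent": "a", "workload": "w"}
keys = [
    KeySpend("ci-key", 400.0, 1_400_000_000, full),
    KeySpend("triage-key", 800.0, 160_000_000, full),
    KeySpend("code-agent-key", 300.0, 300_000_000, {"owner": "someone@example.com"}),
    KeySpend("misc-key", 100.0, 50_000_000, full),
]

print("top spenders:", [k.key_id for k in top_keys_by_spend(keys)])
print("below band:", [k.key_id for k in below_band(keys, floor=500_000)])
for k in keys:
    if missing_labels(k):
        print(k.key_id, "is missing", missing_labels(k))
```

In this made-up data, one key accounts for half the spend and is also the one below the floor, which makes it the candidate for the step 3 re-test.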

The industry just agreed tokens are the unit of spend and the unit of value. The work now is making sure that, for your own organization, those two are something you can put side by side — and act on before the bill arrives.


Tetrate believes the routing layer is where token economics gets measured and governed, because it’s the only point in the stack that sees who, which agent, which workload, and how many tokens — at request time. Tetrate Agent Router Enterprise routes traffic from many agents through a single governed boundary, captures attribution at key issuance, and surfaces the efficiency and variance signals that turn token spend into a question you can answer. Built on the battle-tested Envoy AI Gateway. If you’re trying to prove the return on your own AI spend, talk with our team about token governance.

Frequently asked questions

What is the Tokenomics Foundation?

The Tokenomics Foundation is a Linux Foundation program, announced in June 2026, focused on establishing open standards, benchmarks, and best practices for the economics of AI infrastructure. It operates in partnership with the FinOps Foundation and covers token production (token factory effectiveness), consumption (FinOps for AI), and monetization (AI value). Early supporters include Accenture, Booking.com, Google Cloud, IBM, JPMorganChase, Microsoft, Oracle, Salesforce, SAP, and ServiceNow. Its goal is a neutral, community-owned home for the standards that let buyers and suppliers measure token economics transparently.

What is tokenomics, and how is it different from FinOps?

Tokenomics is the practice of managing the production, consumption, and monetization of token-based AI to generate business outcomes. FinOps is the broader discipline of managing variable technology spend, originally built for cloud. Tokenomics extends FinOps into the era of token-based AI — but tokens behave differently from cloud resources (input versus output, cached versus uncached, opaque pricing), so they require new instrumentation underneath the familiar FinOps practices.

Why is token ROI so hard to measure?

Three structural reasons. First, tokens are abstract — input, output, cached, and reasoning tokens carry very different costs that the invoice blurs into one number. Second, attribution decays — by the time the bill arrives, the keys, agents, and workloads have changed, making after-the-fact allocation unreliable. Third, the invoice is the wrong instrument — it arrives too late to change a decision and is too coarse to locate one. The fix is capturing cost and attribution data at request time, at the routing layer, rather than reconstructing it from the bill.

How does Tetrate Agent Router Enterprise help with token economics?

It routes traffic from many agents through a single governed boundary, which is the one point in the stack that sees who issued a key, who’s using it, which agent and workload it serves, and how many tokens each request consumed. It captures attribution as a property of key issuance — every key inherits an owner, team, agent, and workload — so spend is sliceable by any of those dimensions without manual tagging. That makes both ROI questions answerable: tokens-per-dollar for efficiency, and spend-per-workload against output for effectiveness.

Should we wait for the Tokenomics Foundation’s standards before acting?

No. The Foundation standardizes the vocabulary and benchmarks so the industry can compare token economics honestly — valuable work, but separate from instrumenting your own spend. Standards make numbers comparable; instrumentation makes the numbers exist. Teams that already capture token spend by agent and workload will be able to use industry benchmarks the day they ship. Teams still reconstructing attribution from invoices won’t yet have the numbers to standardize.
