The Golden Signals of AI Governance

When your CEO asks "are our AI systems compliant right now?" can you answer in less than three business days? If not, you're governing blind. Here are the five metrics that matter.

You have policies. You have documentation. You have a governance framework that looks great in the slide deck. But when your CEO asks “are our AI systems actually compliant right now?” can you answer in less than three business days?

If not, you’re governing blind.

Governance Without Measurement Is Just Hope

Everyone learned this lesson with traditional IT operations: you can’t manage what you don’t measure. That’s why we have monitoring, observability, and the SRE golden signals (latency, traffic, errors, saturation).

AI governance needs the same rigor. But most organizations are measuring either nothing or everything.

The “nothing” crowd has policies but no instrumentation. They trust that teams are following the rules because teams said they would. This works right up until the audit or the incident.

The “everything” crowd is logging every token, capturing every request, storing everything forever. Their storage costs are astronomical and they still can’t answer simple questions because the data is too messy to query.

What Are the Golden Signals of AI Governance?

In SRE, the “golden signals” are the small set of metrics that tell you whether your system is healthy. Not every possible metric—just the ones that matter most.

For AI governance, I’d propose five:

1. Policy Violation Rate

How often are governance policies being triggered?

This isn’t “are violations happening?” (they always are). It’s “how many requests are violating policies and what’s the trend?”

You want to know:

  • PII detected in prompts: X per hour
  • Blocked topics requested: Y per day
  • Rate limits hit: Z per service
  • Unauthorized model access attempts: N per week

A spike in any of these metrics means something changed: an attack, a misconfiguration, a new feature that doesn’t respect guardrails, or a policy that’s too strict and blocking legitimate use.
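
To make this concrete, here’s a minimal sketch of how a gateway hook might export these counts using the Python prometheus_client library. The metric and label names are illustrative, not any product’s actual schema.

```python
# Minimal sketch: exporting policy-violation counts from a gateway hook.
# Metric and label names are illustrative, not a specific product's schema.
from prometheus_client import Counter

POLICY_VIOLATIONS = Counter(
    "ai_policy_violations_total",
    "Requests that triggered a governance policy",
    ["policy", "service", "action"],  # e.g. policy="pii_in_prompt", action="blocked"
)

def record_violation(policy: str, service: str, action: str) -> None:
    """Called whenever a policy evaluation fires on a request."""
    POLICY_VIOLATIONS.labels(policy=policy, service=service, action=action).inc()

# Example: a PII detection that blocked a request from a billing assistant.
record_violation("pii_in_prompt", "billing-assistant", "blocked")
```

A rate over that counter, per hour or per day and broken down by policy, is exactly the trend described above.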

2. Model Routing Compliance

Are requests going to the right models according to your data classification policies?

If your policy says “customer PII uses dedicated models only,” you need to track:

  • Percentage of PII-containing requests routed to dedicated models
  • Any PII-containing requests that went to shared APIs (and why)
  • Model provider distribution vs. policy expectations

This is the metric that tells you whether your data handling policies are being enforced or just documented.
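
A sketch of what enforcement plus measurement could look like, assuming a hypothetical contains_pii() classifier and an illustrative allow-list of dedicated model names:

```python
# Sketch: enforce "PII uses dedicated models only" and count every routing decision.
from prometheus_client import Counter

ROUTING_DECISIONS = Counter(
    "ai_model_routing_total",
    "Routing decisions by data classification, requested model, and compliance",
    ["classification", "requested_model", "compliant"],
)

DEDICATED_MODELS = {"dedicated-model-a", "dedicated-model-b"}  # illustrative names

def contains_pii(prompt: str) -> bool:
    """Placeholder for a real PII classifier (regex, NER model, DLP service, ...)."""
    return "ssn" in prompt.lower()

def route(prompt: str, requested_model: str) -> str:
    classification = "pii" if contains_pii(prompt) else "general"
    compliant = classification != "pii" or requested_model in DEDICATED_MODELS
    ROUTING_DECISIONS.labels(classification, requested_model, str(compliant).lower()).inc()
    # Policy: PII never leaves dedicated models; force the compliant route.
    return requested_model if compliant else "dedicated-model-a"
```

The `compliant="false"` series is the one to alert on: it captures both the requests that went the wrong way and the ones the gateway had to redirect.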

3. Latency Distribution

How long are governance checks taking, and are they affecting user experience?

Governance that makes your application too slow is governance that will be bypassed. You need to track:

  • P50, P95, P99 latency for policy evaluation
  • Which policies are slowest
  • Whether governance overhead is within acceptable bounds

If your PII filter adds 500ms to every request, that’s a problem. If it adds 5ms, that’s fine.
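
One way to capture that distribution is a per-policy latency histogram, from which P50/P95/P99 can be derived. The bucket boundaries below are illustrative.

```python
# Sketch: time each policy evaluation so latency percentiles can be computed.
import time
from prometheus_client import Histogram

POLICY_LATENCY = Histogram(
    "ai_policy_evaluation_seconds",
    "Wall-clock time spent evaluating a single governance policy",
    ["policy"],
    buckets=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5],
)

def evaluate_with_timing(policy_name: str, evaluate, request):
    """Wrap any policy check and record how long it took, per policy."""
    start = time.perf_counter()
    try:
        return evaluate(request)
    finally:
        POLICY_LATENCY.labels(policy=policy_name).observe(time.perf_counter() - start)
```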

4. Token Cost by Service/Team/Model

Who’s spending what on AI, and is it within expected parameters?

This is partly governance (preventing runaway costs from misconfigured agents) and partly accountability (understanding where money is going):

  • Total token consumption per service
  • Cost per request/user/day
  • Distribution across models (cheap vs. expensive)
  • Outliers (single requests consuming excessive tokens)

A sudden spike in token costs might indicate a bug, an attack, or a feature that needs optimization.
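
A sketch of cost attribution at the gateway, assuming per-model prices you’d replace with your provider’s actual rates:

```python
# Sketch: attribute token spend to service, team, and model.
# Prices per 1K tokens are placeholders, not real provider rates.
from prometheus_client import Counter

TOKENS = Counter(
    "ai_tokens_total",
    "Tokens consumed, labeled for cost attribution",
    ["service", "team", "model", "direction"],  # direction: prompt | completion
)
COST_USD = Counter(
    "ai_token_cost_usd_total",
    "Estimated spend in USD, derived from per-model prices",
    ["service", "team", "model"],
)

PRICE_PER_1K = {"cheap-model": 0.0005, "expensive-model": 0.03}  # illustrative

def record_usage(service, team, model, prompt_tokens, completion_tokens):
    TOKENS.labels(service, team, model, "prompt").inc(prompt_tokens)
    TOKENS.labels(service, team, model, "completion").inc(completion_tokens)
    total = prompt_tokens + completion_tokens
    COST_USD.labels(service, team, model).inc(total / 1000 * PRICE_PER_1K.get(model, 0.0))
```

With those labels in place, “cost per service”, “distribution across models”, and “single requests consuming excessive tokens” are all just queries.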

5. Audit Log Completeness

Are you capturing enough data to prove compliance?

This is the meta-metric: are your governance measurements themselves working?

  • Percentage of requests with complete audit logs
  • Gap detection (missing logs, failed writes)
  • Time-to-query for compliance questions

If your audit logs have 20% gaps, your other metrics are suspect. If you can’t answer “show me all requests from user X last week” in under 60 seconds, your logs aren’t useful.
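
A sketch of how completeness might be tracked as a ratio of requests seen to audit records successfully written; the audit sink here is a placeholder for whatever store you actually use.

```python
# Sketch: track audit-log completeness and surface failed writes as a gap signal.
from prometheus_client import Counter

REQUESTS_SEEN = Counter("ai_requests_total", "AI requests observed at the gateway")
AUDIT_WRITES = Counter("ai_audit_writes_total", "Audit record writes by outcome", ["status"])

def audit(store, record) -> None:
    """Write one audit record per request and track whether the write succeeded."""
    REQUESTS_SEEN.inc()  # one record expected per request
    try:
        store.write(record)  # your audit sink: object store, database, SIEM, ...
        AUDIT_WRITES.labels(status="ok").inc()
    except Exception:
        AUDIT_WRITES.labels(status="failed").inc()
        raise  # a silent audit failure is exactly the gap you are trying to detect

# Completeness over a window is then ok-writes / requests-seen,
# and anything meaningfully under 100% is a gap worth investigating.
```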

What These Signals Tell You

Individually, each metric shows you something specific. Together, they tell you whether your governance is working:

Healthy state:

  • Policy violations are low and stable
  • Model routing matches policy expectations
  • Governance latency is minimal
  • Token costs are predictable
  • Audit logs are complete and queryable

Warning state:

  • Policy violations trending up
  • Compliance percentage dropping
  • Latency increasing
  • Token costs spiking
  • Log gaps appearing

Crisis state:

  • Policy violations surging
  • Unauthorized model access detected
  • Governance causing timeouts
  • Token costs out of control
  • Audit logs incomplete/missing

Where to Capture These Metrics

Application-layer instrumentation is one option. Each service exports metrics about its governance decisions.

Problems:

  • Inconsistent implementation across services
  • Gaps when services forget to instrument something
  • Metrics format varies by team
  • Aggregation is a nightmare

Infrastructure-layer capture is the better option. The gateway that all AI requests flow through sees everything:

  • Every policy evaluation (even the ones that don’t trigger)
  • Every model routing decision
  • Every request’s latency
  • Every token consumed
  • Every audit log written

You get consistent, comprehensive metrics by default. No relying on 15 teams to all instrument correctly.
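
Conceptually, the gateway-side hook looks something like the sketch below. The policy engine, router, model client, and audit sink are passed in as placeholders rather than real APIs; the point is that one wrapper sees every request and can emit all five signals consistently.

```python
# Sketch of a gateway-side wrapper that records all five signals in one place.
# The callables are placeholders for a real policy engine, router, model client,
# and audit sink, not a specific product's API.
import time

def handle_ai_request(request: dict, evaluate_policies, route, call_model, write_audit):
    start = time.perf_counter()
    decisions = evaluate_policies(request)            # signal 1: policy violations
    destination = route(request, decisions)           # signal 2: routing compliance
    overhead_s = time.perf_counter() - start          # signal 3: governance overhead only
    response = call_model(destination, request)       # signal 4: token usage comes back here
    write_audit({                                     # signal 5: one record per request
        "request_id": request.get("id"),
        "decisions": decisions,
        "destination": destination,
        "governance_overhead_s": overhead_s,
        "usage": response.get("usage"),
    })
    return response
```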

The Dashboard You Actually Need

Most AI governance dashboards show vanity metrics: “We processed 10M AI requests this month!” Great, but are you compliant?

A useful governance dashboard shows:

  • Real-time policy violation rate with trend
  • Compliance percentage by policy type
  • Model routing distribution vs. policy expectations
  • Governance latency impact on user experience
  • Token cost burn rate and projections
  • Audit log health

And critically: alerts when metrics cross thresholds. You shouldn’t need to check the dashboard daily to notice that PII filtering has stopped working.

The Threshold Question

What’s an acceptable policy violation rate?

Zero is not realistic. You’ll have edge cases, you’ll have users testing boundaries, you’ll have false positives.

But you need to know what “normal” looks like for your organization, and you need to know when deviations are significant.

Example thresholds:

  • PII detected in prompts: 0.5% of requests is normal, 5% means something’s wrong
  • Topic blocking: 1% is expected, 10% means users are hitting unexpected restrictions
  • Model routing violations: 0% tolerance for PII going to wrong models, might allow 0.1% for configuration edge cases

These thresholds vary with your organization’s risk appetite. The point is having them, tracking against them, and alerting when they’re breached.
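
Expressed as alert rules, those examples might look like the sketch below; the numbers are copied from the list above purely for illustration and should come from your own baseline.

```python
# Illustrative thresholds from the examples above, expressed as simple alert rules.
THRESHOLDS = {
    "pii_in_prompt_rate":     {"normal": 0.005, "alert": 0.05},   # 0.5% normal, 5% wrong
    "topic_block_rate":       {"normal": 0.01,  "alert": 0.10},   # 1% expected, 10% wrong
    "routing_violation_rate": {"normal": 0.0,   "alert": 0.001},  # near-zero tolerance
}

def breached(metric: str, observed_rate: float) -> bool:
    """Return True when an observed rate crosses its alert threshold."""
    return observed_rate > THRESHOLDS[metric]["alert"]

assert breached("pii_in_prompt_rate", 0.06)       # 6% of requests: alert
assert not breached("pii_in_prompt_rate", 0.004)  # 0.4%: within the normal range
```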

The Trend Matters More Than the Absolute

A single policy violation is usually not a crisis. A sudden increase in violations is.

If your PII detection rate goes from 0.5% to 2% overnight, something changed:

  • New feature launched that’s generating PII-containing prompts
  • Attack attempt to exfiltrate data
  • Policy configuration changed and is now more/less strict
  • Detection system degraded and is missing violations

The absolute number tells you if there’s an issue. The trend tells you if it’s getting worse.
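
A minimal trend check compares the most recent rate against a trailing baseline. The window size and spike multiplier below are arbitrary choices for illustration, not a standard.

```python
# Minimal trend check: flag when the latest hourly rate spikes versus a trailing baseline.
from collections import deque

class TrendDetector:
    def __init__(self, baseline_window: int = 168, spike_factor: float = 3.0):
        self.hourly_rates = deque(maxlen=baseline_window)  # e.g. 7 days of hourly samples
        self.spike_factor = spike_factor

    def observe(self, hourly_rate: float) -> bool:
        """Record one hourly sample; return True if it looks like a spike."""
        baseline = (sum(self.hourly_rates) / len(self.hourly_rates)
                    if self.hourly_rates else hourly_rate)
        self.hourly_rates.append(hourly_rate)
        return hourly_rate > self.spike_factor * baseline and hourly_rate > 0

detector = TrendDetector()
detector.observe(0.005)        # normal PII detection rate
print(detector.observe(0.02))  # True: 0.5% -> 2% overnight is a spike
```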

When Metrics Disagree

What if your policy violation rate is low but your audit log completeness is also low?

That’s a problem. It means you’re not seeing the full picture. You might be compliant, or you might be blind to violations.

Cross-checking metrics is how you avoid false confidence:

  • High compliance + complete logs = probably good
  • High compliance + incomplete logs = you don’t actually know
  • Low compliance + complete logs = you have work to do but at least you know what
  • Low compliance + incomplete logs = you’re in trouble and don’t know the extent
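
That matrix can be encoded directly. The 99% and 95% cutoffs below are placeholders for whatever your organization defines as “high compliance” and “complete logs.”

```python
# Cross-checking two signals before trusting either, per the matrix above.
# The cutoffs are placeholders for your own definitions of "high" and "complete".
def confidence(compliance: float, log_completeness: float) -> str:
    compliant = compliance >= 0.99
    complete = log_completeness >= 0.95
    if compliant and complete:
        return "probably good"
    if compliant and not complete:
        return "unknown: the compliance number is not trustworthy"
    if not compliant and complete:
        return "known problem: scoped and fixable"
    return "in trouble, and the extent is unknown"

print(confidence(0.995, 0.999))  # probably good
print(confidence(0.995, 0.60))   # compliant on paper, but blind to part of the traffic
```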

The Measurement Feedback Loop

The point of measuring governance isn’t just to create dashboards. It’s to improve governance.

If a policy is triggering constantly, maybe it’s too strict or poorly configured. If governance latency is unacceptable, maybe you need to optimize or cache. If token costs are out of control, maybe you need smarter model routing. If audit logs are incomplete, maybe your infrastructure isn’t as reliable as you thought.

Measure, analyze, improve, repeat. This is how governance matures from “we have policies” to “we know our systems are compliant.”

The Executive Answer

Back to the original question: “Are our AI systems compliant right now?”

With the right metrics, you can answer: “Yes. In the last 24 hours, we processed 847K AI requests. 99.4% complied with all policies. The 0.6% that triggered violations were blocked appropriately and are logged for review. All requests to external models were properly filtered for PII. Average governance overhead is 8ms per request. Token spending is tracking to budget. Audit logs are 100% complete.”

Or: “We have a problem. PII detection triggered on 2.1% of requests to external models in the last 6 hours, up from our normal 0.5%. We’re investigating whether this is a configuration change or an attack. All requests were blocked per policy and are logged.”

Either way, you know. And knowing is how you govern.


Tetrate’s Agent Router Enterprise provides real-time governance metrics captured at the infrastructure layer, giving you visibility into policy compliance, model routing, latency, costs, and audit log health. Stop guessing whether your governance is working; measure it. Learn more here.
