The Golden Signals of AI Governance

When your CEO asks "are our AI systems compliant right now?" can you answer in less than three business days? If not, you're governing blind. Here are the five metrics that matter.

You have policies. You have documentation. You have a governance framework that looks great in the slide deck. But when your CEO asks “are our AI systems actually compliant right now?” can you answer in less than three business days?

If not, you’re governing blind.

Governance Without Measurement Is Just Hope

Everyone learned this lesson with traditional IT operations: you can’t manage what you don’t measure. That’s why we have monitoring, observability, and the SRE golden signals (latency, traffic, errors, saturation).

AI governance needs the same rigor. But most organizations are measuring either nothing or everything.

The “nothing” crowd has policies but no instrumentation. They trust that teams are following the rules because teams said they would. This works right up until the audit or the incident.

The “everything” crowd is logging every token, capturing every request, storing everything forever. Their storage costs are astronomical and they still can’t answer simple questions because the data is too messy to query.

What Are the Golden Signals of AI Governance?

In SRE, the “golden signals” are the small set of metrics that tell you whether your system is healthy. Not every possible metric—just the ones that matter most.

For AI governance, I’d propose five:

1. Policy Violation Rate

How often are governance policies being triggered?

This isn’t “are violations happening?” (they always are). It’s “how many requests are violating policies and what’s the trend?”

You want to know:

  • PII detected in prompts: X per hour
  • Blocked topics requested: Y per day
  • Rate limits hit: Z per service
  • Unauthorized model access attempts: N per week

A spike in any of these metrics means something changed: an attack, a misconfiguration, a new feature that doesn’t respect guardrails, or a policy that’s too strict and blocking legitimate use.
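
To make this concrete, here’s a minimal sketch of how a gateway hook might export these counts using the Python prometheus_client library. The metric and label names are illustrative, not any product’s actual schema.

```python
# Minimal sketch: exporting policy-violation counts from a gateway hook.
# Metric and label names are illustrative, not a specific product's schema.
from prometheus_client import Counter

POLICY_VIOLATIONS = Counter(
    "ai_policy_violations_total",
    "Requests that triggered a governance policy",
    ["policy", "service", "action"],  # e.g. policy="pii_in_prompt", action="blocked"
)

def record_violation(policy: str, service: str, action: str) -> None:
    """Called whenever a policy evaluation fires on a request."""
    POLICY_VIOLATIONS.labels(policy=policy, service=service, action=action).inc()

# Example: a PII detection that blocked a request from a billing assistant.
record_violation("pii_in_prompt", "billing-assistant", "blocked")
```

A rate over that counter, per hour or per day and broken down by policy, is exactly the trend described above.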

2. Model Routing Compliance

Are requests going to the right models according to your data classification policies?

If your policy says “customer PII uses dedicated models only,” you need to track:

  • Percentage of PII-containing requests routed to dedicated models
  • Any PII-containing requests that went to shared APIs (and why)
  • Model provider distribution vs. policy expectations

This is the metric that tells you whether your data handling policies are being enforced or just documented.
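
A sketch of what enforcement plus measurement could look like, assuming a hypothetical contains_pii() classifier and an illustrative allow-list of dedicated model names:

```python
# Sketch: enforce "PII uses dedicated models only" and count every routing decision.
from prometheus_client import Counter

ROUTING_DECISIONS = Counter(
    "ai_model_routing_total",
    "Routing decisions by data classification, requested model, and compliance",
    ["classification", "requested_model", "compliant"],
)

DEDICATED_MODELS = {"dedicated-model-a", "dedicated-model-b"}  # illustrative names

def contains_pii(prompt: str) -> bool:
    """Placeholder for a real PII classifier (regex, NER model, DLP service, ...)."""
    return "ssn" in prompt.lower()

def route(prompt: str, requested_model: str) -> str:
    classification = "pii" if contains_pii(prompt) else "general"
    compliant = classification != "pii" or requested_model in DEDICATED_MODELS
    ROUTING_DECISIONS.labels(classification, requested_model, str(compliant).lower()).inc()
    # Policy: PII never leaves dedicated models; force the compliant route.
    return requested_model if compliant else "dedicated-model-a"
```

The `compliant="false"` series is the one to alert on: it captures both the requests that went the wrong way and the ones the gateway had to redirect.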

3. Latency Distribution

How long are governance checks taking, and are they affecting user experience?

Governance that makes your application too slow is governance that will be bypassed. You need to track:

  • P50, P95, P99 latency for policy evaluation
  • Which policies are slowest
  • Whether governance overhead is within acceptable bounds

If your PII filter adds 500ms to every request, that’s a problem. If it adds 5ms, that’s fine.
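
One way to capture that distribution is a per-policy latency histogram, from which P50/P95/P99 can be derived. The bucket boundaries below are illustrative.

```python
# Sketch: time each policy evaluation so latency percentiles can be computed.
import time
from prometheus_client import Histogram

POLICY_LATENCY = Histogram(
    "ai_policy_evaluation_seconds",
    "Wall-clock time spent evaluating a single governance policy",
    ["policy"],
    buckets=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5],
)

def evaluate_with_timing(policy_name: str, evaluate, request):
    """Wrap any policy check and record how long it took, per policy."""
    start = time.perf_counter()
    try:
        return evaluate(request)
    finally:
        POLICY_LATENCY.labels(policy=policy_name).observe(time.perf_counter() - start)
```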

4. Token Cost by Service/Team/Model

Who’s spending what on AI, and is it within expected parameters?

This is partly governance (preventing runaway costs from misconfigured agents) and partly accountability (understanding where money is going):

  • Total token consumption per service
  • Cost per request/user/day
  • Distribution across models (cheap vs. expensive)
  • Outliers (single requests consuming excessive tokens)

A sudden spike in token costs might indicate a bug, an attack, or a feature that needs optimization.
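
A sketch of cost attribution at the gateway, assuming per-model prices you’d replace with your provider’s actual rates:

```python
# Sketch: attribute token spend to service, team, and model.
# Prices per 1K tokens are placeholders, not real provider rates.
from prometheus_client import Counter

TOKENS = Counter(
    "ai_tokens_total",
    "Tokens consumed, labeled for cost attribution",
    ["service", "team", "model", "direction"],  # direction: prompt | completion
)
COST_USD = Counter(
    "ai_token_cost_usd_total",
    "Estimated spend in USD, derived from per-model prices",
    ["service", "team", "model"],
)

PRICE_PER_1K = {"cheap-model": 0.0005, "expensive-model": 0.03}  # illustrative

def record_usage(service, team, model, prompt_tokens, completion_tokens):
    TOKENS.labels(service, team, model, "prompt").inc(prompt_tokens)
    TOKENS.labels(service, team, model, "completion").inc(completion_tokens)
    total = prompt_tokens + completion_tokens
    COST_USD.labels(service, team, model).inc(total / 1000 * PRICE_PER_1K.get(model, 0.0))
```

With those labels in place, “cost per service”, “distribution across models”, and “single requests consuming excessive tokens” are all just queries.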

5. Audit Log Completeness

Are you capturing enough data to prove compliance?

This is the meta-metric: are your governance measurements themselves working?

  • Percentage of requests with complete audit logs
  • Gap detection (missing logs, failed writes)
  • Time-to-query for compliance questions

If your audit logs have 20% gaps, your other metrics are suspect. If you can’t answer “show me all requests from user X last week” in under 60 seconds, your logs aren’t useful.
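
A sketch of how completeness might be tracked as a ratio of requests seen to audit records successfully written; the audit sink here is a placeholder for whatever store you actually use.

```python
# Sketch: track audit-log completeness and surface failed writes as a gap signal.
from prometheus_client import Counter

REQUESTS_SEEN = Counter("ai_requests_total", "AI requests observed at the gateway")
AUDIT_WRITES = Counter("ai_audit_writes_total", "Audit record writes by outcome", ["status"])

def audit(store, record) -> None:
    """Write one audit record per request and track whether the write succeeded."""
    REQUESTS_SEEN.inc()  # one record expected per request
    try:
        store.write(record)  # your audit sink: object store, database, SIEM, ...
        AUDIT_WRITES.labels(status="ok").inc()
    except Exception:
        AUDIT_WRITES.labels(status="failed").inc()
        raise  # a silent audit failure is exactly the gap you are trying to detect

# Completeness over a window is then ok-writes / requests-seen,
# and anything meaningfully under 100% is a gap worth investigating.
```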

What These Signals Tell You

Individually, each metric shows you something specific. Together, they tell you whether your governance is working:

Healthy state:

  • Policy violations are low and stable
  • Model routing matches policy expectations
  • Governance latency is minimal
  • Token costs are predictable
  • Audit logs are complete and queryable

Warning state:

  • Policy violations trending up
  • Compliance percentage dropping
  • Latency increasing
  • Token costs spiking
  • Log gaps appearing

Crisis state:

  • Policy violations surging
  • Unauthorized model access detected
  • Governance causing timeouts
  • Token costs out of control
  • Audit logs incomplete/missing

Where to Capture These Metrics

Application-layer instrumentation is one option. Each service exports metrics about its governance decisions.

Problems:

  • Inconsistent implementation across services
  • Gaps when services forget to instrument something
  • Metrics format varies by team
  • Aggregation is a nightmare

Infrastructure-layer capture is the better option. The gateway that all AI requests flow through sees everything:

  • Every policy evaluation (even the ones that don’t trigger)
  • Every model routing decision
  • Every request’s latency
  • Every token consumed
  • Every audit log written

You get consistent, comprehensive metrics by default. No relying on 15 teams to all instrument correctly.
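
Conceptually, the gateway-side hook looks something like the sketch below. The policy engine, router, model client, and audit sink are passed in as placeholders rather than real APIs; the point is that one wrapper sees every request and can emit all five signals consistently.

```python
# Sketch of a gateway-side wrapper that records all five signals in one place.
# The callables are placeholders for a real policy engine, router, model client,
# and audit sink, not a specific product's API.
import time

def handle_ai_request(request: dict, evaluate_policies, route, call_model, write_audit):
    start = time.perf_counter()
    decisions = evaluate_policies(request)            # signal 1: policy violations
    destination = route(request, decisions)           # signal 2: routing compliance
    overhead_s = time.perf_counter() - start          # signal 3: governance overhead only
    response = call_model(destination, request)       # signal 4: token usage comes back here
    write_audit({                                     # signal 5: one record per request
        "request_id": request.get("id"),
        "decisions": decisions,
        "destination": destination,
        "governance_overhead_s": overhead_s,
        "usage": response.get("usage"),
    })
    return response
```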

The Dashboard You Actually Need

Most AI governance dashboards show vanity metrics: “We processed 10M AI requests this month!” Great, but are you compliant?

A useful governance dashboard shows:

  • Real-time policy violation rate with trend
  • Compliance percentage by policy type
  • Model routing distribution vs. policy expectations
  • Governance latency impact on user experience
  • Token cost burn rate and projections
  • Audit log health

And critically: alerts when metrics cross thresholds. You shouldn’t need to check the dashboard daily to notice that PII filtering has stopped working.

The Threshold Question

What’s an acceptable policy violation rate?

Zero is not realistic. You’ll have edge cases, you’ll have users testing boundaries, you’ll have false positives.

But you need to know what “normal” looks like for your organization, and you need to know when deviations are significant.

Example thresholds:

  • PII detected in prompts: 0.5% of requests is normal, 5% means something’s wrong
  • Topic blocking: 1% is expected, 10% means users are hitting unexpected restrictions
  • Model routing violations: 0% tolerance for PII going to wrong models, might allow 0.1% for configuration edge cases

These thresholds vary with your organization’s risk appetite. The point is having them, tracking against them, and alerting when they’re breached.
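
Expressed as alert rules, those examples might look like the sketch below; the numbers are copied from the list above purely for illustration and should come from your own baseline.

```python
# Illustrative thresholds from the examples above, expressed as simple alert rules.
THRESHOLDS = {
    "pii_in_prompt_rate":     {"normal": 0.005, "alert": 0.05},   # 0.5% normal, 5% wrong
    "topic_block_rate":       {"normal": 0.01,  "alert": 0.10},   # 1% expected, 10% wrong
    "routing_violation_rate": {"normal": 0.0,   "alert": 0.001},  # near-zero tolerance
}

def breached(metric: str, observed_rate: float) -> bool:
    """Return True when an observed rate crosses its alert threshold."""
    return observed_rate > THRESHOLDS[metric]["alert"]

assert breached("pii_in_prompt_rate", 0.06)       # 6% of requests: alert
assert not breached("pii_in_prompt_rate", 0.004)  # 0.4%: within the normal range
```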

The Trend Matters More Than the Absolute

A single policy violation is usually not a crisis. A sudden increase in violations is.

If your PII detection rate goes from 0.5% to 2% overnight, something changed:

  • New feature launched that’s generating PII-containing prompts
  • Attack attempt to exfiltrate data
  • Policy configuration changed and is now more/less strict
  • Detection system degraded and is missing violations

The absolute number tells you if there’s an issue. The trend tells you if it’s getting worse.
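
A minimal trend check compares the most recent rate against a trailing baseline. The window size and spike multiplier below are arbitrary choices for illustration, not a standard.

```python
# Minimal trend check: flag when the latest hourly rate spikes versus a trailing baseline.
from collections import deque

class TrendDetector:
    def __init__(self, baseline_window: int = 168, spike_factor: float = 3.0):
        self.hourly_rates = deque(maxlen=baseline_window)  # e.g. 7 days of hourly samples
        self.spike_factor = spike_factor

    def observe(self, hourly_rate: float) -> bool:
        """Record one hourly sample; return True if it looks like a spike."""
        baseline = (sum(self.hourly_rates) / len(self.hourly_rates)
                    if self.hourly_rates else hourly_rate)
        self.hourly_rates.append(hourly_rate)
        return hourly_rate > self.spike_factor * baseline and hourly_rate > 0

detector = TrendDetector()
detector.observe(0.005)        # normal PII detection rate
print(detector.observe(0.02))  # True: 0.5% -> 2% overnight is a spike
```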

When Metrics Disagree

What if your policy violation rate is low but your audit log completeness is also low?

That’s a problem. It means you’re not seeing the full picture. You might be compliant, or you might be blind to violations.

Cross-checking metrics is how you avoid false confidence:

  • High compliance + complete logs = probably good
  • High compliance + incomplete logs = you don’t actually know
  • Low compliance + complete logs = you have work to do but at least you know what
  • Low compliance + incomplete logs = you’re in trouble and don’t know the extent
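
That matrix can be encoded directly. The 99% and 95% cutoffs below are placeholders for whatever your organization defines as “high compliance” and “complete logs.”

```python
# Cross-checking two signals before trusting either, per the matrix above.
# The cutoffs are placeholders for your own definitions of "high" and "complete".
def confidence(compliance: float, log_completeness: float) -> str:
    compliant = compliance >= 0.99
    complete = log_completeness >= 0.95
    if compliant and complete:
        return "probably good"
    if compliant and not complete:
        return "unknown: the compliance number is not trustworthy"
    if not compliant and complete:
        return "known problem: scoped and fixable"
    return "in trouble, and the extent is unknown"

print(confidence(0.995, 0.999))  # probably good
print(confidence(0.995, 0.60))   # compliant on paper, but blind to part of the traffic
```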

The Measurement Feedback Loop

The point of measuring governance isn’t just to create dashboards. It’s to improve governance.

If a policy is triggering constantly, maybe it’s too strict or poorly configured. If governance latency is unacceptable, maybe you need to optimize or cache. If token costs are out of control, maybe you need smarter model routing. If audit logs are incomplete, maybe your infrastructure isn’t as reliable as you thought.

Measure, analyze, improve, repeat. This is how governance matures from “we have policies” to “we know our systems are compliant.”

The Executive Answer

Back to the original question: “Are our AI systems compliant right now?”

With the right metrics, you can answer: “Yes. In the last 24 hours, we processed 847K AI requests. 99.4% complied with all policies. The 0.6% that triggered violations were blocked appropriately and are logged for review. All requests to external models were properly filtered for PII. Average governance overhead is 8ms per request. Token spending is tracking to budget. Audit logs are 100% complete.”

Or: “We have a problem. PII detection triggered on 2.1% of requests to external models in the last 6 hours, up from our normal 0.5%. We’re investigating whether this is a configuration change or an attack. All requests were blocked per policy and are logged.”

Either way, you know. And knowing is how you govern.


Tetrate’s Agent Router Enterprise provides real-time governance metrics captured at the infrastructure layer, giving you visibility into policy compliance, model routing, latency, costs, and audit log health. Stop guessing whether your governance is working; measure it. Learn more here.
