The Tier 3 Problem: Why Banks Can't Use LLMs for Real Decisions
Banks are leaving $920bn in operational efficiency gains on the table because they can't get LLMs past their risk teams. The issue isn't caution—it's that current LLMs can't satisfy SR 11-7 requirements.
Morgan Stanley estimates that generative AI could unlock around $920 billion in annual operational efficiency gains for financial services. Banks are capturing approximately none of that.
Not because they’re not trying. Every major bank has an AI strategy, a Center of Excellence, probably a Chief AI Officer by now. They’ve got pilots. They’ve got proofs of concept. What they don’t have is LLMs making real decisions in production.
The Tier 3 Ghetto
If you’ve spent time in a bank’s AI governance process, you’ve encountered the tiering system. It goes something like this:
Tier 3: Low-risk, generative use cases. Chatbots for internal IT help desks. Document summarisation. Code assistance for developers. Meeting note generators. Nice to have, low stakes.
Tier 2: Decision-support use cases. Systems that help humans make decisions but don’t make decisions themselves. An analyst reviews the output before anything happens.
Tier 1: Automated decision-making. The system makes or directly drives decisions without human review. Credit adjudication. Risk classification. Compliance determinations.
The economic value lives in Tier 1 and Tier 2. That’s where you replace manual processes, reduce cycle times, and scale decisions that currently require expensive human judgment.
Most bank LLM deployments are stuck in Tier 3.
The chatbots are fine. The summarisation tools are genuinely useful. But these are productivity enhancements, not transformation. The gap between “our developers have GitHub Copilot” and “our credit decisions are automated” is measured in billions of dollars—and banks can’t close it.
What Banks Actually Want to Do
The use cases sitting in the “we’d love to, but we can’t” pile are substantial:
Automated credit decisioning. Not “assist the analyst”—actual yes/no adjudication. Structured credit underwriting with deterministic extraction, scoring, and narrative justification. SME and commercial credit assessment based on document analysis.
KYC/AML/Fraud classification. Stable risk signals. Repeatable decisions that produce the same answer when you run them twice. Systems that can be audited because they behave consistently.
Document intelligence in regulated workflows. Extraction that feeds downstream scoring models. Income verification, employment confirmation, risk-relevant feature extraction from unstructured documents. Not as an internal convenience tool—as part of the actual decision pipeline.
Compliance automation. Consistent interpretation of regulatory rules. Deterministic classification of transactions, communications, or client activities.
Supervisory reporting. Reliable data extraction from documents that feeds regulatory reporting pipelines. The kind of thing you really don’t want to get wrong.
Each of these represents meaningful operational leverage. Each is squarely within what LLMs can technically do. And each is effectively off-limits under current governance frameworks.
The SR 11-7 Wall
The obstacle isn’t that banks are excessively cautious (though they are, and as a customer I appreciate it). The obstacle is regulatory.
In the US, the governing framework is SR 11-7, the Federal Reserve's supervisory guidance on model risk management, issued in 2011 out of the wreckage of 2008. It turns out that when half your banks are running critical risk calculations in Excel spreadsheets that get emailed around weekly, you might want some standards.
In practice, SR 11-7 reaches any method, system, or approach that processes input data to produce estimates, scores, classifications, or decisions. That's broad. Critically, banks' model risk frameworks extend it to non-numeric outputs used to make or support decisions. Your LLM doesn't have to output a number to fall in scope; it just has to influence something that matters.
The regulation itself is sensible. It requires things like:
- A clear definition of the model and how it works
- Comprehensive validation including independent review
- Understanding of the training data and its limitations
- Stable, reproducible behaviour that can be tested
- Documentation sufficient for auditors and examiners
These are reasonable requirements for systems that make consequential decisions. The problem is that current LLMs can’t satisfy them.
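SR 11-7 also expects every in-scope model to sit in a firm-wide inventory with its lineage, limitations, and validation history attached. As a rough illustration of what that looks like in engineering terms, here is a minimal sketch of such a record; the field names are ours, not prescribed by the guidance, and the example values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    """Illustrative model-inventory entry. Field names are ours, not SR 11-7's."""
    model_id: str                  # stable identifier in the bank's model inventory
    version: str                   # the model/weights version actually deployed
    weights_hash: str              # checksum of the exact artefact that was validated
    intended_use: str              # the decision the model is approved to support
    training_data_lineage: str     # where the training data came from and its known gaps
    known_limitations: list[str] = field(default_factory=list)
    validation_report: str = ""    # reference to the independent validation document
    last_validated: str = ""       # date of the last full validation
    owner: str = ""                # accountable model owner

# Hypothetical entry for a document-extraction model in a credit workflow.
record = ModelRecord(
    model_id="doc-extract-001",
    version="2025.01",
    weights_hash="sha256:d41d8c...",
    intended_use="Income verification feature extraction for SME credit",
    training_data_lineage="Internal data-lineage register entry DL-4821",
    known_limitations=["Untested on handwritten documents"],
    validation_report="VAL-2025-017",
    last_validated="2025-01-15",
    owner="Model Risk / Retail Credit",
)
```

For a traditional scorecard, every field in that record is routine to produce. The next section is about why, for current LLMs, several of them can't be filled in honestly.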
The Gap
Here’s where banks are stuck:
You can’t validate what you can’t reproduce. LLMs as typically served are non-deterministic: run the same prompt twice and you can get different outputs, by design when sampling and, even at temperature 0, through batching and floating-point effects in the serving stack. Traditional model validation assumes you can measure accuracy, track drift, and regression-test changes. Non-determinism breaks all of that.
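To make that concrete, here is a minimal sketch of the kind of reproducibility check a validator would want to run. `call_model` is a placeholder for whatever inference endpoint the bank actually uses; nothing here is specific to one vendor.

```python
import hashlib

def call_model(prompt: str) -> str:
    """Placeholder for the bank's actual LLM inference call."""
    raise NotImplementedError

def reproducibility_check(prompt: str, runs: int = 20) -> bool:
    """True only if every run of the same prompt yields byte-identical output."""
    digests = set()
    for _ in range(runs):
        output = call_model(prompt)
        digests.add(hashlib.sha256(output.encode("utf-8")).hexdigest())
    return len(digests) == 1
```

Against a typical sampling-based serving stack, this check fails, and with it any regression test built on exact output comparison.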
You can’t document what you can’t see. Proprietary foundation models don’t disclose their training data. “Trust me, the data is representative” doesn’t satisfy examiners who want curated data lineage and documented limitations.
You can’t control what vendors change without telling you. Foundation model providers update models silently. Your January validation might be irrelevant by March. The thing you tested isn’t the thing running in production.
You can’t explain what you don’t understand. SR 11-7 implicitly requires explainability—a defensible theory of why the model behaves as it does. LLMs can’t provide causal explanations for their outputs.
Banks aren’t being difficult. They’re looking at the regulatory requirements, looking at the capabilities of current LLMs, and correctly concluding that the gap is unbridgeable. So they deploy chatbots and wait.
There’s a Way Through
This is the first in a series of posts exploring how that gap might close.
The short version: recent research has demonstrated deterministic LLM inference under controlled conditions. Determinism doesn’t solve every SR 11-7 challenge, but it solves the reproducibility problem—which unlocks validation, monitoring, and change management.
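Concretely, once inference is bit-reproducible, change management starts to look like ordinary regression testing against golden outputs. A minimal sketch, with a hypothetical `call_model_pinned` standing in for a deterministic, version-pinned inference endpoint:

```python
import hashlib

def call_model_pinned(prompt: str) -> str:
    """Placeholder for a deterministic, version-pinned inference endpoint."""
    raise NotImplementedError

def _digest(prompt: str) -> str:
    return hashlib.sha256(call_model_pinned(prompt).encode("utf-8")).hexdigest()

def baseline(prompts: list[str]) -> dict[str, str]:
    """Record golden output hashes for the validated prompt suite."""
    return {p: _digest(p) for p in prompts}

def regression_test(golden: dict[str, str]) -> list[str]:
    """Re-run the suite; return the prompts whose behaviour has changed."""
    return [p for p, expected in golden.items() if _digest(p) != expected]
```

Record the baseline at validation time; before any model, prompt, or serving change goes live, require `regression_test(golden) == []`. Any divergence is a detected change that goes back through validation, which is exactly the property traditional Model Risk Management assumes.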
Combine determinism with transparency (open-weight models with documented training data), and suddenly LLMs start looking like systems that can sit inside traditional Model Risk Management frameworks.
In the next post, we’ll dig into why non-determinism specifically breaks MRM validation. After that, we’ll cover the transparency problem. And finally, we’ll look at the specific use cases that become tractable when you have both determinism and transparency.
The Tier 3 ghetto isn’t permanent. But escaping it requires solving real technical and governance problems—not just waiting for regulators to get comfortable.
Agent Router Enterprise helps teams graduate AI agents from prototype to production with centralized LLM routing, AI Guardrails for consistent policy enforcement, and continuous supervision through behavioral metrics. When you’re ready to move beyond Tier 3, the infrastructure matters. Learn more here ›