Black Box Models in a Regulated World

“Trust me, the training data is fine” is not a valid response to a regulator. Proprietary LLMs fail on basic transparency requirements that traditional models satisfy easily.

“Trust me, the training data is fine” is not a valid response to a regulator.

Yet that’s essentially what you get when you ask a foundation model provider about their training data. High-level summaries. Vague assurances. A link to a blog post about their commitment to responsible AI.

This is a problem for banks.


What SR 11-7 Wants to Know

The US model risk framework, the Federal Reserve’s SR 11-7 guidance on model risk management, isn’t unreasonable in its documentation requirements. It asks for things you’d want to know about any system making consequential decisions:

Clear model definition and theory of operation. What does the model do? What’s the conceptual basis for why it should work? You don’t need to document every line of code, but you need a defensible explanation of the approach.

Understanding of training data. What data was the model trained on? Is it representative of your use case? What biases might it carry? What are its limitations?

Identification of assumptions and intended use. When does the model apply? When should it not be trusted? What are the boundaries of its reliable operation?

Ability to explain generalization. Why should the model work on new inputs? What’s the basis for believing it will behave sensibly in production?

Traditional ML models satisfy these requirements. You know your training data because you curated it. You can document the features, the algorithm, the hyperparameters. You can explain why the model generalizes—it learned patterns from historical data that should persist.

Proprietary LLMs can’t satisfy these requirements because the model vendors won’t tell you what you need to know.

The Proprietary Model Problem

Ask OpenAI, Anthropic, or Google what’s in their training data. You’ll get something like “a diverse corpus of internet text, books, and other sources, filtered for quality.” That’s not documentation—that’s marketing.

They won’t tell you:

  • What’s actually in the training data. Not the categories—the specific sources, the filtering criteria, the representation of different domains. Was financial services documentation included? How much? From what era? With what biases?

  • How the model was fine-tuned. RLHF involves human feedback, but whose feedback? What were they optimizing for? What behaviors did this reinforce or suppress?

  • Why the model behaves the way it does on your domain. Does it know about UK mortgage regulations? Basel III requirements? Your specific document formats? You’re flying blind on domain applicability.

  • When they’re going to update it. Model versions change. Training data shifts. Fine-tuning evolves. You’ll find out when your system starts behaving differently, not before.

This isn’t paranoia—it’s a regulatory requirement. SR 11-7 expects you to understand your model’s basis and limitations. “The vendor assures us it’s fine” doesn’t satisfy that requirement.

The Silent Update Problem

Even if you could somehow document a proprietary model at a point in time, that documentation expires without warning.

Foundation model providers update their models continuously. Sometimes it’s a major version bump. Sometimes it’s a quiet update to fix issues or improve performance. Sometimes the model you’re calling today is literally different from the one you called yesterday.

OpenAI has changed model behavior behind stable API endpoints. Anthropic improves Claude iteratively. Google updates Gemini based on ongoing feedback. None of them guarantee version stability for extended periods.

This breaks change management entirely. Your January validation documented a model that might not exist by March. The thing you validated is not the thing running in production. Your before/after comparisons are meaningless because you don’t know when “before” silently became something else.

Examiners ask: “When did the model change? What was the impact?” You don’t know. You can’t know. The vendor didn’t tell you.
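One partial mitigation is a behavioral regression harness: pin the model identifier you request, run a fixed evaluation set on a schedule, and record a fingerprint of the outputs so a silent change at least shows up as a diff in your own logs. The sketch below assumes a hypothetical `call_model` wrapper standing in for whatever client you actually use; it is not a real SDK call.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical client wrapper; wire this to your actual LLM SDK.
# It should return the generated text plus any version metadata the API exposes.
def call_model(model_id: str, prompt: str) -> dict:
    raise NotImplementedError("substitute your provider's client call")

# Fixed, version-controlled evaluation prompts. Use greedy / temperature-0
# settings where supported, so output drift is more likely to mean model drift.
EVAL_PROMPTS = [
    "Summarize the key documentation expectations of SR 11-7 in two sentences.",
    "Classify this complaint as 'fees', 'fraud', or 'other': ...",
]

def behavioral_fingerprint(model_id: str) -> dict:
    """Run the fixed eval set and hash the concatenated outputs.

    A changed hash does not prove the vendor swapped the model, but a stable
    hash over time is cheap evidence for your change-management records.
    """
    outputs = [call_model(model_id, prompt)["text"] for prompt in EVAL_PROMPTS]
    digest = hashlib.sha256("\n".join(outputs).encode("utf-8")).hexdigest()
    return {
        "model_id": model_id,
        "run_at": datetime.now(timezone.utc).isoformat(),
        "output_sha256": digest,
    }

if __name__ == "__main__":
    print(json.dumps(behavioral_fingerprint("your-pinned-model-version"), indent=2))
```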

The Explainability Gap

SR 11-7 requires conceptual soundness—a defensible explanation of why the model should work. This implicitly requires some form of explainability.

For traditional models, explainability is tractable. You can trace a credit decision to the input features that drove it. You can explain why the model weights certain factors. You can identify the training examples most similar to a new case.
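As a concrete contrast, here is a minimal sketch of that kind of traceability for a linear credit model: the score decomposes into per-feature contributions you can read off directly. The feature names and coefficients are hypothetical, not a real scorecard.

```python
# A logistic-regression-style score decomposes into coefficient * feature value,
# so each decision can be explained as a sum of human-readable contributions.
COEFFICIENTS = {  # hypothetical, illustrative values
    "debt_to_income": -1.8,
    "months_since_delinquency": 0.9,
    "credit_utilization": -1.2,
}
INTERCEPT = 0.4

def explain_decision(features: dict[str, float]) -> dict[str, float]:
    """Decompose a linear score into per-feature contributions."""
    contributions = {name: COEFFICIENTS[name] * value for name, value in features.items()}
    contributions["intercept"] = INTERCEPT
    return contributions

applicant = {
    "debt_to_income": 0.6,
    "months_since_delinquency": 2.0,
    "credit_utilization": 0.8,
}
for name, value in sorted(explain_decision(applicant).items(), key=lambda kv: kv[1]):
    print(f"{name:>26}: {value:+.2f}")
```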

LLMs can’t provide this kind of explanation. The relationship between input and output flows through billions of parameters in ways that don’t decompose into human-interpretable logic. The model “decided” to output a particular response through a process that even the model creators don’t fully understand.

“The model said yes because…” isn’t a sentence LLMs can complete mechanistically.

This isn’t unique to LLMs—neural networks in general resist interpretation. But it’s worse for LLMs because the inputs and outputs are natural language. The surface-level “explanation” (the text the model generates) can be confident, coherent, and completely disconnected from the actual computational process that produced it.

When your model can confidently explain its reasoning in grammatically perfect sentences that have no causal relationship to its actual behavior, you have an explanation problem.

The Open-Weight Alternative

There is another path. Open-weight models with disclosed training data fundamentally change the documentation picture.

Models like Llama, Mistral, or Qwen publish their weights and training methodology. You can inspect them. You can document them. You can verify claims about their construction.

More importantly, you can control them. You can fine-tune on your own data with known provenance. You can version lock and prevent silent updates. You can run them in your own infrastructure without depending on external API stability.
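As one sketch of what version locking can look like in practice, the snippet below pins an open-weight checkpoint to a specific repository revision, assuming a Hugging Face-hosted model; the model name and commit hash are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder open-weight model
REVISION = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit hash to pin and record

# Pinning `revision` means the weights you validated are the weights you serve;
# an upstream update to the repository cannot silently change what this loads.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=REVISION)
```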

This doesn’t solve every transparency problem. You still can’t provide causal, mechanistic explanations for individual outputs. Neural networks remain neural networks.

But you can document:

  • What training data was used (and by extension, what wasn’t)
  • What fine-tuning you applied with what objectives
  • When the model version changed and why
  • What the model’s domain coverage should theoretically include

That’s defensible documentation. It’s not perfect transparency, but it’s the kind of transparency that lets you write an honest model documentation package and defend it to examiners.
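Sketched as a simple manifest, that documentation package might record something like the following; the field names are illustrative assumptions, not a regulatory template.

```python
# Illustrative documentation record; field names are assumptions, not an SR 11-7
# template. The point is that every field is answerable when you control the model.
MODEL_RECORD = {
    "model": "placeholder-open-weight-model",
    "weights_revision": "pinned-commit-or-checksum",
    "base_training_data": "publisher-disclosed corpus description and known exclusions",
    "fine_tuning": {
        "dataset": "internal dataset with documented provenance",
        "objective": "instruction tuning for document classification",
        "date": "2025-01-15",
    },
    "intended_use": "Tier 2 document classification; not for credit decisions",
    "known_limitations": ["no post-2024 regulatory text", "English-only"],
    "change_log": [
        {"date": "2025-01-15", "change": "initial validated version"},
    ],
}
```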

The Trade-Off Calculation

Proprietary models often outperform open-weight alternatives on benchmark tasks. They have more resources behind them, more compute for training, more sophisticated fine-tuning.

But benchmarks don’t capture regulatory defensibility. A model that scores 2% better on your internal eval but can’t be documented is worse than a model that can be documented.

Banks are starting to make this calculation explicitly. The question isn’t “which model is best?” It’s “which model can we actually deploy in a regulated context?”

For Tier 3 use cases—internal productivity, developer tools, low-stakes summarization—proprietary models are fine. The documentation requirements are minimal because the risk exposure is minimal.

For Tier 1 and 2 use cases—automated decisions, regulated classifications, anything that feeds consequential workflows—the transparency bar is higher. Open-weight models with known training data start to look like the only viable path.
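One way to make that tiering operational rather than advisory is a policy table checked before deployment or at the routing layer. A minimal sketch, with tier names from this post and hypothetical model classes:

```python
# Hypothetical deployment policy: which model classes are permitted at each risk tier.
ALLOWED_MODELS = {
    "tier_1": {"self_hosted_open_weight_pinned"},                            # automated, consequential decisions
    "tier_2": {"self_hosted_open_weight_pinned"},                            # regulated classifications
    "tier_3": {"self_hosted_open_weight_pinned", "vendor_proprietary_api"},  # low-stakes productivity
}

def is_deployment_allowed(tier: str, model_class: str) -> bool:
    """Return True if the model class may serve use cases at the given tier."""
    return model_class in ALLOWED_MODELS.get(tier, set())

assert is_deployment_allowed("tier_3", "vendor_proprietary_api")
assert not is_deployment_allowed("tier_1", "vendor_proprietary_api")
```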

Where This Leaves Us

The previous post covered why non-determinism breaks validation. This post covers why opacity breaks documentation.

Together, they explain why banks are stuck. Even if you had deterministic LLM outputs (and you mostly don’t), you’d still face a transparency wall with proprietary models. Even if you had full model transparency (which open-weight models can provide), you’d still struggle with validation without determinism.

The final post in this series covers what becomes possible when you have both: deterministic inference and transparent models. That combination doesn’t exist widely yet, but it’s emerging. And when it arrives, the Tier 1 use cases that banks have been waiting for become tractable.


Agent Router Enterprise’s LLM Gateway provides a centralized Model Catalog, giving you control over which models your agents can access—including open-weight models you host yourself. When vendor transparency isn’t enough, infrastructure-layer routing lets you enforce model policies consistently across your agent portfolio. Learn more here ›
