Agent Security: What NIST Wants You to Think About Before Your Agent Calls a Tool

Your agent has AWS credentials. It can execute cloud CLI commands. NIST has opinions about this. Here's what tool-calling security looks like in practice.

The moment your AI agent gains the ability to call tools, it stops being a fancy text generator and becomes something with agency in the real world. This is the entire point of building agents, and also the part that should make your security team nervous.

NIST’s emerging guidance on AI system security addresses this directly. Their framework for securing AI agents, particularly around tool and function calling, maps surprisingly well to practical engineering decisions. The gap isn’t in understanding what needs to be secured. It’s in understanding that most of the security work happens in infrastructure and access control, not in prompting the LLM to be careful.

We’ve spent the better part of a year building agents that interact with production cloud infrastructure (AWS, GCP, Azure), compliance platforms, and vulnerability databases. What follows is what we learned about tool-calling security, framed against the areas NIST wants you to think about.

The Fundamental Tension: Agents Need Access to Be Useful

An agent that can’t access anything is safe but pointless. An agent with broad access is useful but dangerous. The entire security challenge for agent systems lives in this tension.

Our cost optimization agent needs to read AWS billing data, enumerate EC2 instances, check CloudWatch metrics, list RDS databases, inspect NAT gateways, and query GCP billing APIs. That’s a lot of access. And our drill-down chatbot goes further: it can execute cloud CLI commands against live infrastructure to investigate specific resources.

NIST’s guidance says you should validate and sandbox tool calls, prevent privilege escalation, monitor for anomalous behavior, and implement human-in-the-loop controls for high-risk actions. That’s correct but abstract. What does it look like in practice?

Principle 1: Read-Only by Architecture, Not by Prompting

The most common approach to making agents safe is to include instructions in the system prompt: “Do not modify any resources. Only perform read operations.” This is security by asking nicely, and it provides approximately zero protection against prompt injection, model confusion, or bugs.

Our approach is structural. The AWS IAM policies for the cost agent grant read-only permissions, period. The permissions list is explicit: organizations:List*, ce:GetCostAndUsage, ec2:Describe*, rds:Describe*, s3:GetBucket*, and similar read-only API calls. There is no ec2:TerminateInstances in the policy, so the agent cannot terminate instances regardless of what the LLM decides to do or what any adversarial input instructs.
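A read-only policy of this shape is a few lines of IAM JSON. The sketch below is illustrative (statement ID and exact action list are hypothetical, abbreviated from the permissions named above), but the key property is structural: no mutating action appears anywhere in the document.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CostAgentReadOnly",
      "Effect": "Allow",
      "Action": [
        "organizations:List*",
        "ce:GetCostAndUsage",
        "ec2:Describe*",
        "rds:Describe*",
        "s3:GetBucket*"
      ],
      "Resource": "*"
    }
  ]
}
```

Because IAM is default-deny, anything not listed — `ec2:TerminateInstances` included — is unreachable no matter what the model emits.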

This isn’t a novel insight, but it’s one that agent developers frequently skip. The promise of “the LLM will figure out what to do” extends, in practice, to a dangerous assumption that the LLM will also figure out what not to do. IAM policies, network rules, and architectural constraints are the actual security boundary. The LLM prompt is defense-in-depth, not the primary control.

Principle 2: Credential Isolation Through Role Chaining

Our AWS agent doesn’t have a single set of powerful credentials. Instead, it uses a role-chaining pattern with three levels:

An IAM user holds long-lived credentials (stored in a secrets manager, never in code). That user can do exactly one thing: assume an organization-level role. The organization role can assume member roles in individual AWS accounts. Each member role has the read-only permissions described above.

This means compromising the agent’s base credentials doesn’t give you access to any individual account directly. You’d need to chain through the organization role first, and that role is protected by an external ID (a UUID generated at deployment time, also in the secrets manager) that prevents confused-deputy attacks.
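The chain itself is two `AssumeRole` hops. Here is a minimal sketch; the role names and session-name scheme are hypothetical, and the STS client is injected as a factory (`make_sts`) so the chaining logic can be exercised without live AWS calls — with boto3 the factory would be `lambda c: boto3.client("sts", **c)`.

```python
import uuid

def chain_assume(make_sts, base_creds, org_role_arn, member_role_arn, external_id):
    """Three-hop role chain: base IAM user -> org role -> member role.

    `make_sts` turns a credentials dict into an STS-like client.
    Returns the member role's temporary credentials.
    """
    # Hop 1: the base user's only permission is assuming the org role.
    # The external ID blocks confused-deputy attacks.
    org = make_sts(base_creds).assume_role(
        RoleArn=org_role_arn,
        RoleSessionName=f"agent-org-{uuid.uuid4().hex[:8]}",
        ExternalId=external_id,
    )["Credentials"]

    # Hop 2: the org role assumes the read-only member role in the
    # target account. These credentials expire (one hour by default),
    # bounding the blast radius of a stolen session token.
    member = make_sts({
        "aws_access_key_id": org["AccessKeyId"],
        "aws_secret_access_key": org["SecretAccessKey"],
        "aws_session_token": org["SessionToken"],
    }).assume_role(
        RoleArn=member_role_arn,
        RoleSessionName=f"agent-member-{uuid.uuid4().hex[:8]}",
    )["Credentials"]
    return member
```

Each `RoleSessionName` shows up in CloudTrail, which is what makes the per-account audit trail described below possible.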

The practical benefit is that you can revoke access to any individual account by modifying its trust policy, without affecting the agent’s access to other accounts. You can audit exactly which accounts the agent accessed by checking CloudTrail for role assumption events. And the temporary credentials generated through role chaining expire after one hour, so a stolen session token has a bounded blast radius.

NIST calls this “preventing privilege escalation.” In practice, it’s just good IAM hygiene applied to an agent instead of a human user. The agent shouldn’t have more access than it needs, and the access it does have should be segmented and auditable.

Principle 3: Sandbox Execution for Cloud CLI Commands

Read-only IAM is fine for agents that use AWS SDKs and APIs. But what about agents that need to run arbitrary cloud CLI commands?

Our cost-chat interface allows users to investigate specific resources through natural language. When a user asks “what’s the CPU utilization on this instance over the last week?” the agent might decide to run an AWS CLI command to fetch CloudWatch metrics. This is where things get interesting from a security perspective, because now you have an LLM deciding what shell commands to execute.

We run these commands in an isolated sandbox: an ephemeral Modal container with pre-installed cloud CLIs, limited memory (512MB), limited CPU (0.25 vCPU), and a hard timeout of 180 seconds. The container has no persistent storage and no network access beyond the cloud provider APIs. Credentials are injected at runtime through the container platform’s secrets mechanism, never baked into the image.
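The container specifics above are platform details, but the shape of a sandboxed run is worth sketching in neutral Python: a hard timeout, credentials injected only through the environment at call time, and no state that survives the call. This is an illustration of the pattern, not our Modal deployment; the memory/CPU caps and network policy are enforced by the container platform and aren't shown here.

```python
import os
import subprocess

def run_sandboxed(command: list[str], env_creds: dict, timeout_s: int = 180) -> str:
    """Run a cloud CLI command with a hard timeout. Credentials arrive
    only via the environment (never baked into an image); the process
    sees nothing from the host environment except PATH."""
    env = {"PATH": os.environ.get("PATH", "/usr/bin"), **env_creds}
    result = subprocess.run(
        command,
        env=env,            # only PATH plus injected credentials
        capture_output=True,
        text=True,
        timeout=timeout_s,  # hard stop, mirroring the container timeout
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr[:500])
    return result.stdout
```

A `TimeoutExpired` here kills the process; in the real system the ephemeral container is torn down with it.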

Around the sandbox, we apply two layers of command validation:

Client-side validation happens before the command is sent to the sandbox. A set of regex patterns blocks dangerous CLI operations:

create, delete, terminate, stop, start, modify, update,
put, remove, deregister, release, revoke, attach, detach,
enable, disable, run-instances, reboot-instances, send-command

Dangerous flags are also blocked: --force, --yes, -y, --no-dry-run. This is a denylist approach, which means it’s incomplete by definition, but it catches the obvious destructive operations.

Server-side validation happens inside the sandbox. Arguments are matched against a regex allowlist that permits alphanumeric characters and the punctuation needed for JMESPath queries and JSON filters, while blocking shell metacharacters that could enable command injection. An API key is required to authenticate requests to the sandbox endpoint.

Neither layer alone is sufficient. The client-side validation might miss a new dangerous CLI subcommand. The server-side validation might have a regex gap. Together with the read-only credentials and the ephemeral container, they form a defense-in-depth stack where any single layer failing doesn’t result in a compromise.
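The two layers reduce to a denylist check and an allowlist check. The patterns below are illustrative rather than our production lists, but they show the asymmetry: the client side blocks known-bad verbs and flags, while the server side only passes known-good argument shapes.

```python
import re

# Layer 1 (client side): denylist of mutating CLI verbs and dangerous
# flags, drawn from the lists above. Incomplete by definition.
DENY_VERBS = re.compile(
    r"\b(create|delete|terminate|stop|start|modify|update|put|remove|"
    r"deregister|release|revoke|attach|detach|enable|disable|"
    r"run-instances|reboot-instances|send-command)\b"
)
DENY_FLAGS = re.compile(r"(^|\s)(--force|--yes|-y|--no-dry-run)(\s|$)")

def client_side_ok(command: str) -> bool:
    return not (DENY_VERBS.search(command) or DENY_FLAGS.search(command))

# Layer 2 (server side, inside the sandbox): allowlist each argument --
# alphanumerics plus the punctuation JMESPath queries and JSON filters
# need, and nothing that the shell would interpret (; | & $ ` ...).
SAFE_ARG = re.compile(r"^[A-Za-z0-9_\-./:=,\[\]{}'\"*?@ ()]+$")

def server_side_ok(args: list[str]) -> bool:
    return all(SAFE_ARG.fullmatch(a) for a in args)
```

A command must clear both checks, and even a command that slips through both still runs under read-only credentials inside an ephemeral container.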

Principle 4: Human-in-the-Loop for Anything Destructive

NIST emphasizes human-in-the-loop controls for high-risk actions, which is good advice, but the implementation matters more than the principle.

Our agents take a firm line: they never modify infrastructure. The cost agent produces recommendations. The compliance agent produces remediation commands with detailed instructions and pitfall warnings. But no agent has the ability to execute those recommendations. A human reviews the findings, decides what to act on, and runs the remediation themselves.

For the compliance agent, this means remediation outputs include explicit warnings about what could go wrong. A recommendation to restrict SSH access to a GCP firewall rule comes with notes about IAP requirements and the risk of locking yourself out. The agent knows enough to suggest the fix and enough to warn about the risks, but the actual execution is always manual.

This isn’t just about regulatory compliance or safety. It’s about trust calibration. If an agent occasionally produces a false positive (this resource is idle, but it’s actually a standby), the worst case is that a human sees it and dismisses it. If the agent had auto-remediation capabilities, the worst case is that it terminates a standby instance and causes an outage. The asymmetry between “bad recommendation” and “bad action” is enormous, and human-in-the-loop is how you stay on the safe side of that asymmetry.

Principle 5: Tool Call Observability

You can’t secure what you can’t see. NIST’s guidance on monitoring agent behavior for anomalies requires, at minimum, that you log every tool call with enough detail to reconstruct what happened.

Every tool invocation in our agents is wrapped with logging that captures the tool name, arguments, result summary, and execution duration. This creates a trace that you can audit after the fact: which accounts did the agent access? What data did it request? How long did each operation take? Were there any unexpected failures?
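One way to get this uniformly is a decorator around every tool function. The sketch below is a minimal version of the idea (logger name and field names are illustrative): it emits one structured record per invocation with the tool name, arguments, a truncated result summary, and duration, whether the call succeeds or raises.

```python
import functools
import json
import logging
import time

log = logging.getLogger("agent.tools")

def logged_tool(fn):
    """Wrap a tool so every invocation emits an audit record."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            status, summary = "ok", str(result)[:200]  # truncate large payloads
            return result
        except Exception as exc:
            status, summary = "error", repr(exc)[:200]
            raise
        finally:
            log.info(json.dumps({
                "tool": fn.__name__,
                "args": repr((args, kwargs))[:200],
                "status": status,
                "result_summary": summary,
                "duration_ms": round((time.monotonic() - start) * 1000, 1),
            }))
    return wrapper
```

Decorating each tool at definition time means no individual call site can forget to log, which is the property an audit trail actually needs.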

For the cloud CLI sandbox, the logging is even more detailed: the exact command, the cloud provider, the target account, the output (truncated for size), and the HTTP status code.

This observability data serves two purposes. First, it’s a security audit trail. If something goes wrong, you can reconstruct the agent’s actions step by step. Second, it’s an operational health signal. If the agent starts taking twice as long on a particular tool call, or if a particular account starts returning errors, you want to know about it before the next weekly run produces incomplete results.

The Pattern: Infrastructure Handles Security, Agents Handle Logic

The common thread across all of these patterns is that security belongs in the infrastructure layer, not in the agent’s reasoning. The agent doesn’t know it’s read-only. The agent doesn’t understand role chaining. The agent doesn’t validate its own tool calls. All of that is handled by the layers around the agent: IAM policies, sandbox containers, validation middleware, secrets management.

This maps directly to NIST’s principle that security controls should be architected into the system, not bolted onto the application. An agent’s system prompt might include safety instructions as defense-in-depth, but the actual security boundary is enforced by infrastructure that the LLM cannot reason its way around.

If you’re building agents that interact with production systems, start with the access controls and work inward. Read-only credentials first. Sandboxed execution for anything that runs commands. Comprehensive logging for everything. Human review for any action that modifies state. Then, and only then, think about what the system prompt should say about safety.

The LLM is the least reliable component in your security stack. Design accordingly.


Agent Router Enterprise enforces security at the infrastructure layer: AI Guardrails provide continuous input and output filtering, the MCP Gateway governs tool connectivity and access, and the LLM Gateway handles credential management and audit logging. Security that doesn’t depend on the LLM following instructions. Learn more here ›
