Every agent product eventually has the same conversation: who is allowed to click “yes”?
The simplest answer is a single per-action prompt. The user gets a popup, the popup says “the agent wants to do X”, the user clicks yes or no, the run continues. That model breaks down fast in real operations. Three reasons come up over and over:
- The user cannot evaluate every action. A long-running research or coding agent can produce hundreds of low-risk actions and a handful of side effects. Either the user approves everything in bulk (no real safety), or they review the first few prompts, tire, and rubber-stamp the rest.
- The same action has a different blast radius in different contexts. “Send a message” might be safe in a sandbox and dangerous against a real customer. The runtime usually knows which; the user usually does not.
- The interesting failures are not at the prompt boundary. They are at the edges: retries, recoveries, fallbacks, escalations. A single confirm button has no opinion about what happens between the prompt and the side effect.
A better model is tiered approval escalation. Each proposed action is classified into a tier at the tool-call boundary, and the tier decides who is asked and how often.
A Working Tiered Escalation Policy
- Tier 0 (read-only, scoped): no prompt, no receipt beyond the run log. Examples: file read inside the project root, search query against an internal index, snapshot of a sandbox URL.
- Tier 1 (reversible internal state change): no prompt, but a durable receipt and an undo token. Examples: create a local branch, draft a doc, populate a workspace cache.
- Tier 2 (bounded external write): prompt the user once per session for that class of action, with a clear scope. The agent can issue five, fifty, or five hundred writes under that class without re-asking, but the user knows what class was authorized. Examples: create a PR, send a message to a test channel, post a comment to a specific thread.
- Tier 3 (irreversible or high-blast-radius): prompt every time, with the resolved target, the policy decision, and the verification artifact, and require a fresh human approval. Examples: send to a real customer, merge to main, delete a record, change a credential, post a public message under a real account.
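The four tiers above can be written down as data rather than prose. The following is a minimal Python sketch under that framing. The names `Tier`, `TierPolicy`, `POLICIES` and `needs_prompt` are hypothetical and do not come from any particular framework. The sketch encodes only who is asked and what record each tier leaves.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    READ_ONLY = 0       # Tier 0: read-only, scoped
    REVERSIBLE = 1      # Tier 1: reversible internal state change
    BOUNDED_WRITE = 2   # Tier 2: bounded external write
    IRREVERSIBLE = 3    # Tier 3: irreversible or high blast radius


@dataclass(frozen=True)
class TierPolicy:
    prompt: str            # "never", "once_per_class", or "every_call"
    durable_receipt: bool  # False means the run log is the only record
    undo_token: bool       # True if the action must hand back a way to reverse it


# Hypothetical encoding of the policy list above.
POLICIES = {
    Tier.READ_ONLY: TierPolicy(prompt="never", durable_receipt=False, undo_token=False),
    Tier.REVERSIBLE: TierPolicy(prompt="never", durable_receipt=True, undo_token=True),
    Tier.BOUNDED_WRITE: TierPolicy(prompt="once_per_class", durable_receipt=True, undo_token=False),
    Tier.IRREVERSIBLE: TierPolicy(prompt="every_call", durable_receipt=True, undo_token=False),
}


def needs_prompt(tier: Tier, class_already_approved: bool) -> bool:
    """Decide whether a human must be asked before this call runs."""
    policy = POLICIES[tier]
    if policy.prompt == "never":
        return False
    if policy.prompt == "once_per_class":
        return not class_already_approved
    return True  # "every_call": a prior approval never carries over
```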
The important shift is that the classification happens in the runtime, not in the prompt. The agent cannot “decide” a Tier 3 action is really Tier 1 by being persuasive; the gateway sees the resolved target, the policy, the prior approvals, and either allows it, demands a prompt, or blocks it.
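Here is a minimal sketch of gateway-side classification. It assumes three hypothetical pieces: a registry (`BASE_TIER`), sandbox and blocked destination prefixes, and a `ResolvedCall` record. The point it illustrates is that the agent's own tier claim is carried along but never consulted. The tier comes from the tool and the resolved target alone.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    READ_ONLY = 0
    REVERSIBLE = 1
    BOUNDED_WRITE = 2
    IRREVERSIBLE = 3


@dataclass(frozen=True)
class ResolvedCall:
    tool: str                # registry key (hypothetical tool names below)
    target: str              # fully resolved destination, not the agent's wording
    agent_claimed_tier: int  # what the agent says; deliberately ignored below


# Hypothetical registry and destination policy; a real gateway would load these from config.
BASE_TIER = {
    "read_file": Tier.READ_ONLY,
    "create_branch": Tier.REVERSIBLE,
    "send_message": Tier.BOUNDED_WRITE,
    "merge_to_main": Tier.IRREVERSIBLE,
}
SANDBOX_PREFIXES = ("sandbox:", "test-channel:")
BLOCKED_PREFIXES = ("prod-admin:",)


def classify(call: ResolvedCall) -> Tier:
    """Tier from the tool and the resolved target only."""
    tier = BASE_TIER.get(call.tool, Tier.IRREVERSIBLE)  # unknown tools get the strictest tier
    if call.tool == "send_message" and not call.target.startswith(SANDBOX_PREFIXES):
        tier = Tier.IRREVERSIBLE  # a real recipient raises the blast radius
    return tier


def decide(call: ResolvedCall, class_approved: bool) -> str:
    """Return 'allow', 'prompt', or 'block' for a resolved call."""
    if call.target.startswith(BLOCKED_PREFIXES):
        return "block"
    tier = classify(call)
    if tier <= Tier.REVERSIBLE:
        return "allow"
    if tier == Tier.BOUNDED_WRITE and class_approved:
        return "allow"
    return "prompt"
```

In this sketch, unknown tools default to Tier 3. A new tool therefore has to be registered before it can run without a prompt. That default is a design choice, not something the policy above mandates.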
Why the Tier Matters More Than the Prompt Text
The actual cost of “click yes” fatigue is not the user’s time. It is the silent downgrade of the audit trail. If a user approves a Tier 2 class once and then fifty writes happen, the system still has a real receipt for each write and a real reason it was allowed. If a user approves a “yes to everything” prompt, the system has nothing to show after the fact except that the user said yes once.
Practical Implementation
A practical implementation has four moving parts:
- A tool registry that knows the tier of every action. Not the agent’s description of the action — the gateway’s classification of the resolved call.
- A session-level scope object. Once the user approves a Tier 2 class, the scope travels with the run. The next Tier 2 write in the same class checks the scope and proceeds without re-asking.
- A per-call receipt. Every action, at every tier, produces a small structured record: tier, resolved target, policy decision, approval reference, verification artifact. Tier 0 and Tier 1 receipts are batched into the run log; Tier 2 and Tier 3 receipts are first-class.
- An escalation path. If a Tier 1 action suddenly needs to become Tier 2 (e.g. the agent tries to write outside the approved scope), the runtime pauses, re-classifies, and asks. The prompt is not “do you want to continue?” but “this action was just re-tiered; here is why; approve the new tier or stop.”
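The four parts can be wired into a single gateway. Every name in this sketch is hypothetical: `Gateway`, `SessionScope`, `Receipt`, the `workspace/` root and the example tools. It also simplifies in two places. First, the caller passes the verification artifact in, whereas a real runtime would attach it after the side effect completes. Second, Tier 1 escalation is triggered only by a write outside a fixed workspace prefix.

```python
from dataclasses import dataclass, field
from typing import Optional

WORKSPACE_ROOT = "workspace/"  # hypothetical boundary for Tier 1 writes


@dataclass(frozen=True)
class Receipt:
    tier: int
    target: str
    decision: str                # "allow", "prompt", or "re-tiered"
    approval_ref: Optional[str]  # id of the approval covering this call, if any
    verification: Optional[str]  # artifact proving the effect (supplied by caller here)


@dataclass
class SessionScope:
    # approved Tier 2 class -> (destination it was approved for, approval id)
    grants: dict = field(default_factory=dict)

    def approve(self, action_class: str, destination: str, approval_id: str) -> None:
        self.grants[action_class] = (destination, approval_id)

    def covers(self, action_class: str, destination: str) -> Optional[str]:
        grant = self.grants.get(action_class)
        if grant and grant[0] == destination:
            return grant[1]
        return None


@dataclass
class Gateway:
    registry: dict  # tool -> (tier, action_class): the gateway's view, not the agent's
    scope: SessionScope = field(default_factory=SessionScope)
    run_log: list = field(default_factory=list)  # Tier 0/1 receipts, batched
    durable: list = field(default_factory=list)  # Tier 2/3 receipts, first-class

    def handle(self, tool: str, target: str, verification: Optional[str] = None) -> Receipt:
        """Classify the resolved call and emit its receipt in one place."""
        tier, action_class = self.registry[tool]
        decision, approval = "allow", None
        if tier == 1 and not target.startswith(WORKSPACE_ROOT):
            # Escalation: a reversible internal write left its boundary, so re-tier and ask.
            tier, decision = 2, "re-tiered"
        elif tier == 2:
            approval = self.scope.covers(action_class, target)
            decision = "allow" if approval else "prompt"
        elif tier == 3:
            decision = "prompt"  # every Tier 3 call needs a fresh approval
        receipt = Receipt(tier, target, decision, approval,
                          verification if decision == "allow" else None)
        (self.run_log if tier <= 1 else self.durable).append(receipt)
        return receipt
```

Classification and receipt emission live in the same `handle` method. As a result, no code path decides a tier without also writing the record.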
What Tiered Escalation Is Not
Tiered escalation is not a substitute for a guard. A guard scans the contents of the action for prompt injection, credential leaks, exfiltration patterns, and safety bypasses. The tier decides whether the human is asked; the guard decides whether the action runs at all. A Tier 0 read can still be blocked by the guard. A Tier 3 send that passes the guard can still be paused for human approval.
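To make the separation concrete, here is a sketch in which the guard and the tier are two independent checks. The regex patterns in `GUARD_PATTERNS` are hypothetical placeholders, far weaker than a production guard. `guard` and `gate` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical detectors; a real guard would cover exfiltration, safety bypasses, and more.
GUARD_PATTERNS = {
    "credential_leak": re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*\S+"),
    "prompt_injection": re.compile(r"(?i)ignore (all )?previous instructions"),
}


def guard(payload: str) -> Optional[str]:
    """Return the name of the first matching pattern, or None if the payload looks clean."""
    for name, pattern in GUARD_PATTERNS.items():
        if pattern.search(payload):
            return name
    return None


def gate(tier: int, payload: str, class_approved: bool = False) -> str:
    """The guard decides whether the action may run; the tier decides whether a human is asked."""
    finding = guard(payload)
    if finding is not None:
        return f"blocked:{finding}"  # even a Tier 0 read stops here
    if tier == 3 or (tier == 2 and not class_approved):
        return "await_approval"      # a clean Tier 3 call still waits for a human
    return "run"
```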
What to Watch for in Production
Three failure modes show up early:
- Tier inflation. The agent learns that Tier 1 is easy and routes borderline actions through it. The fix is to require the runtime, not the agent, to pick the tier.
- Scope drift. A Tier 2 class broadens over the session until it covers most writes. The fix is a per-class scope object that names the target type, the destination, and the count, not a generic “writes are allowed.”
- Receipt underproduction. Operators forget to wire Tier 0 and Tier 1 receipts into durable storage, so when something goes wrong the only record is the run log. The fix is to make the tier classification and the receipt emission happen in the same place.
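Here is a sketch of the narrower scope object described under scope drift, built around a hypothetical `ScopeGrant` record. The grant names one action class, one target type, one destination and a count budget. Any call outside it returns `False`, so the runtime re-asks instead of silently widening the grant.

```python
from dataclasses import dataclass


@dataclass
class ScopeGrant:
    """A narrow Tier 2 authorization: one class, one target type, one destination, bounded count."""
    action_class: str  # e.g. "post_comment" (hypothetical)
    target_type: str   # e.g. "thread"
    destination: str   # e.g. "thread:42"
    max_count: int
    used: int = 0

    def permits(self, action_class: str, target_type: str, destination: str) -> bool:
        """Consume one use on an exact match; otherwise refuse so the runtime re-asks."""
        matches = (action_class == self.action_class
                   and target_type == self.target_type
                   and destination == self.destination)
        if not matches or self.used >= self.max_count:
            return False
        self.used += 1
        return True
```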
A single confirm button is fine for a demo and wrong for production. Tiered escalation at the tool-call boundary, with a guard upstream and a durable receipt downstream, is what makes a long-running agent safe to operate without making the user the bottleneck.