The Approval Trap

Case file 07 / The self-restart loop

An agent could kill its own container, resume, and do it again

Hermes commit 54bf7987 documents a specific failure: with docker.sock mounted, the agent could run docker restart hermes, terminate its own container, come back up under Docker's restart policy, resume the same session, and repeat the command. The fix added Docker lifecycle commands to the dangerous-command patterns.
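A minimal sketch of what such a pattern could look like follows. The verb list and the names DOCKER_LIFECYCLE and is_dangerous are assumptions for illustration; the actual Hermes patterns are not reproduced here.

```python
import re

# Assumption: these verbs stand in for "Docker lifecycle commands"; the real
# Hermes pattern list is not reproduced here.
DOCKER_LIFECYCLE = re.compile(
    r"\bdocker\s+(?:container\s+)?(?:restart|stop|kill|rm)\b"
)


def is_dangerous(command: str) -> bool:
    """Return True when the command invokes a Docker lifecycle verb."""
    return DOCKER_LIFECYCLE.search(command) is not None


print(is_dangerous("docker restart hermes"))  # True
print(is_dangerous("docker ps"))              # False
```

A regex of this kind is only a first filter. It would miss forms where flags sit between docker and the verb, so a production classifier would likely need real argument parsing.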

The surprise is not that a dangerous command existed. It is that the approval boundary missed a consequential lifecycle transition while less consequential actions could still generate prompts. Friction was present, but it was placed around the wrong changes in authority.

Latest newsroom receipts

What changed since the first Atlas draft

NousResearch/hermes-agent: Hermes Makes Agent Approvals Thread-Local

Thread-local approval state narrows an old class of shared-flag failures.

google-gemini/gemini-cli, openclaw/openclaw, NousResearch/hermes-agent: Agent Safety Gates Catch Edge Cases

Safety gates become meaningful where edge cases are tested across projects.

openclaw/openclaw: OpenClaw Narrows Who Can Wake An Agent

Wake controls turn scheduled autonomy into a permission boundary.

Mechanism trace

What actually happens

Proposed action: exact arguments and effect
Classify risk: authority change
Check grants: scope, inherit, expiry
Human decision: approve, deny, timeout
Bound executor: approved principal
Environment effect: commit or reject
Audit receipt: decision plus outcome
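The stages in this trace can be sketched as a single gate function. This is an illustrative outline under stated assumptions, not any project's implementation: Decision, Receipt, run_gate, and the four callbacks are hypothetical names, and binding the executor to an approved principal is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVE = "approve"
    DENY = "deny"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class Receipt:
    action: str
    authority_change: bool
    decided_by: str  # "classifier", "grant", or "human"
    decision: Decision
    executed: bool


def run_gate(
    action: str,
    changes_authority: Callable[[str], bool],
    has_grant: Callable[[str], bool],
    ask_human: Callable[[str], Decision],
    execute: Callable[[str], None],
) -> Receipt:
    """Classify, check grants, ask if needed, execute, and return a receipt."""
    risky = changes_authority(action)
    if not risky:
        decision, decided_by = Decision.APPROVE, "classifier"
    elif has_grant(action):
        decision, decided_by = Decision.APPROVE, "grant"
    else:
        decision, decided_by = ask_human(action), "human"

    executed = False
    if decision is Decision.APPROVE:  # deny and timeout both reject
        execute(action)
        executed = True
    return Receipt(action, risky, decided_by, decision, executed)
```

The receipt records both the decision and whether anything ran, so a later audit can tell a human approval apart from a stored grant.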

Fatigue begins with poor classification

Crush commit 96728b15 detects shell chaining. A safe-listed read can skip approval, but pipes, semicolons, command substitution, and backticks remove that shortcut. The mechanism catches the moment a simple read becomes shell composition.

That conservative boundary can also prompt on harmless composition. The answer is not to ask everywhere or nowhere. The system must reveal the scope change that caused the interruption, then measure whether the user received enough information to decide.
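A rough sketch of this kind of check, returning the reason alongside the verdict so the prompt can show what changed. The safe list, the marker tuple, and the function name can_skip_approval are assumptions for illustration and do not reproduce the Crush code.

```python
from typing import Tuple

# Hypothetical safe list; a real tool would define its own.
SAFE_READS = {"ls", "cat", "pwd", "head"}

# Pipes, semicolons, command substitution, and backticks.
COMPOSITION_MARKERS = ("|", ";", "$(", "`")


def can_skip_approval(command: str) -> Tuple[bool, str]:
    """Return (skip, reason). Any composition marker removes the shortcut."""
    for marker in COMPOSITION_MARKERS:
        if marker in command:
            return False, f"shell composition: found {marker!r}"
    words = command.split()
    if words and words[0] in SAFE_READS:
        return True, "safe-listed read"
    return False, "not on the safe list"


print(can_skip_approval("cat notes.txt"))
print(can_skip_approval("cat notes.txt | sh"))
```

Because the check is a plain substring match, a quoted filename such as "a|b.txt" also triggers a prompt. That is the harmless-composition cost described above, and the returned reason is what lets the user see why they were interrupted.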

Bypass states must be coherent

Crush commit 6b312bee replaces shared skip state with atomic state and protects session auto-approval with a mutex. Permission requests can return immediately when skip mode, an allowlist, hook approval, session auto-approval, or a persistent grant applies.

Approval fatigue makes bypasses attractive. That makes concurrency correctness and visible state essential: the interface and runtime must agree about whether prompts are enabled, which bypass granted an action, and how long that grant lasts.
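One way to keep those answers consistent is to hold all bypass state behind a single lock and have every lookup name the bypass that applied. The sketch below uses Python's threading.Lock in place of the atomic and mutex primitives mentioned above; BypassState and its methods are hypothetical names, and only two of the bypass kinds are modeled.

```python
import threading
from typing import Optional, Set


class BypassState:
    """Bypass flags guarded by one lock so readers never see a torn state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._skip_all = False
        self._auto_sessions: Set[str] = set()

    def set_skip(self, enabled: bool) -> None:
        with self._lock:
            self._skip_all = enabled

    def auto_approve_session(self, session_id: str) -> None:
        with self._lock:
            self._auto_sessions.add(session_id)

    def bypass_for(self, session_id: str) -> Optional[str]:
        """Return the name of the bypass that applies, or None to prompt."""
        with self._lock:
            if self._skip_all:
                return "skip-mode"
            if session_id in self._auto_sessions:
                return "session-auto-approve"
            return None


state = BypassState()
state.auto_approve_session("s1")
print(state.bypass_for("s1"))  # session-auto-approve
print(state.bypass_for("s2"))  # None
```

Returning a name rather than a boolean means the interface and the audit log can both report which bypass granted the action.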

Approval must travel with the work

Hermes commit 10839772 propagates approval context into local and remote execute_code worker threads. Before the fix, gateway sessions could lose approval context and fall into a non-interactive auto-approve branch. The patch adds a fail-closed check before the child process starts.

A prompt is useless if the request cannot reach the human from the worker that needs it. The safer design can combine findings into one request, block the worker while awaiting a decision, treat timeouts as non-consent, and keep the decision tied to the session.
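The propagation and fail-closed check can be illustrated with Python's contextvars module. This is a sketch under assumptions, not the Hermes patch: approval_session, ApprovalContextMissing, run_child, and spawn_worker are hypothetical names, and the "child process" is reduced to a log entry.

```python
import contextvars
import threading
from typing import List

# Hypothetical context variable holding the interactive session id.
approval_session = contextvars.ContextVar("approval_session", default=None)


class ApprovalContextMissing(RuntimeError):
    """Raised instead of falling through to a silent auto-approve path."""


def run_child(command: str, log: List[str]) -> None:
    session = approval_session.get()
    if session is None:
        # Fail closed: without a session there is nobody to ask.
        raise ApprovalContextMissing(f"no approval context for {command!r}")
    log.append(f"{session}: {command}")


def spawn_worker(command: str, log: List[str]) -> threading.Thread:
    # Copy the caller's context explicitly so the worker sees the session.
    ctx = contextvars.copy_context()
    thread = threading.Thread(target=ctx.run, args=(run_child, command, log))
    thread.start()
    return thread


log: List[str] = []
approval_session.set("sess-1")
spawn_worker("ls", log).join()
print(log)  # ['sess-1: ls']
```

The explicit copy_context call is the part that makes approval travel with the work. Running run_child inside an empty contextvars.Context() raises ApprovalContextMissing rather than proceeding.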

Remembered permission trades repetition for blast radius

Remembering narrowly scoped paths or network allowances can reduce repeated questioning without granting unrestricted access. But the stored scope must remain inspectable.

A grant should identify the command, path, network surface, session, inheritance rules, and expiry. Otherwise the system reduces fatigue by quietly widening trust, leaving the user unable to tell why a later action proceeded.
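As a sketch, a grant record carrying those fields might look like the following. The Grant class, its field names, and the covers method are illustrative assumptions rather than any project's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    command: str
    path_prefix: str
    network_host: Optional[str]  # None means no network access was granted
    session_id: str
    inherit_to_children: bool
    expires_at: float  # epoch seconds

    def covers(
        self,
        command: str,
        path: str,
        host: Optional[str],
        session_id: str,
        is_child: bool,
        now: float,
    ) -> bool:
        """Return True only if every recorded dimension of scope matches."""
        if now >= self.expires_at or session_id != self.session_id:
            return False
        if is_child and not self.inherit_to_children:
            return False
        if host is not None and host != self.network_host:
            return False
        prefix = self.path_prefix.rstrip("/")
        in_scope = path == prefix or path.startswith(prefix + "/")
        return command == self.command and in_scope


grant = Grant("git status", "/repo", None, "s1", False, 100.0)
print(grant.covers("git status", "/repo/src", None, "s1", False, 50.0))  # True
print(grant.covers("git status", "/repo/src", None, "s1", False, 100.0))  # False
```

Because the record is a plain, immutable value, it can be shown to the user as stored and cited in the audit log when a later action proceeds under it.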

Failure modes

Where the contract breaks

Click-through approval

Repeated prompts expose no meaningful difference in scope or consequence.

Failure signal: Very fast approvals, repeated identical requests, or frequent switches to bypass modes.

Context-loss auto-approval

A worker or gateway loses interactive session context and takes a non-interactive path.

Failure signal: An action prompts in the main thread but proceeds silently when delegated.

Remembered grant becomes hidden authority

A persistent approval outlives the action the user believed they approved.

Failure signal: Later work proceeds without showing which stored grant authorized it.
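The click-through signals in particular lend themselves to simple measurement. The sketch below counts fast approvals and repeated identical requests from a prompt log; PromptEvent, click_through_signals, and both thresholds are assumptions chosen for illustration, not measured values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PromptEvent:
    request: str
    approved: bool
    seconds_to_decide: float


def click_through_signals(
    events: List[PromptEvent],
    fast_seconds: float = 1.0,  # assumed threshold
    repeat_limit: int = 3,      # assumed threshold
) -> Dict[str, object]:
    """Summarize two click-through signals from a list of prompt events."""
    fast = sum(
        1 for e in events if e.approved and e.seconds_to_decide < fast_seconds
    )
    counts = Counter(e.request for e in events)
    repeated = sorted(r for r, n in counts.items() if n >= repeat_limit)
    return {"fast_approvals": fast, "repeated_requests": repeated}


events = [PromptEvent("cat a.txt", True, 0.4)] * 3 + [
    PromptEvent("docker restart hermes", False, 6.0)
]
print(click_through_signals(events))
# {'fast_approvals': 3, 'repeated_requests': ['cat a.txt']}
```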

Release tests

What the product must prove

Compare the displayed request with the normalized action actually executed, including composition and lifecycle effects.
Trace approval context through worker threads, gateways, children, retries, and resumes.
Inventory every skip, allowlist, hook, session, and persistent-grant branch, including its audit output.