Subagents Aren’t Just Getting Smarter. They’re Getting Contained.

For a while, agent demos have sold the same fantasy.

More subagents. More parallelism. More little workers fanning out across a repo like interns after too much cold brew.

Fun demo. Terrifying production model.

The code now tells a sharper story: the real race is no longer just to spawn more helpers. It’s to make those helpers safe to run.

That shift is visible in both Gemini CLI and Codex. And it matters because delegation stops feeling magical the minute a subagent leaks state, asks the wrong human question, collides with another worker, or leaves cleanup debris behind.

Gemini is putting subagents in real side rooms

The freshest Gemini signal is PR #23903, “subagent isolation and cleanup hardening.” This is not cosmetic polish.

According to the merged PR, Gemini now stores subagent session files in directories keyed to the full parent session UUID, refactors deletion into a shared async utility, and hardens path sanitization to prevent traversal and file leaks.

That sounds mundane until you translate it into product language: subagents are no longer treated like casual extensions of the main run. They are being handled like workloads that need their own storage boundaries and predictable teardown.
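Here is a minimal Python sketch of the three pieces (UUID-keyed storage, path sanitization, shared async teardown). It assumes a layout of <root>/<parent session UUID>/<subagent id>. The function names are hypothetical and are not Gemini CLI's TypeScript code. The sanitization rule is one reasonable choice among several:

```python
import asyncio
import re
import shutil
import tempfile
import uuid
from pathlib import Path

# Anything outside this set is replaced, so path separators never survive.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_component(name: str) -> str:
    """Reduce an untrusted name to a single safe path segment."""
    cleaned = _UNSAFE.sub("_", name).strip(".")
    if not cleaned:
        # Names such as "." or ".." collapse to nothing and are refused.
        raise ValueError(f"unusable path component: {name!r}")
    return cleaned


def subagent_session_dir(root: Path, parent_session_id: str, subagent_id: str) -> Path:
    """Hypothetical layout: <root>/<full parent session UUID>/<subagent id>."""
    # Parse and re-serialize the full UUID (raises ValueError if malformed)
    # instead of truncating it, so parent sessions never share a directory.
    parent = str(uuid.UUID(parent_session_id))
    base = root.resolve()
    path = (base / parent / sanitize_component(subagent_id)).resolve()
    # Defense in depth: the resolved path must still sit under the root.
    if not path.is_relative_to(base):
        raise ValueError(f"path escapes session root: {path}")
    return path


async def remove_session_dir(path: Path) -> None:
    """Shared async teardown: run the blocking delete off the event loop."""
    # ignore_errors=True means an already-deleted directory is not an error
    # (it also silences other failures, which a real version should log).
    await asyncio.to_thread(shutil.rmtree, path, ignore_errors=True)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    d = subagent_session_dir(root, "123e4567-e89b-12d3-a456-426614174000", "../../etc")
    d.mkdir(parents=True)
    print(d.name)  # _.._etc
    asyncio.run(remove_session_dir(d))
    print(d.exists())  # False
```

The parent ID is parsed rather than trusted, so a malformed value fails loudly instead of quietly becoming a directory name.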

There is a second clue in the same cluster. PR #22115, “Fully migrate packages/core to AgentLoopContext,” spreads a shared runtime context model across scheduling, prompts, safety, shell execution, loop detection, and even tools like web search and fetch.

That is the kind of plumbing you build when agent behavior needs consistent context, not just clever prompting. It is infrastructure for control.
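The following hypothetical Python sketch makes “shared runtime context” concrete. One immutable context object is handed to every subsystem, and a child context is derived from its parent. The fields, and the rule that a child can only narrow its tool set, are assumptions for illustration. The real AgentLoopContext in packages/core may look quite different:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LoopContext:
    """Hypothetical stand-in for a shared per-loop runtime context."""
    session_id: str
    parent_session_id: Optional[str]
    allowed_tools: frozenset[str]
    max_turns: int

    def for_subagent(self, session_id: str, requested_tools: set[str]) -> "LoopContext":
        # Derive a child context. In this sketch a child can narrow the
        # parent's tool set but never widen it.
        return replace(
            self,
            session_id=session_id,
            parent_session_id=self.session_id,
            allowed_tools=self.allowed_tools & frozenset(requested_tools),
        )


# Every subsystem takes the same context object instead of reading globals.
def scheduler_can_run(ctx: LoopContext, tool: str) -> bool:
    return tool in ctx.allowed_tools


def build_system_prompt(ctx: LoopContext) -> str:
    role = "subagent" if ctx.parent_session_id else "main agent"
    tools = ", ".join(sorted(ctx.allowed_tools))
    return f"You are a {role}. Tools available: {tools}."
```

Because the scheduler and the prompt builder read the same frozen object, they cannot disagree about which tools a run has.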

Gemini is also teaching delegated runs how to fail more cleanly

Another recent Gemini change pushes the same direction from a different angle.

Commit 5fa14dbe4283, “resilient subagent tool rejection with contextual feedback,” upgrades the subagent failure path so a rejected tool call can come back with direct instructions to acknowledge the rejection and rethink the strategy.

That matters because immature delegation stacks tend to do one of two dumb things when blocked: either they crash noisily, or they keep pushing the same bad plan.

Gemini is starting to do something more adult. When a subagent gets denied, the runtime gives it a clearer frame for recovery.
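In sketch form, the idea is to turn a denial into a tool result the model can read, instead of an exception that ends the run. The message text, ToolResult, and run_tool_call below are hypothetical Python stand-ins. They do not reproduce the wording or types from commit 5fa14dbe4283:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class ToolResult:
    call_id: str
    content: str
    is_error: bool = False


def rejection_result(call_id: str, tool_name: str, reason: Optional[str] = None) -> ToolResult:
    """Turn a rejected call into feedback the subagent can act on."""
    # Hypothetical wording, in the spirit of the change described above.
    lines = [f"The call to '{tool_name}' was rejected and did not run."]
    if reason:
        lines.append(f"Reason: {reason}")
    lines.append(
        "Acknowledge the rejection. Do not retry the same call; "
        "rethink your strategy and try a different approach."
    )
    return ToolResult(call_id, "\n".join(lines), is_error=True)


def run_tool_call(
    call_id: str,
    tool_name: str,
    args: Dict[str, Any],
    approve: Callable[[str, Dict[str, Any]], bool],
    tools: Dict[str, Callable[..., str]],
) -> ToolResult:
    # A denial becomes an ordinary tool result, so the subagent's loop
    # continues with new information instead of crashing.
    if not approve(tool_name, args):
        return rejection_result(call_id, tool_name)
    return ToolResult(call_id, tools[tool_name](**args))
```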

Even the default budgets are getting more realistic. In packages/core/src/agents/types.ts, Gemini doubled the default subagent limits from 15 to 30 turns and from 5 to 10 minutes. That is not just generosity. It is a signal that delegated work is being tuned as an execution unit with explicit bounds.
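A bounded run can be sketched as a loop with two exits besides success. Only the 30-turn and 10-minute defaults come from the change described above. run_bounded and Termination are hypothetical names:

```python
import time
from enum import Enum
from typing import Callable, Tuple

# Defaults follow the reported Gemini CLI values; the loop is illustrative.
DEFAULT_MAX_TURNS = 30
DEFAULT_MAX_TIME_MINUTES = 10


class Termination(Enum):
    GOAL = "goal"
    MAX_TURNS = "max_turns"
    TIMEOUT = "timeout"


def run_bounded(
    step: Callable[[int], bool],
    max_turns: int = DEFAULT_MAX_TURNS,
    max_minutes: float = DEFAULT_MAX_TIME_MINUTES,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Termination, int]:
    """Run step(turn) until it reports completion or a bound is hit.

    Returns the reason for stopping and the number of turns completed.
    """
    deadline = clock() + max_minutes * 60
    for turn in range(1, max_turns + 1):
        # The deadline is checked between turns; a turn already in flight
        # is not interrupted in this sketch.
        if clock() >= deadline:
            return Termination.TIMEOUT, turn - 1
        if step(turn):
            return Termination.GOAL, turn
    return Termination.MAX_TURNS, max_turns
```

Injecting clock keeps the timeout testable without sleeping. Returning an explicit termination reason lets the parent agent tell “finished” apart from “ran out of budget.”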

Codex is trimming away the human dependency inside subagents

Codex is approaching the same problem from a slightly different philosophy.

Commit bda3c49dc4a4, “disable request input on sub agent,” makes the stance brutally clear in codex-rs/core/src/tools/spec.rs: if the session source is a subagent, request_user_input does not get registered.

That is a strong product opinion. A delegated worker should not become a little chaos portal back to the human. If the main agent chooses to delegate, the child run should stay bounded and finish within its lane instead of pinging the user midstream.
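The rule in spec.rs is a registration-time check. Below is a Python rendering of the same idea. A hypothetical SessionSource enum and build_tool_specs helper stand in for the Rust code, and the base tool names are placeholders:

```python
from enum import Enum, auto
from typing import List


class SessionSource(Enum):
    # Hypothetical variants; only the subagent case matters here.
    CLI = auto()
    EXEC = auto()
    SUBAGENT = auto()


def build_tool_specs(source: SessionSource, base_tools: List[str]) -> List[str]:
    """Return the tool names to register for a session."""
    tools = list(base_tools)  # copy so the caller's list is untouched
    # For subagents the tool is never registered at all, so the model never
    # sees it, rather than seeing it and being refused at call time.
    if source is not SessionSource.SUBAGENT:
        tools.append("request_user_input")
    return tools
```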

Then Codex tightens the social contract among workers. Commit 932ff2818320, “better multi-agent prompt,” rewrites the spawn-agent guidance to emphasize explicit ownership, disjoint write scopes, and a clear instruction not to revert other workers’ edits.

That is less “wow, look at the swarm” and more “somebody finally wrote the badge rules for the machine room.”
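As described, the commit rewrites prompt guidance rather than adding enforcement. An orchestrator could also check for disjoint write scopes mechanically before spawning workers. One hypothetical approach treats each scope as a repo-relative path and flags any pair where one path equals or contains the other:

```python
from itertools import combinations
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def scopes_overlap(a: str, b: str) -> bool:
    """True if one repo-relative path equals or contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def find_conflicts(scopes: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (worker, path, other_worker, other_path) for every overlapping pair."""
    conflicts = []
    for (w1, paths1), (w2, paths2) in combinations(scopes.items(), 2):
        for a in paths1:
            for b in paths2:
                if scopes_overlap(a, b):
                    conflicts.append((w1, a, w2, b))
    return conflicts
```

The check compares path components via PurePosixPath.parents rather than raw string prefixes. That way src/api and src/apiary are not flagged as overlapping.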

Codex is adding the gauges too

Containment is not only about blocking bad behavior. It is also about measuring what happened inside the box.

Commit e07eaff0d32c, “add metric for per-turn tool count and add tmp_mem flag,” increments tool-call counts inside the active turn state and emits a codex.turn.tool.call histogram at turn completion. Nearby work tracks per-turn token usage too.

That makes delegated work easier to reason about as an operational object: how busy was the turn, how expensive was it, and under what memory mode did it run?
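The per-turn counting pattern is easy to sketch. In this hypothetical Python version, only the metric name codex.turn.tool.call comes from the commit. InMemoryMetrics and TurnState are assumptions. So is attaching the tmp_mem flag as a tag: the commit adds such a flag, but this sketch only guesses at how it relates to the metric:

```python
from collections import defaultdict
from dataclasses import dataclass


class InMemoryMetrics:
    """Hypothetical stand-in for a real metrics exporter."""

    def __init__(self) -> None:
        self.histograms = defaultdict(list)

    def record(self, name: str, value: float, **tags: str) -> None:
        self.histograms[name].append((value, tags))


@dataclass
class TurnState:
    tool_calls: int = 0

    def on_tool_call(self) -> None:
        # Counted inside the active turn, not globally.
        self.tool_calls += 1


def complete_turn(state: TurnState, metrics: InMemoryMetrics, tmp_mem: bool) -> None:
    # One histogram observation per turn, emitted at completion. The
    # memory-mode tag is an assumption of this sketch.
    metrics.record("codex.turn.tool.call", state.tool_calls, tmp_mem=str(tmp_mem).lower())
```

Recording once at turn completion, rather than once per call, yields a distribution of how busy each turn was.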

Even Codex’s newest merged cleanup fits the pattern. PR #15906 removes dead skillMetadata from command approval payloads across app-server, TUI, and MCP boundaries. Smaller contract. Less protocol baggage. Fewer stray surfaces.

Again: not flashy. Very telling.

The bigger pattern: delegation is turning into operations

Put the two repos together and the picture gets clearer.

Gemini CLI is hardening isolation, cleanup, shared loop context, rejection recovery, and bounded runtime defaults.
Codex is suppressing human interruptions inside subagents, clarifying worker ownership, and instrumenting turns like measurable jobs.

Different implementation style. Same instinct.

These teams are acting less like they are building magical helper creatures and more like they are running a small distributed system inside the terminal.

That is the important shift.

The first generation of agent tooling asked, can we delegate?

The next generation is asking, what boundaries make delegation trustworthy?

Why this matters

In real software work, the danger is rarely that an agent cannot do enough.

The danger is that it does enough in the wrong place.

Wrong files. Wrong cleanup. Wrong escalation path. Wrong interruption model.

That is why this week’s code matters more than it first appears. The platforms that win long-term may not be the ones with the most dramatic multi-agent demos. They may be the ones that make delegated work feel boxed, inspectable, recoverable, and boring in the best possible way.

So here’s the open question: if subagents start looking more like isolated jobs than free-roaming helpers, does that make teams more willing to trust them with larger chunks of real work?

If you build agent tools, don’t just add more workers. Add better boundaries. Make ownership explicit, cleanup reliable, approvals slimmer, and failures easier to recover from. That is how delegation stops being a stunt and starts becoming infrastructure.
