The Next Agent Battle Isn’t Subagents. It’s Delegation Quality.

Last week, “multi-agent” still sounded a bit like theater.

Spawn some helpers. Give them names. Maybe let them fan out in parallel. Add a nice thread picker in the UI.

But the code landing across agent runtimes is pushing the story somewhere more serious.

The new work is not just about having subagents. It is about operating delegation like a live system.

First, the runtime has to decide whether a task should be delegated at all.

Then it has to pick the right specialist, hand the work off cleanly, and recover when the after-effects get messy.

Across Gemini CLI, Codex, and OpenClaw, delegation is starting to look less like a feature checkbox and more like an operations discipline.

Gemini CLI is testing delegation like a behavior, not a capability

The clearest signal comes from Google’s Gemini CLI.

A fresh commit surfaced by gsio, d9d2ce36f2a7 (titled “test(evals): add comprehensive subagent delegation evaluations”), points to a new emphasis inside evals/subagents.eval.ts. The file does not merely test whether subagents exist. It tests whether the outer agent delegates well.

At lines 38 through 45, the comments spell out the goal: check whether the outer agent reliably uses an expert subagent even when the prompt only indirectly implies the need. In other words, Gemini is probing judgment, not keyword matching.

Then the file flips the test around. At lines 67 through 70, it explicitly checks that trivial direct-edit work is not over-delegated. That is a subtle but important maturity marker. Once a runtime can spawn specialists, one failure mode is under-delegation. Another is orchestration mania — sending simple work through a specialist pipeline just because the machinery exists.
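A minimal sketch of how an eval could score a single run against both failure modes follows. It is not taken from evals/subagents.eval.ts: the `EvalCase` shape, the `delegate` tool name, and the idea of grading a recorded list of tool calls are all assumptions made for illustration.

```typescript
// Hypothetical eval shapes for illustration only; not Gemini CLI's eval API.
type ToolCall = { tool: string; agent?: string };

interface EvalCase {
  prompt: string;
  // Specialist that should handle the task, or null if the parent should just do it.
  expectedAgent: string | null;
}

type Outcome = "correct" | "under-delegated" | "over-delegated" | "wrong-specialist";

// Assumption: delegation appears in the trace as a call to a "delegate" tool naming an agent.
function gradeDelegation(c: EvalCase, trace: ToolCall[]): Outcome {
  const delegations = trace.filter((call) => call.tool === "delegate");
  if (c.expectedAgent === null) {
    // Trivial work: any delegation counts as orchestration overhead.
    return delegations.length === 0 ? "correct" : "over-delegated";
  }
  if (delegations.length === 0) return "under-delegated";
  return delegations.some((d) => d.agent === c.expectedAgent) ? "correct" : "wrong-specialist";
}

const trivial: EvalCase = { prompt: "Rename variable x to count in utils.ts", expectedAgent: null };
console.log(gradeDelegation(trivial, [{ tool: "edit_file" }])); // correct
console.log(gradeDelegation(trivial, [{ tool: "delegate", agent: "refactor-expert" }])); // over-delegated
```

Grading both directions with one function keeps the two failure modes symmetric: a suite that only rewarded delegation would quietly train toward orchestration mania.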

Then the suite raises the difficulty. At lines 191 through 225, Gemini checks whether the main agent can pick the correct specialist from a pool of ten different agents. That is not “subagents are available.” That is dispatch quality under menu pressure.
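Choosing from a crowded pool boils down to a selection-accuracy score over labeled prompts. In this hypothetical sketch, a naive keyword router stands in for the model's choice so the metric has something to grade; the ten specialist names, their keywords, and the `route` and `selectionAccuracy` helpers are invented for illustration and do not reflect Gemini CLI's agents or harness.

```typescript
// Hypothetical specialist pool and router; not Gemini CLI's agents or eval harness.
interface Specialist {
  name: string;
  keywords: string[];
}

const pool: Specialist[] = [
  { name: "sql-expert", keywords: ["query", "schema", "index"] },
  { name: "css-expert", keywords: ["flexbox", "stylesheet", "layout"] },
  { name: "security-auditor", keywords: ["vulnerability", "injection", "secret"] },
  { name: "test-writer", keywords: ["unit test", "coverage", "mock"] },
  { name: "docs-writer", keywords: ["readme", "docstring", "changelog"] },
  { name: "perf-profiler", keywords: ["latency", "profile", "memory leak"] },
  { name: "dependency-manager", keywords: ["upgrade", "package", "lockfile"] },
  { name: "ci-fixer", keywords: ["pipeline", "workflow", "build failure"] },
  { name: "i18n-expert", keywords: ["translation", "locale", "plural"] },
  { name: "api-designer", keywords: ["endpoint", "openapi", "route handler"] },
];

// Stand-in for the model's choice: most keyword hits wins; ties go to the earlier entry.
function route(prompt: string, agents: Specialist[]): string | null {
  const text = prompt.toLowerCase();
  let best: string | null = null;
  let bestScore = 0;
  for (const agent of agents) {
    const score = agent.keywords.filter((k) => text.includes(k)).length;
    if (score > bestScore) {
      best = agent.name;
      bestScore = score;
    }
  }
  return best; // null means "no specialist fits; handle it directly"
}

interface LabeledPrompt {
  prompt: string;
  expected: string;
}

// Fraction of prompts routed to the expected specialist.
function selectionAccuracy(cases: LabeledPrompt[], agents: Specialist[]): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => route(c.prompt, agents) === c.expected).length;
  return hits / cases.length;
}

const cases: LabeledPrompt[] = [
  { prompt: "The checkout page layout breaks; fix the flexbox rules", expected: "css-expert" },
  { prompt: "Our login form may allow SQL injection; audit it", expected: "security-auditor" },
  { prompt: "Add a composite index to speed up this query", expected: "sql-expert" },
  { prompt: "Upgrade the package and regenerate the lockfile", expected: "dependency-manager" },
];
console.log(selectionAccuracy(cases, pool)); // 1
```

Swapping the keyword router for real model runs leaves the metric unchanged, which is the point: dispatch quality becomes a number that can regress.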

Gemini’s shift is simple and revealing: delegation is no longer a binary feature. It is something the runtime is now willing to grade.

The public docs reinforce the same product direction. Gemini’s subagent docs frame specialists as a way to keep the main session from getting cluttered while letting the primary agent “hire” focused helpers with their own prompts, tools, and context windows. But the new eval work shows where the frontier is moving: once you can hire specialists, the next problem is making sure the manager hires the right one, at the right time, for the right amount of work.

Codex is separating “wake the worker” from “leave a note”

OpenAI’s Codex is tackling the same problem from a different layer.

In the multi-agent v2 tool stack, Codex is hardening the protocol around delegated work. The file codex-rs/core/src/tools/handlers/multi_agents_v2.rs exports separate handlers: AssignTaskHandler, SendMessageHandler, and ListAgentsHandler. That is already a clue: the runtime is no longer treating child-agent interaction as one generic “talk to another agent” action.

The sharper distinction shows up one layer lower.

In assign_task.rs, line 21 routes through MessageDeliveryMode::TriggerTurn. In send_message.rs, line 21 routes through MessageDeliveryMode::QueueOnly. Same family of tool. Different operational contract.

That difference matters more than it looks.

assign_task says: wake the worker and make this live work.

send_message says: attach context, but do not necessarily start a fresh turn.

That is the kind of distinction systems add when delegation stops being a novelty and starts becoming something that can be overloaded, mis-timed, or left hanging. The editorial point is simple: delegated work needs explicit operational contracts.
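The contract is easy to model. The sketch below borrows the two mode names from Codex's MessageDeliveryMode, but the `AgentMailbox` class, the `assignTask`/`sendMessage` wrappers, and the rule that a triggered turn consumes every queued note are hypothetical TypeScript, not a port of the Rust handlers.

```typescript
// Mode names borrowed from Codex's MessageDeliveryMode; everything else is a hypothetical model.
type MessageDeliveryMode = "TriggerTurn" | "QueueOnly";

class AgentMailbox {
  private pending: string[] = [];
  turnsStarted = 0;
  lastTurnInput: string[] = [];

  deliver(message: string, mode: MessageDeliveryMode): void {
    this.pending.push(message);
    // QueueOnly stops here: the note waits until something else starts a turn.
    if (mode === "TriggerTurn") this.startTurn();
  }

  get queued(): number {
    return this.pending.length;
  }

  // Assumption: a new turn consumes every queued message, older notes included.
  private startTurn(): void {
    this.turnsStarted += 1;
    this.lastTurnInput = this.pending;
    this.pending = [];
  }
}

// Hypothetical wrappers mirroring the assign_task / send_message split.
const assignTask = (box: AgentMailbox, task: string): void => box.deliver(task, "TriggerTurn");
const sendMessage = (box: AgentMailbox, note: string): void => box.deliver(note, "QueueOnly");

const worker = new AgentMailbox();
sendMessage(worker, "FYI: the API base URL changed");
sendMessage(worker, "Prefer small commits");
console.log(worker.turnsStarted, worker.queued); // 0 2
assignTask(worker, "Update the client to the new base URL");
console.log(worker.turnsStarted, worker.queued, worker.lastTurnInput.length); // 1 0 3
```

In this model a queued note is never lost; it simply rides along with whatever turn starts next, so context can be attached without spending a worker turn on it.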

The adjacent tool registration in codex-rs/core/src/tools/spec.rs makes the shape even clearer: send_message, assign_task, and list_agents are all first-class registered surfaces. Codex is making agent inventory and handoff semantics explicit.

The public Codex docs tell the same story from the operator side. The product now documents spawning subagents, routing follow-up instructions, waiting for results, closing threads, inheriting sandbox policy, and even surfacing approvals from inactive threads. That is not just “we support subagents.” That is workflow governance.

OpenClaw is building the control room around delegated work

OpenClaw shows what happens when a runtime keeps following delegation all the way down into delivery and lifecycle management.

Start with the operator surface. In src/agents/tools/subagents-tool.ts, the subagents tool exposes list, kill, and steer. The runtime is not just launching children and hoping for the best. It is assuming that delegated runs will sometimes need supervision.
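A supervisor exposing those three verbs can be sketched in a few dozen lines. Only the verbs come from OpenClaw's subagents tool; the `Supervisor` class, the run states, and the choice to model steering as a list of operator notes are assumptions.

```typescript
// Hypothetical supervisor; only the list/kill/steer verbs come from OpenClaw's subagents tool.
type RunState = "running" | "completed" | "killed";

interface ChildRun {
  id: string;
  task: string;
  state: RunState;
  steering: string[]; // operator notes the child picks up on its next step (assumed mechanism)
}

class Supervisor {
  private runs = new Map<string, ChildRun>();
  private nextId = 1;

  spawn(task: string): string {
    const id = `run-${this.nextId++}`;
    this.runs.set(id, { id, task, state: "running", steering: [] });
    return id;
  }

  // Snapshot of every child in spawn order.
  list(): ChildRun[] {
    return Array.from(this.runs.values());
  }

  // Stop a live run; returns false for unknown or already-finished runs.
  kill(id: string): boolean {
    const run = this.runs.get(id);
    if (!run || run.state !== "running") return false;
    run.state = "killed";
    return true;
  }

  // Redirect a live run without restarting it.
  steer(id: string, note: string): boolean {
    const run = this.runs.get(id);
    if (!run || run.state !== "running") return false;
    run.steering.push(note);
    return true;
  }
}

const sup = new Supervisor();
const audit = sup.spawn("Audit the auth module");
const docs = sup.spawn("Rewrite the README");
sup.steer(audit, "Focus on token refresh only");
sup.kill(docs);
console.log(sup.list().map((r) => `${r.id}:${r.state}`).join(", ")); // run-1:running, run-2:killed
console.log(sup.steer(docs, "too late")); // false
```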

Then look at the delivery mechanics. In src/agents/subagent-announce-dispatch.ts, delivery paths are explicitly tracked as queued, steered, direct, or none. Dispatch phases distinguish queue-primary, direct-primary, and queue-fallback. That is the vocabulary of a runtime that expects delivery to branch, degrade, and recover.
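Here is one way that vocabulary could fit together: a primary attempt, then a fallback that degrades to the queue instead of dropping the result. The path and phase labels are taken from the file; the `Requester` interface and the decision rules (steer into a busy requester if possible, otherwise queue; try direct delivery for an idle one) are hypothetical and should not be read as OpenClaw's actual dispatch algorithm.

```typescript
// Path and phase labels as named in subagent-announce-dispatch.ts; the decision rules are hypothetical.
type DeliveryPath = "queued" | "steered" | "direct" | "none";
type DispatchPhase = "queue-primary" | "direct-primary" | "queue-fallback";

interface DispatchResult {
  path: DeliveryPath;
  phases: DispatchPhase[];
}

// Hypothetical view of the requester the result must reach.
interface Requester {
  isBusy: boolean; // requester is mid-turn
  acceptsSteering: boolean; // its active run can absorb the result directly
  sendDirect(text: string): boolean; // false when direct delivery fails
  enqueue(text: string): boolean; // false when the queue refuses the item
}

function dispatchAnnouncement(text: string, r: Requester): DispatchResult {
  const phases: DispatchPhase[] = [];
  if (r.isBusy) {
    // Busy requester: the queue is the primary route, unless the live run can be steered.
    phases.push("queue-primary");
    if (r.acceptsSteering) return { path: "steered", phases };
    return { path: r.enqueue(text) ? "queued" : "none", phases };
  }
  // Idle requester: try direct delivery first.
  phases.push("direct-primary");
  if (r.sendDirect(text)) return { path: "direct", phases };
  // Direct delivery failed: degrade to the queue instead of dropping the result.
  phases.push("queue-fallback");
  return { path: r.enqueue(text) ? "queued" : "none", phases };
}

const flaky: Requester = {
  isBusy: false,
  acceptsSteering: false,
  sendDirect: () => false,
  enqueue: () => true,
};
const res = dispatchAnnouncement("child run finished", flaky);
console.log(res.path, res.phases.join(" -> ")); // queued direct-primary -> queue-fallback
```

Recording the phases alongside the final path is what makes a degraded delivery visible after the fact, rather than indistinguishable from a clean one.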

The queue layer goes further. In src/agents/subagent-announce-queue.ts, OpenClaw keeps debounce settings, capacity caps, drop policy, summary lines, and exponential backoff for drain failures. Delegated work does not just finish; it has to come back into the human-facing lane without spamming or collapsing under burst pressure.
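Two of those mechanisms are simple to show in isolation: a capped queue that sheds its oldest entries and folds them into a summary line, and an exponential backoff schedule for failed drains. This is a sketch under assumptions: the drop-oldest policy, the summary wording, and the 500 ms base and 30 s cap are invented, and debounce is left out for brevity.

```typescript
// Hypothetical capped announce queue; policy, wording, and constants are illustrative.
class AnnounceQueue {
  private items: string[] = [];
  private dropped = 0;
  private readonly cap: number;

  constructor(cap: number) {
    this.cap = cap;
  }

  // Drop-oldest policy: keep the newest `cap` entries and count what was shed.
  push(text: string): void {
    this.items.push(text);
    while (this.items.length > this.cap) {
      this.items.shift();
      this.dropped += 1;
    }
  }

  // Deliver one batch; on failure keep everything so a later retry can resend it.
  drain(send: (lines: string[]) => boolean): boolean {
    if (this.items.length === 0 && this.dropped === 0) return true;
    const summary = this.dropped > 0 ? [`(${this.dropped} earlier update(s) omitted)`] : [];
    if (!send([...summary, ...this.items])) return false;
    this.items = [];
    this.dropped = 0;
    return true;
  }
}

// Exponential backoff for failed drains: base * 2^attempt, capped at maxMs.
function backoffMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const q = new AnnounceQueue(2);
["a done", "b done", "c done"].forEach((t) => q.push(t));
let delivered: string[] = [];
console.log(q.drain(() => false)); // false
console.log(q.drain((lines) => { delivered = lines; return true; })); // true
console.log(delivered.join(" | ")); // (1 earlier update(s) omitted) | b done | c done
console.log([0, 1, 2, 7].map((n) => backoffMs(n)).join(", ")); // 500, 1000, 2000, 30000
```

Keeping the batch on a failed drain is what makes the backoff meaningful: the caller waits `backoffMs(attempt)` and retries the same content instead of losing it, while the cap and summary line keep a burst from flooding the human-facing lane.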

And the registry layer is even more revealing. In src/agents/subagent-registry.types.ts, each run record carries controller and requester links, runtime, retry counts, lifecycle reason, descendant-settle wakeups, and frozen completion text preserved for later delivery. That is runtime supervision, not just feature gloss.
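Expressed as a type, such a record might look roughly like this. The field names paraphrase the description above and are hypothetical, as are the two helpers; the real shape lives in src/agents/subagent-registry.types.ts. The wake rule (wake the parent only once no descendants remain pending) is an assumed reading of “descendant-settle wakeups.”

```typescript
// Hypothetical run record; field names paraphrase the article, not OpenClaw's actual types.
type LifecycleReason = "completed" | "failed" | "killed" | "timed-out";

interface SubagentRunRecord {
  runId: string;
  controllerRunId?: string; // who may steer or kill this run
  requesterSessionId: string; // where results must eventually be delivered
  runtime: string; // which runtime executed the child
  retryCount: number;
  lifecycleReason?: LifecycleReason; // set once the run ends
  pendingDescendants: string[]; // child runs that must settle before the parent is woken
  frozenCompletionText?: string; // captured at completion, delivered later
}

// Freeze the output at completion so later delivery does not depend on live state.
function completeRun(r: SubagentRunRecord, text: string): SubagentRunRecord {
  return { ...r, lifecycleReason: "completed", frozenCompletionText: text };
}

// A descendant settled: wake the parent only when none remain outstanding (assumed rule).
function settleDescendant(
  parent: SubagentRunRecord,
  childId: string,
): { record: SubagentRunRecord; wake: boolean } {
  const remaining = parent.pendingDescendants.filter((id) => id !== childId);
  return { record: { ...parent, pendingDescendants: remaining }, wake: remaining.length === 0 };
}

let parent: SubagentRunRecord = {
  runId: "p1",
  requesterSessionId: "chat-42",
  runtime: "default",
  retryCount: 0,
  pendingDescendants: ["c1", "c2"],
};
let step = settleDescendant(parent, "c1");
console.log(step.wake); // false
step = settleDescendant(step.record, "c2");
console.log(step.wake); // true
parent = completeRun(step.record, "Both audits finished.");
console.log(parent.lifecycleReason, parent.retryCount); // completed 0
```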

Put differently: OpenClaw is not treating delegation as a moment. It is treating it as a lifecycle.

The bigger pattern: subagents are becoming measurable workers

Line the repos up, and a broader convergence starts to show.

Gemini CLI asks whether the parent delegates appropriately, avoids over-delegation, and picks the right specialist from a crowded field.
Codex formalizes different handoff semantics — assign work, queue a message, inspect active agents — instead of bundling everything into one vague inter-agent channel.
OpenClaw builds the runtime furniture needed once delegation produces real operational mess: queued returns, fallback delivery, steering, retries, descendant coordination, and persistent run records.

Three different stacks, same realization: delegated work has to be measured and governed, not just enabled.

The industry is moving from “subagents exist” to “delegation has quality metrics, control surfaces, and failure modes.”

That shift also helps explain why the docs and community commentary have changed tone so quickly. Simon Willison noted in mid-March that subagents are now widely supported across agent products. Once that pattern goes mainstream, the competitive question changes. Availability is table stakes. Reliability is the story.

Can the system avoid handing easy work to a specialist just because it can? Can it distinguish between queuing context and triggering active work? Can it surface approvals from the right child thread? Can it retry delivery without becoming noisy? Can a human step in and steer a worker before the chain goes sideways?

Those are not UX details anymore. They are the guts of whether multi-agent work feels useful or flaky.

Why this matters now

There is a simple trap in agent discourse right now: we talk about subagents as if adding more bodies to the room automatically makes the system smarter.

But more workers mostly create more coordination debt.

That is why this new code matters. The interesting teams are finally paying the coordination bill in public. They are writing evals for delegation judgment. They are distinguishing task triggers from queued notes. They are exposing list/kill/steer controls. They are tracking retries, delivery paths, and stalled descendants.

In other words, they are turning subagents from a parlor trick into something closer to runtime operations.

Open question: as subagents become standard across coding tools, will the real product moat come from having more specialists — or from having the clearest, most trustworthy operating model for when delegation happens, how it is supervised, and how failures are recovered?

Call to action: if you are evaluating agent platforms, do not stop at “supports subagents.” Inspect the handoff. Look for delegation evals, explicit delivery modes, operator controls, retry logic, and lifecycle records. In the next phase of agent tooling, the winners may be the systems that manage delegated work like operations — not the ones that merely market it like magic.

Source anchors

google-gemini/gemini-cli — evals/subagents.eval.ts; gsio commit anchors d9d2ce36f2a7, 57a66f5f0db1; subagent docs at geminicli.com/docs/core/subagents
openai/codex — codex-rs/core/src/tools/handlers/multi_agents_v2.rs, assign_task.rs, send_message.rs, codex-rs/core/src/tools/spec.rs; gsio commit anchor 773fbf56a43a; Codex subagent docs
openclaw/openclaw — src/agents/tools/subagents-tool.ts, src/agents/subagent-announce-dispatch.ts, src/agents/subagent-announce-queue.ts, src/agents/subagent-registry.types.ts; gsio commit anchors b75be0914491, 96c77025263d
