There’s a point where an agent becomes too capable for its own interface. Once it can queue work, delegate to subagents, call tools in batches, and keep running in the background, the transcript stops feeling like a conversation and starts feeling like a packet capture.
That’s the problem both Gemini CLI and Codex seem to be attacking right now. Not with one giant feature launch, but with a pattern: build a layer between execution and the human, so the work reads less like machine exhaust and more like a story.
Gemini is making narration a product surface
The clearest signal comes from Gemini CLI. In the last couple of days, the project merged tool-based topic grouping—explicitly called chapters—alongside a topic narration UI, a generalized background task UI, and tab-to-queue support while generation is still in progress.
- PR #23150: tool-based topic grouping / chapters
- PR #24079: topic narration UI
- PR #22740: agnostic background task UI with completion behavior
- PR #24052: tab-to-queue while generating
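Tab-to-queue is easiest to picture as a small buffer between the keyboard and the model. The sketch below is a hypothetical model, not Gemini CLI's implementation. The InputQueue class and its submit and on_turn_complete methods are invented names, and the sketch assumes queued messages are sent in arrival order once the current turn finishes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InputQueue:
    """Hypothetical model of tab-to-queue: input typed mid-generation waits its turn."""

    generating: bool = False
    pending: deque = field(default_factory=deque)
    sent: list = field(default_factory=list)

    def submit(self, text: str) -> None:
        # While the model is still generating, park the message instead of interrupting.
        if self.generating:
            self.pending.append(text)
        else:
            self._send(text)

    def _send(self, text: str) -> None:
        self.sent.append(text)
        self.generating = True  # sending a prompt starts a new generation

    def on_turn_complete(self) -> None:
        # When a turn finishes, the oldest queued message (if any) becomes the next prompt.
        self.generating = False
        if self.pending:
            self._send(self.pending.popleft())
```

The user never has to wait for a turn to end before typing the next instruction, and nothing typed mid-turn hijacks the work already in flight.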
The real tell is in the code. Gemini now has an update_topic tool. That’s not just metadata. It’s the runtime giving the model an explicit mechanism to say, “Here’s what I’m doing now, and here’s why this phase matters.”
The model is expected to call update_topic in the first turn, the last turn, and whenever the topic changes. That means narration is no longer accidental. It’s policy.

Even better, the scheduler sorts update_topic to the front of a batch, and the CLI renders TopicMessage separately from normal tool chatter. In other words: Gemini isn’t only tracking execution. It is styling intent.
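To make the ordering concrete, here is a minimal sketch in Python, assuming a batch is a plain list of tool calls. ToolCall, order_batch, render, and the title and summary arguments are hypothetical stand-ins rather than Gemini CLI's actual types. The point is that a stable sort can move update_topic to the front without reshuffling the rest, and the renderer can then give narration its own visual lane.

```python
from __future__ import annotations

from dataclasses import dataclass

TOPIC_TOOL = "update_topic"


@dataclass
class ToolCall:
    # Hypothetical shape of one call in a batch.
    name: str
    args: dict


def order_batch(calls: list[ToolCall]) -> list[ToolCall]:
    # sorted() is stable, so non-topic calls keep their original relative order;
    # the key is False (sorts first) for update_topic and True for everything else.
    return sorted(calls, key=lambda c: c.name != TOPIC_TOOL)


def render(calls: list[ToolCall]) -> list[str]:
    lines = []
    for call in order_batch(calls):
        if call.name == TOPIC_TOOL:
            # Narration gets its own prominent line (a stand-in for a TopicMessage).
            lines.append(f"[{call.args.get('title', '')}] {call.args.get('summary', '')}")
        else:
            # Ordinary tool chatter is rendered compactly underneath.
            lines.append(f"  - ran {call.name}")
    return lines
```

Sorting at the scheduler rather than in the renderer means every consumer of the batch sees the intent before the actions it explains.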
Codex is cleaning up the language of action
Codex is pushing on the same problem from the opposite side. Instead of foregrounding “chapters,” it’s extracting the passive descriptions of tools and collaboration flows out of core runtime code and into a reusable codex-rs/tools layer.
Across a rapid sequence of PRs—#16047, #16129, #16132, #16138, and #16141—Codex moved ToolSpec, ConfiguredToolSpec, serialization helpers, local-host tool specs, and collaboration tool specs into a separate crate.
- ToolSpec and ConfiguredToolSpec now live in codex-rs/tools/src/tool_spec.rs
- create_tools_json_for_responses_api() now lives alongside those specs
- Collaboration tools such as spawn_agent, assign_task, list_agents, and request_user_input are defined as typed specs in agent_tool.rs and related files
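Here is a rough picture of what "typed specs plus one serializer" buys, sketched in Python rather than the Rust that codex-rs actually uses. ToolSpec here is a simplified stand-in, tools_json plays the role that create_tools_json_for_responses_api() plays in Codex, and both the JSON shape (loosely modeled on a function tool in the OpenAI Responses API) and the spawn_agent schema are assumptions, not copied from the crate.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    # Hypothetical Python stand-in for a typed tool description; not Codex's Rust type.
    name: str
    description: str
    parameters: dict = field(default_factory=lambda: {"type": "object", "properties": {}})


def tools_json(specs: list[ToolSpec]) -> str:
    # One serializer for every tool, so the runtime and the UI read the same definitions.
    payload = [
        {
            "type": "function",
            "name": s.name,
            "description": s.description,
            "parameters": s.parameters,
        }
        for s in specs
    ]
    return json.dumps(payload, indent=2)


# Illustrative collaboration specs; descriptions and schemas are invented.
SPAWN_AGENT = ToolSpec(
    name="spawn_agent",
    description="Start a subagent to work on a delegated task.",
    parameters={
        "type": "object",
        "properties": {"task": {"type": "string"}},
        "required": ["task"],
    },
)
LIST_AGENTS = ToolSpec(name="list_agents", description="List running subagents.")
```

Because an interface can read the same list, it can label spawn_agent as delegation rather than as an anonymous handler.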
That sounds architectural—and it is—but the UX consequence matters. Once tool surfaces are cleanly described, runtimes and interfaces can present them more coherently. A multi-agent system stops being a bundle of handlers and starts becoming a legible protocol.
Why this matters more than another capability race
Most commentary on agent CLIs still treats the contest like a feature checklist: who has subagents, who has MCP, who has plugins, who can run shell commands, who can stay in the loop longest.
That framing is getting stale. The harder problem now is not whether the agent can act. It’s whether a human can remain oriented while it acts.
Gemini’s answer is narrative control: give the model a first-class way to publish chapters, strategic intent, and background task state. Codex’s answer is interface grammar: make the tool and collaboration surface structured enough that the UI can tell a cleaner story about delegation, waiting, interrupting, and resuming.
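Codex's "interface grammar" can be read as a constraint on which state changes are legal. The following sketch assumes a five-state lifecycle (queued, running, waiting, interrupted, done). The states, the transition table, and the Task class are hypothetical, chosen only to show how a typed grammar lets a UI render delegation, waiting, interruption, and resumption unambiguously.

```python
from __future__ import annotations

from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    WAITING = "waiting"  # blocked on a user reply or another agent
    INTERRUPTED = "interrupted"
    DONE = "done"


# Hypothetical transition table: which moves a UI would treat as legal.
TRANSITIONS: dict[TaskState, set[TaskState]] = {
    TaskState.QUEUED: {TaskState.RUNNING, TaskState.INTERRUPTED},
    TaskState.RUNNING: {TaskState.WAITING, TaskState.INTERRUPTED, TaskState.DONE},
    TaskState.WAITING: {TaskState.RUNNING, TaskState.INTERRUPTED},
    TaskState.INTERRUPTED: {TaskState.RUNNING},  # resuming
    TaskState.DONE: set(),
}


class Task:
    def __init__(self, task_id: str, owner: str) -> None:
        self.task_id = task_id
        self.owner = owner  # which agent owns this task
        self.state = TaskState.QUEUED
        self.history: list[TaskState] = [self.state]

    def move(self, new_state: TaskState) -> None:
        # Reject transitions the grammar does not allow, e.g. DONE -> RUNNING.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append(new_state)
```

Once illegal moves are unrepresentable, the interface can show "waiting" or "interrupted" with confidence instead of guessing from log lines.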
The deeper pattern: execution is being split from explanation
This is the part I find most interesting. Both projects are quietly acknowledging that raw execution traces are not a product. They’re substrate.
The product layer sits above that substrate and answers the human questions that actually matter:
- What phase is the agent in?
- Why did it switch direction?
- Which work is backgrounded versus blocking?
- Which agent owns which task?
- What changed between “thinking,” “waiting,” and “done”?
Gemini is solving that with explicit narrative instrumentation. Codex is solving it by making the semantics of action portable and typed. Different moves, same destination: an agent experience that reads like a guided workflow, not an unfiltered event log.
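The split between substrate and product layer can be sketched as a fold over an event log. Everything here is hypothetical: the Event fields, the event kinds, and the summarize function are illustrative names, and real runtimes emit far richer traces. The sketch only shows the shape of the idea, where raw tool calls are dropped and what remains answers the questions listed above: the current phase, who owns each task, and which work is blocking.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical raw trace record: the "substrate" an agent runtime emits.
    kind: str  # "topic", "task_started", "task_finished", or "tool_call"
    task: str = ""
    owner: str = ""
    detail: str = ""
    background: bool = False


def summarize(events: list[Event]) -> dict:
    """Fold a raw event log into answers a human actually asks."""
    phase = None
    tasks: dict[str, dict] = {}
    for e in events:
        if e.kind == "topic":
            phase = e.detail  # the latest narration wins
        elif e.kind == "task_started":
            tasks[e.task] = {"owner": e.owner, "background": e.background, "status": "running"}
        elif e.kind == "task_finished" and e.task in tasks:
            tasks[e.task]["status"] = "done"
        # tool_call events are execution detail; they never reach the summary
    blocking = [name for name, t in tasks.items() if not t["background"] and t["status"] != "done"]
    return {"phase": phase, "tasks": tasks, "blocking": blocking}
```

Keeping the summary a pure function of the log also makes it replayable: the same events always produce the same view.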
What to watch next
If this pattern continues, the next competitive layer in agent products won’t just be better models or broader tool access. It’ll be the story of work: chaptering, delegation summaries, interruption semantics, replayable background tasks, and UI conventions that make long-running agency feel trustworthy instead of opaque.
That’s a much bigger shift than it sounds. Once an agent can narrate its work well, it stops feeling like a black box with a terminal attached. It starts feeling like a collaborator with stage presence.
Open question: if agent tools keep getting stronger, will the biggest product advantage come from raw capability—or from who tells the clearest story about what the agent is doing and why?
Call to action: if you’re building agent UX, watch these narration layers closely—they may end up mattering more than the next headline benchmark.