Agent demos still love the same flex.
More tools. More subagents. More context. More speed.
But that is not where the sharpest engineering is showing up right now.
The real fight is turning into flow control: when a user interrupts, queues, resumes, confirms, compresses, or changes course mid-run, does the system stay graceful, or does it turn into soup?
That shift is showing up clearly in both Codex and Gemini CLI. And it matters because the moment agents stop being toy one-shot assistants and start acting like persistent workers, the terminal becomes less like a prompt box and more like an air-traffic tower.
Codex is exposing the pressure points
Codex now says the quiet part out loud in its own TUI docs.
In docs/tui-chat-composer.md, when steer mode is enabled, Tab does not always just send the message. If a task is already running, it requests queuing instead. That is not a tiny UX flourish. It is an admission that human intervention during active work needs an explicit lane.
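To make that "explicit lane" concrete, here is a minimal sketch of a composer that sends when idle and queues when busy. It is not the Codex implementation: the `Composer` class, its fields, and the `submit` and `on_task_complete` methods are all hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Composer:
    """Hypothetical composer model, not the actual Codex TUI code."""

    task_running: bool = False
    queued: deque = field(default_factory=deque)
    sent: list = field(default_factory=list)

    def submit(self, text: str) -> str:
        # While a task is running, park the message instead of sending it.
        if self.task_running:
            self.queued.append(text)
            return "queued"
        self.sent.append(text)
        self.task_running = True
        return "sent"

    def on_task_complete(self) -> None:
        # The lane is clear: release the oldest queued message, if any.
        self.task_running = False
        if self.queued:
            self.submit(self.queued.popleft())
```

The design choice worth noticing is that the same keypress has two outcomes, and the return value tells the UI which one happened so it can show the user where the message went.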
The same pattern is showing up deeper in the runtime.
PR #16062, “stabilize zsh-fork approvals and resume --last,” is really a story about operational timing. The patch keeps approval behavior stable across a macOS shell handoff, expands the timeout budget for startup approvals, and narrows resume --last so the CLI does not accidentally pick a newer internal sub-agent thread over the user-facing one.
That is a flow-control fix. The question is no longer just whether resume exists. It is whether resume still points to the right conversation when multiple hidden execution paths are in play.
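The resume problem can be sketched in a few lines. This is an illustration under assumptions, not the logic from the PR: the `Thread` record, its `is_subagent` flag, and `pick_resume_target` are hypothetical names, and the real CLI may distinguish thread kinds differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thread:
    thread_id: str
    updated_at: float   # seconds since epoch
    is_subagent: bool   # hypothetical flag marking internal helper threads


def pick_resume_target(threads: list) -> Optional[Thread]:
    """Return the most recently updated user-facing thread, if any."""
    # Filter first, then take the newest. Taking the newest first is the bug:
    # an internal helper thread is often more recent than the user's thread.
    candidates = [t for t in threads if not t.is_subagent]
    return max(candidates, key=lambda t: t.updated_at, default=None)
```

A naive "newest thread wins" rule would pick the helper thread whenever a sub-agent ran last, which is the kind of surprise the narrowed behavior is meant to avoid.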
Then there is compaction. Codex’s compact.rs makes it obvious that context pressure is now part of live runtime behavior, not just a background implementation detail. Mid-turn compaction has to reinsert initial context before the last real user message, trim older thread items when the prompt gets too large, and warn the user that long threads plus repeated compactions can make the model less accurate.
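Two of those steps, reinserting initial context and trimming older items, can be sketched as plain list operations. This is a simplified model rather than a port of compact.rs: the `Item` type, the role names, and the character-count size estimate are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    role: str   # assumed roles: "user", "assistant", "context", "summary"
    text: str


def reinsert_initial_context(items: list, initial: list) -> list:
    """Place the initial context directly before the last user message."""
    user_positions = [i for i, it in enumerate(items) if it.role == "user"]
    if not user_positions:
        return items + initial
    last_user = user_positions[-1]
    return items[:last_user] + initial + items[last_user:]


def trim_oldest(items: list, budget: int) -> list:
    """Drop the oldest items until a crude size estimate fits the budget."""
    kept = list(items)
    # Character count stands in for a real token count in this sketch.
    while len(kept) > 1 and sum(len(it.text) for it in kept) > budget:
        kept.pop(0)
    return kept
```

Trimming from the front is what makes the accuracy warning necessary: every pass discards the oldest material, so a long thread that compacts repeatedly keeps less and less of its own history.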
That warning matters. It means the system is no longer pretending context is infinite. It is teaching the user that throughput and fidelity are now linked.
Fresh issue traffic pushes the same story even further. Issue #16068 describes a case where custom model_context_window values can poison token accounting after overflow and stop auto-compaction from recovering properly. Issue #16060 asks for a SIGUSR1-powered inbox so an external orchestrator can inject instructions at the next safe point instead of killing the process.
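The inbox idea from that request can be sketched as follows. This shows the general pattern only, not a proposed Codex design: the `SignalInbox` class, the file-based inbox, and the method names are hypothetical, and the signal part works only on Unix-like systems.

```python
import signal
from pathlib import Path


class SignalInbox:
    """Hypothetical inbox: the signal raises a flag, messages are read later."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.pending = False

    def notify(self, signum=None, frame=None) -> None:
        # Signal handlers should do as little as possible: just set a flag.
        self.pending = True

    def install(self) -> None:
        # Unix only, and must be called from the main thread.
        signal.signal(signal.SIGUSR1, self.notify)

    def drain_at_safe_point(self) -> list:
        # Called by the agent loop between steps, never in the middle of one.
        if not self.pending:
            return []
        self.pending = False
        if not self.path.exists():
            return []
        lines = self.path.read_text().splitlines()
        # Mark instructions as consumed. A real implementation would need an
        # atomic handoff so a concurrent writer cannot lose a message here.
        self.path.write_text("")
        return [line for line in lines if line.strip()]
```

An orchestrator would append a line to the inbox file and then send SIGUSR1 to the process. The agent keeps working and only picks up the instruction when its loop next calls `drain_at_safe_point`.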
Again, the center of gravity is clear: not “what new trick can the agent do?” but “how does a busy agent accept, delay, or recover from new pressure?”
Gemini CLI is already acting like a traffic controller
Gemini CLI is approaching the same problem from a more visibly event-driven architecture.
In packages/a2a-server/src/agent/executor.ts, if a task is already executing, the executor can process a new user message in a secondary execution loop, then yield back to the primary one. That is a very specific runtime opinion: new instructions do not always have to explode the current turn. They can be admitted, processed, and folded into an active system.
The pending-work model goes deeper in task.ts. Tool calls are pre-registered before async scheduling, pending tool counts are tracked explicitly, the executor waits for pending tools to complete before continuing, and the task can remain in a working state even when the latest user message contains only confirmations.
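The pre-registration idea translates to other runtimes. Below is a minimal asyncio sketch of the pattern, written in Python rather than TypeScript. It is not the logic from task.ts: `PendingTools`, `run_tool`, and `demo` are hypothetical names for illustration.

```python
import asyncio


class PendingTools:
    """Hypothetical tracker for in-flight tool calls."""

    def __init__(self) -> None:
        self._pending = set()
        self._idle = asyncio.Event()
        self._idle.set()

    def register(self, call_id: str) -> None:
        # Register before scheduling, so a waiter never sees a false "idle".
        self._pending.add(call_id)
        self._idle.clear()

    def resolve(self, call_id: str) -> None:
        self._pending.discard(call_id)
        if not self._pending:
            self._idle.set()

    @property
    def count(self) -> int:
        return len(self._pending)

    async def wait_idle(self) -> None:
        await self._idle.wait()


async def run_tool(tracker: PendingTools, call_id: str, results: list) -> None:
    try:
        await asyncio.sleep(0)  # stand-in for real tool work
        results.append(call_id)
    finally:
        tracker.resolve(call_id)  # resolve even if the tool raises


async def demo() -> list:
    tracker = PendingTools()
    results = []
    tasks = []
    for call_id in ("read_file", "run_tests"):
        tracker.register(call_id)  # synchronous, before any await
        tasks.append(asyncio.create_task(run_tool(tracker, call_id, results)))
    await tracker.wait_idle()  # continue only once every tool has resolved
    return results
```

If registration happened inside `run_tool` instead, `wait_idle` could return before the first task had even started, which is the race that registering up front closes.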
Translated out of TypeScript: Gemini is not treating busyness as a vague vibe. It is modeling it.
Even the UI admits the pressure. In the keyboard shortcuts docs, plan mode is skipped when the agent is busy. The system is effectively saying: some operator actions are valid only when the lane is clear.
Compression is another giveaway. Gemini exposes /compress as a first-class command, makes the trigger configurable with model.compressionThreshold, and even provides hooks.PreCompress so operators can back up or inspect the conversation before history gets summarized.
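The threshold-plus-hook shape is easy to sketch in the abstract. The code below is not Gemini CLI's API: `maybe_compress` and its parameters are hypothetical, and it assumes the threshold is a fraction of the context window, which may not match how model.compressionThreshold is actually interpreted.

```python
from typing import Callable


def maybe_compress(
    history: list,
    tokens_used: int,
    context_window: int,
    threshold: float,
    summarize: Callable,
    pre_compress_hooks: list,
) -> list:
    """Summarize history once usage crosses the threshold, after running hooks."""
    if tokens_used < threshold * context_window:
        return history
    for hook in pre_compress_hooks:
        # Hooks get a copy, so they can back up or inspect without mutating.
        hook(list(history))
    # The summary replaces the full history in this simplified model.
    return [summarize(history)]
```

For example, passing a hook that writes the copy to disk gives the operator a full transcript from the moment just before summarization, whatever the summarizer later drops.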
That is not “chat” design anymore. That is lifecycle design.
And the freshest user request lands exactly on this seam. Issue #24071 asks for the ability to queue a message while compression is still running, because right now the user has to wait at the desk until the summary finishes. The complaint in issue #24064 is broader and angrier, but it rhymes: if the runtime feels sluggish or unresponsive under load, trust collapses fast.
The big pattern: terminal agents are becoming backpressure systems
Put the two repos side by side and the pattern snaps into focus.
- Codex is refining queueing, approval timing, resume semantics, and compaction recovery.
- Gemini CLI is formalizing secondary execution, pending-work accounting, busy-state UI rules, and operator-visible compression control.
Different implementation style. Same direction.
Both projects are discovering that persistent agents do not fail only because they are too weak. They fail because the control surface gets messy when too many things want to happen at once.
Human asks a follow-up while tools are still running.
Compression starts right when the next instruction arrives.
A resume target collides with internal helper threads.
An approval prompt steals budget from the actual task.
Those are not edge cases anymore. Those are product-defining moments.
Why this matters
For the next generation of coding agents, raw intelligence may stop being the clearest differentiator.
The sharper moat may be whether the tool feels composed under pressure.
Can it queue instead of thrash? Can it compress without trapping the user? Can it resume the right thing? Can it accept new intent at safe points instead of forcing a restart?
That is the kind of boring-sounding engineering that decides whether an agent feels trustworthy enough to leave running.
So here’s the open question: if terminal agents get dramatically better at handling queueing, compaction, and mid-task steering, does that unlock longer-lived workflows, or does it just make people tolerate more complexity?
If you build agent tools, don’t just chase smarter models or bigger context windows. Design the pressure valves. Make busy states legible, queueing intentional, resumes predictable, and compression humane. That is how an agent stops feeling like a demo and starts feeling like infrastructure.