The Real Agent Feature Is Not Losing the Plot

Every agent demo looks smart in the first five minutes.

The real test starts around minute forty.

That is when the context window swells, tool logs pile up, a side quest interrupts the main task, and the human comes back asking the one question every serious agent needs to survive: where were we?

The new product battle in terminal agents is not just what they can do. It is whether they can keep the thread when work gets long, messy, and resumable.

Gemini is making context management visible

Google’s gemini-cli has stopped treating continuity as an invisible implementation detail.

The docs now expose a literal model.compressionThreshold, with a default of 0.5. That sounds small, but it is a big product tell. Once compression gets a user-facing knob, context handling is no longer hidden plumbing. It becomes part of the workflow contract.

The surrounding changes all push in the same direction. Gemini’s latest changelog says Plan Mode gained built-in research subagents, annotation support, and a copy subcommand. The preview changelog goes further: Plan Mode is enabled by default, approved plans survive chat compression, tracker CRUD and visualization landed, and even /compact was added as a friendlier alias for /compress.

That is a subtle but important shift. Compression is not framed as an emergency brake anymore. It is becoming ordinary operating behavior — something the product expects, shapes, and tries to make safe for planning state.

Gemini’s docs make the pattern even clearer. Rewind is documented as working across chat compression points by reconstructing history from stored session data. Session cleanup is also explicit about what counts as state: not just chat, but implementation plans, task trackers, tool outputs, and activity logs. In other words, Gemini is quietly defining agent work as a bundle of artifacts that must stay coherent together.

Codex is hardening the session shell around the agent

OpenAI’s codex is reaching for the same user outcome from a different angle.

The big March 26 release is full of runtime durability clues. The changelog says prompt history recall now works in the app-server TUI, including across sessions. It also calls out fixes so the app-server TUI preserves transcript text instead of dropping it under backpressure. That is not glamorous, but it is exactly the kind of thing users notice when an agent starts feeling reliable instead of theatrical.

The deeper runtime layers back that up. Codex’s app-server README describes bounded queues between ingress, processing, and outbound writes. When the system saturates, it returns a retryable JSON-RPC overload error instead of pretending everything is fine. In the unified exec watcher, the runtime explicitly prefers the accumulated transcript as the source of aggregated command output, falling back only when needed.

Put differently: Codex is building continuity into the transport and session shell. Gemini makes compression and planning state explicit. Codex makes transcript survival, history recall, and runtime retry behavior harder to lose by accident.

This is what serious agent UX looks like

There is a temptation to call all of this “quality of life.” I think that undersells it.

Long-running agent work dies in boring ways:

the plan gets compressed out of relevance,
the transcript drops key context,
the session resumes without enough state,
the human has to restitch the whole narrative by hand.

That is why these changes matter. They are not just convenience features. They are attempts to reduce a specific tax: the cognitive overhead of recovering a half-finished job.

And that may be the most important product battle in agent tooling right now. Benchmark wins are flashy. But in real work, the winning tool is often the one that makes interruption cheap.

The deeper contrast

What I like about this moment is that the two repos reveal different philosophies.

Gemini is more explicit and policy-shaped. It exposes compression thresholds, documents rewind across compression points, keeps approved plans during compression, and defines retention around a cluster of operational artifacts.

Codex is more runtime-shaped. It strengthens the app-server path, keeps prompt history reachable across sessions, adds overload semantics instead of silent collapse, and treats transcript capture as a first-class output path.

Different routes, same market truth: the agent that wins will not just be the one that can act. It will be the one that can resume.

What to watch next

If this trend keeps going, agent products will start looking less like single-shot assistants and more like continuity systems.

Not “Can it use tools?”

More like:

Can it compress without losing the plan?
Can it rewind without scrambling the story?
Can it resume without making the human repeat themselves?
Can it fail loudly enough that state stays trustworthy?

That is where trust gets built. Not in the first answer, but in the fifth handoff.

So here’s the open question: as agents take on longer, stop-start work, will continuity features matter more to users than raw model cleverness?

If you build agent tools, watch the places where sessions bend but should not break — then make those moments easier to recover, inspect, and continue.

The Real Agent Feature Is Not Losing the Plot

Gemini is making context management visible

Codex is hardening the session shell around the agent

This is what serious agent UX looks like

The deeper contrast

What to watch next

Receipts below the story

Source Trail

Evidence Limits

Send a note to the desk

The Real Agent Feature Is Not Losing the Plot

Gemini is making context management visible

Codex is hardening the session shell around the agent

This is what serious agent UX looks like

The deeper contrast

What to watch next

Receipts below the story

Source Trail

Evidence Limits

Atlas Context

Memory Is a Typed Storage Contract

Send a note to the desk

Same Edition

AI Agent Summaries Are Becoming Infrastructure