AI Coding Agents Have a New Bottleneck: Keeping Work Intact

For a while, the easy way to think about AI coding agents was simple: bigger models, longer context, more tools, more autonomy.

That story is getting less convincing by the week.

The messier reality of agent work is not just about raw intelligence. It is about continuity. Can a long task survive context compression? Can a running session be resumed without losing its identity? Can an interrupted subagent pick up the thread without repeating dangerous work? Can project memory stay scoped to the right boundary instead of bleeding across directories?

Across several agent repos, those questions are no longer being handled as glue code. They are starting to show up as explicit runtime design work.

In the examples that follow, a new layer is coming into view: continuity infrastructure.

Gemini CLI starts treating history like an active subsystem

Google’s Gemini CLI makes the shift unusually explicit. In packages/core/src/services/agentHistoryProvider.ts, the new AgentHistoryProvider does not merely store transcripts. It actively evaluates the chat history, decides when it should be truncated, summarizes the discarded portion, and merges that continuity summary back into the live conversation.

The important detail is what the summary is for. The prompt inside the same file asks for an agent-continuity focused intent summary that preserves the original mandate, the agent’s strategy, and important shifts in approach. That is not generic summarization for convenience. It is continuity compression designed to keep a running task coherent after earlier turns have been trimmed away.
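To make the evaluate, truncate, summarize, merge loop concrete, here is a minimal sketch in Python. It is not Gemini CLI's implementation: the names Message, compress_history, and toy_summarizer, the message-count threshold, and the "[continuity summary]" prefix are all assumptions made for illustration, and the summarizer is a stand-in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str  # "user", "assistant", or "system"
    text: str


def compress_history(
    history: List[Message],
    max_messages: int,
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Hypothetical helper: if the history is too long, replace the oldest
    messages with one continuity summary and keep the most recent tail."""
    if not 0 <= keep_recent < max_messages:
        raise ValueError("keep_recent must be in [0, max_messages)")
    if len(history) <= max_messages:
        return list(history)  # nothing to trim

    cut = len(history) - keep_recent
    discarded, kept = history[:cut], history[cut:]
    summary = summarize(discarded)
    # Merge the summary back in as the first message of the live history.
    return [Message("system", "[continuity summary] " + summary)] + kept


def toy_summarizer(discarded: List[Message]) -> str:
    """Stand-in for a model call: keeps the first user request as the
    'original mandate' and records how much was trimmed."""
    mandate = next((m.text for m in discarded if m.role == "user"), "(none)")
    return f"Original mandate: {mandate} ({len(discarded)} earlier messages trimmed)"
```

The design point the sketch tries to capture is that the summary travels inside the conversation itself, so later turns still see the mandate even though the turns that stated it are gone.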

Then Gemini wires that behavior into a dedicated model configuration. In packages/core/src/config/defaultModelConfigs.ts, there is a specific agent-history-provider-summarizer config rather than some ad hoc reuse of a general model alias. That suggests the team is treating history compression as a distinct runtime concern.

There is a second continuity clue nearby. In packages/core/src/utils/memoryDiscovery.ts, Gemini adds configurable boundaryMarkers for finding project roots and scoping memory discovery. In plain English: the system is getting more explicit about where project-scoped memory lookup should stop.
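A rough sketch of what boundary-marker scoping can look like, again in Python and under stated assumptions: find_project_root and memory_search_dirs are hypothetical names, and the rule "stop at the nearest ancestor that contains a marker" is an illustrative guess at the general idea, not a description of how memoryDiscovery.ts behaves.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def find_project_root(start: Path, boundary_markers: Iterable[str]) -> Optional[Path]:
    """Walk upward from start and return the nearest directory that
    contains any boundary marker (for example ".git"), or None."""
    markers = list(boundary_markers)
    current = start.resolve()
    for directory in [current, *current.parents]:
        if any((directory / marker).exists() for marker in markers):
            return directory
    return None


def memory_search_dirs(start: Path, boundary_markers: Iterable[str]) -> List[Path]:
    """Directories to scan for memory files: start, then each parent up to
    and including the project root. With no root, only start is scanned."""
    current = start.resolve()
    root = find_project_root(current, boundary_markers)
    if root is None:
        return [current]
    dirs: List[Path] = []
    for directory in [current, *current.parents]:
        dirs.append(directory)
        if directory == root:
            break  # do not look above the project boundary
    return dirs
```

Making the marker list configurable matters because not every project root is a Git checkout; a monorepo package or a workspace file may be the right place to stop.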

Bigger context windows help, but this code points to a different truth: once an agent is expected to work for a while, continuity has to be engineered. It cannot be left to chance.

Codex is turning resume into protocol work

OpenAI’s Codex shows the same trend from another angle. In codex-rs/app-server-protocol/src/protocol/thread_history.rs, Codex rebuilds turns from persisted rollout items so resumed or rebuilt history preserves original turn identifiers. The file also uses a shared reducer for both persisted rollout replay and in-memory current-turn tracking during resume and rejoin flows.

That sounds low-level, but it matters. Once sessions can be interrupted, resumed, or reattached from another client, continuity stops being a UX detail. It becomes a protocol problem. The system needs to know what the thread actually was, what items belonged to which turn, and how to reconstruct that state without introducing ghosts, duplication, or missing steps.
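The shared-reducer idea can be illustrated with a small Python sketch. Codex's real code is Rust and its types are not reproduced here: RolloutItem, Turn, ThreadState, reduce_item, and replay are hypothetical names, and the item fields are assumptions chosen only to show the shape of the technique.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class RolloutItem:
    turn_id: str  # identifier persisted with the item
    kind: str     # e.g. "user_message", "agent_message" (illustrative)
    payload: str


@dataclass
class Turn:
    turn_id: str
    items: List[RolloutItem] = field(default_factory=list)


@dataclass
class ThreadState:
    turns: List[Turn] = field(default_factory=list)
    index: Dict[str, int] = field(default_factory=dict)  # turn_id -> position


def reduce_item(state: ThreadState, item: RolloutItem) -> ThreadState:
    """Single reducer: the same function handles replayed and live items,
    so both paths group items into turns identically."""
    pos = state.index.get(item.turn_id)
    if pos is None:
        # First item of a turn: open it under its original identifier.
        state.index[item.turn_id] = len(state.turns)
        state.turns.append(Turn(item.turn_id, [item]))
    else:
        state.turns[pos].items.append(item)
    return state


def replay(items: Iterable[RolloutItem]) -> ThreadState:
    """Rebuild thread history from persisted items."""
    state = ThreadState()
    for item in items:
        reduce_item(state, item)
    return state
```

Because replay and live tracking go through one function, a thread rebuilt from disk and then extended in memory ends up in the same state as a thread replayed in full, which is the property that rules out ghost or duplicated turns.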

Codex’s follow-up fix in codex-rs/tui_app_server/src/lib.rs makes the product stakes even clearer. A regression test ensures codex resume <name> still finds the correct saved thread even when the rollout title and stored session name do not match. That is the kind of bug you only care deeply about when session identity itself has become part of the product contract.

In other words: Codex is not just helping users reopen old work. It is investing in the machinery required to reconstruct that thread more reliably when work is resumed.

OpenClaw is building restart recovery into the runtime

OpenClaw pushes the pattern into operations. In src/agents/subagent-orphan-recovery.ts, the system looks for subagent sessions orphaned by a gateway reload, then builds a synthetic resume message containing the original task and the last user message before interruption.

That is already more sophisticated than simple “try again” logic. But the sharper detail is the idempotency thinking around it. The recovery path can attach a config-change hint telling the resumed agent not to re-edit openclaw.json or restart the gateway if those changes were already applied before the interruption.

That is continuity engineering in the most practical sense: passing forward enough context to help the resumed agent avoid repeating config edits or restarts that may already have happened.
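As an illustration of that recovery step, here is a minimal Python sketch of building a synthetic resume message with an optional config-change hint. It is an assumption-laden sketch rather than OpenClaw's code: OrphanedSession, its fields, the config_change_applied flag, and the exact wording of the message are all hypothetical. Only the ingredients (original task, last user message, a hint about openclaw.json and gateway restarts) come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrphanedSession:
    session_id: str
    original_task: str
    last_user_message: Optional[str]
    # Hypothetical flag: set when the interruption is believed to have
    # followed a config edit or gateway restart made by this session.
    config_change_applied: bool


def build_resume_message(session: OrphanedSession) -> str:
    """Assemble the message sent to the resumed subagent."""
    lines = [
        "Your previous run was interrupted by a gateway reload. "
        "Resume the task below.",
        f"Original task: {session.original_task}",
    ]
    if session.last_user_message:
        lines.append(
            f"Last user message before interruption: {session.last_user_message}"
        )
    if session.config_change_applied:
        # Idempotency hint: steer the agent away from repeating side effects.
        lines.append(
            "Note: config changes may already have been applied. Do not "
            "re-edit openclaw.json or restart the gateway again; check the "
            "current state first."
        )
    return "\n".join(lines)
```

The hint is advisory, not a guarantee. It narrows the chance of a repeated edit or restart, but the resumed agent still has to verify state before acting.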

The adjacent cleanup in src/agents/cli-runner.ts completes the picture by proactively killing stale resume processes. Again, this is not glamorous model work. It is runtime hygiene for long-running agent jobs that may pause, resume, and collide with leftovers.

The broader pattern: continuity is becoming a real differentiator

Seen together, these repos point to a shared realization. Bigger context still matters, but it is no longer the whole story. The next layer looks more like systems engineering.

Agents now need machinery for:

compressing old work without losing the thread,
scoping memory to the right project boundary,
rebuilding thread history from durable artifacts,
recovering interrupted runs after restarts or reconnects,
and preserving session identity even when UI labels drift.

And this is not staying buried in implementation details. Gemini CLI’s repo now explicitly advertises conversation checkpointing for saving and resuming complex sessions. OpenAI’s Codex launch page leans on long-running tasks, real-time progress visibility, and verifiable evidence through logs and test outputs. Both are signs of the same market pressure: these tools are no longer being sold as one-shot answer engines. They are being sold as work surfaces.

Why this matters for users

Most people will never read thread_history.rs or agentHistoryProvider.ts. But they will feel the difference.

They will notice when an agent can come back after a restart and still know what it was doing. They will notice when a resumed session does not duplicate edits. They will notice when memory stays tied to the right project instead of dragging irrelevant baggage into the next task. They will notice when a long conversation still feels coherent after the earlier parts are compressed.

That is the user-facing value of continuity infrastructure: not magic, but fewer broken threads.

The tension: continuity can become bureaucracy

There is a tradeoff hiding here too. Every new continuity layer adds resilience, but also complexity. Summaries can distort intent. Boundary markers can be misconfigured. Resume logic can reattach the wrong thread. Recovery systems can produce subtle duplicates or stale state. Once continuity becomes infrastructure, it inherits all the failure modes of infrastructure.

The winners will be the teams that make this layer both durable and legible. Not just “we restore sessions,” but how they are restored, what was summarized, what was replayed, what state was preserved, and what the agent is assuming now.

That is where the repos are heading. The practical win is not more wow factor. It is fewer dropped threads. The most important agent UX improvements may come from quieter guarantees that a task can survive interruption without losing its plot.

Open question: as coding agents take on longer and more autonomous tasks, will users care more about raw context size, or about clear guarantees that the system can compress, replay, and recover work without breaking continuity?

Call to action: if you are evaluating agent tools, look past the model headline. Inspect the resume flows, summary layers, memory boundaries, and replay logic. That is where the next real moat may be forming.

Source anchors

google-gemini/gemini-cli — packages/core/src/services/agentHistoryProvider.ts, packages/core/src/config/defaultModelConfigs.ts, packages/core/src/utils/memoryDiscovery.ts, commits 320c8aba4ce1 and 4034c030e711
openai/codex — codex-rs/app-server-protocol/src/protocol/thread_history.rs, codex-rs/tui_app_server/src/lib.rs, commits b06f91c4fe52 and 8e24d5aaea1c
openclaw/openclaw — src/agents/subagent-orphan-recovery.ts, src/agents/cli-runner.ts, commits c780b6a6ab2a and 8edf2146ae59
