AI Coding Agents Are Turning Compaction Into a Policy Engine

Context windows keep getting marketed like warehouse space.

One million tokens. Bigger buffers. Longer sessions. Fewer resets.

But the code inside agent runtimes is telling a more interesting story: even in the age of huge windows, teams do not trust raw accumulation. They are building systems to trim, summarize, preserve, verify, and re-inject context with surprising care.

That is why compaction now looks less like janitorial cleanup and more like a policy engine: a subsystem with explicit rules for what gets kept, what gets collapsed, what gets reattached, and what gets checked before the session moves on.

Across Codex, Gemini CLI, and OpenClaw, the important question is no longer just whether old context gets compressed. It is how the runtime decides what must stay verbatim and what can be reduced to a summary, how it budgets tool output, which guardrails survive, and how it checks that it did not quietly throw away the thing the user actually needed.

Codex treats compaction like live traffic management

OpenAI’s Codex makes the shift visible in codex-rs/core/src/compact.rs. The compaction path is not a simple “summarize previous messages” helper. It is woven into active turn execution.

The file defines a dedicated compaction prompt, a COMPACT_USER_MESSAGE_MAX_TOKENS cap of 20,000 tokens for preserved user messages, and a branch for mid-turn compaction that re-injects the session’s initial context before the last real user message. That placement detail matters. Codex is not merely shrinking the transcript. It is preserving a model-facing boundary so the summarized history still lands in the right conversational shape.
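To make the placement detail concrete, here is a minimal Python sketch of rebuilding history after compaction. It is not Codex's Rust code. The names (`Msg`, `rebuild_history`, `approx_tokens`), the word-count "tokenizer", the rule of stopping at the first user message that does not fit, and appending the summary last are all assumptions made for illustration. Only the 20,000-token default and the "initial context before the last real user message" rule come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    role: str  # e.g. "user", "assistant", "developer", "summary" (hypothetical roles)
    text: str

def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())

def rebuild_history(history, summary, initial_context,
                    max_user_tokens=20_000, mid_turn=False):
    """Hypothetical sketch: keep the newest user messages under a token cap,
    then append the summary. Mid-turn, re-insert the initial context just
    before the last real user message."""
    kept, budget = [], max_user_tokens
    for msg in reversed(history):  # walk newest-first
        if msg.role != "user":
            continue
        cost = approx_tokens(msg.text)
        if cost > budget:
            break  # assumption: stop at the first message that no longer fits
        kept.append(msg)
        budget -= cost
    kept.reverse()  # restore chronological order
    if mid_turn:
        # Slice assignment inserts before the last element (or at 0 if empty).
        kept[-1:-1] = list(initial_context)
    return kept + [Msg("summary", summary)]
```

The point of the slice insert is ordering, not size: the summarized history still ends with the user's latest request, with the session's setup context sitting just ahead of it.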

There is another tell in the retry loop. If the compaction prompt itself overruns the context window, Codex removes the oldest history item, counts how many items were trimmed, and tells the user that older thread items were removed so compaction could fit. In other words: compaction has become a managed overflow path, not a magical black box.
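The overflow path reduces to a small loop: try to summarize, and on a context-window error drop the oldest item, count it, and retry. The sketch below illustrates that shape. It assumes a hypothetical `ContextWindowExceeded` exception and a caller-supplied `summarize` function, and the notice wording is invented, not Codex's actual message.

```python
class ContextWindowExceeded(Exception):
    """Hypothetical error a model client raises when a prompt overflows the window."""

def compact_with_trimming(history, summarize):
    """Retry summarization, dropping the oldest item each time the prompt overflows.
    Returns (summary, notice); notice is None when nothing was trimmed."""
    items = list(history)
    trimmed = 0
    while True:
        try:
            summary = summarize(items)
            break
        except ContextWindowExceeded:
            if not items:
                raise  # nothing left to trim; surface the error
            items.pop(0)  # the oldest history item goes first
            trimmed += 1
    notice = None
    if trimmed:
        # User-facing notice so the loss is visible rather than silent.
        notice = f"Trimmed {trimmed} older thread item(s) so compaction could fit."
    return summary, notice
```

Returning the trim count alongside the summary is what lets the runtime tell the user something was removed instead of hiding it.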

The runtime is not acting like context is infinite. It is acting like context is a budget under pressure, and compaction is the emergency valve that still needs rules.

The warning Codex emits at the end of compaction sharpens the product story even more: long threads and multiple compactions can make the model less accurate, so users should start a new thread when possible. That is unusually candid. It turns compaction from an invisible maintenance trick into a user-facing accuracy tradeoff.

Fresh issue traffic shows the same pressure moving upward into product semantics. Issue #16140 asks for model-aware context window and auto-compaction settings so large-window values do not silently become unsafe after switching to a smaller model. Issue #15889 describes compaction leaving Codex responding to an earlier request after the user has moved on. Issue #13279 is blunt enough to call it a compaction death spiral.

That is the key shift: once users can feel compaction failures directly, compaction stops being plumbing. It becomes behavior.

Gemini CLI turns compression into an explicit budgeting system

Google’s Gemini CLI approaches the same problem with even more visible policy knobs. In packages/core/src/services/chatCompressionService.ts, the runtime defines a default auto-compression threshold of 50% of the model limit, preserves roughly the newest 30% of the history, and gives function responses their own 50,000-token budget.
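A minimal Python sketch of the two headline knobs, a 50% trigger and a "preserve roughly the newest 30%" split, follows. It is an illustration, not Gemini CLI's TypeScript. Measuring size in characters and splitting only at a user turn are assumptions chosen so the preserved tail starts on a clean conversational boundary.

```python
def should_compress(token_count, model_limit, threshold=0.5):
    # Trigger once the history reaches `threshold` of the model's limit.
    return token_count >= threshold * model_limit

def split_for_compression(history, preserve_fraction=0.3):
    """history: list of (role, text), oldest first.
    Returns (to_compress, to_keep), splitting at the first user turn that
    falls after roughly (1 - preserve_fraction) of the total characters."""
    sizes = [len(text) for _, text in history]
    target = (1 - preserve_fraction) * sum(sizes)
    running = 0
    for i, (role, _) in enumerate(history):
        if role == "user" and running >= target:
            return history[:i], history[i:]
        running += sizes[i]
    return history, []  # assumption: no clean split point, so compress everything
```

Because the split waits for a user turn, the preserved slice is only approximately 30%; the design trades exact proportions for a tail that begins where a conversation naturally resumes.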

That last part is especially revealing. Gemini walks the history backward so the newest tool outputs keep their fidelity, then starts truncating older large tool responses once the function-response budget is exceeded. Those oversized outputs are not simply dropped. They are replaced with smaller placeholders and saved to temp files.
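The backward walk can be sketched as a greedy pass over tool outputs, newest first: outputs that fit the remaining budget stay intact, and any that do not are written to a temp file and replaced by a pointer. This is a hypothetical illustration. The word-count cost function, the placeholder text, and letting a smaller older output still fit after a larger one was spilled are our assumptions, not Gemini CLI's exact rules.

```python
import os
import tempfile

def budget_tool_outputs(outputs, budget=50_000, count=lambda s: len(s.split())):
    """outputs: tool-output strings, oldest first.
    Walk newest-first; outputs that no longer fit the budget are saved to
    temp files and replaced with a short placeholder."""
    result = list(outputs)
    used = 0
    for i in range(len(outputs) - 1, -1, -1):
        cost = count(outputs[i])
        if used + cost <= budget:
            used += cost  # newest outputs keep full fidelity
            continue
        fd, path = tempfile.mkstemp(suffix=".txt", prefix="tool-output-")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(outputs[i])  # nothing is lost: the full text stays on disk
        result[i] = f"[output truncated: {cost} tokens saved to {path}]"
    return result
```

A production version would likely keep a short excerpt in the placeholder as well; this sketch keeps only the pointer.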

That is not generic summarization. It is context triage.

Gemini also makes a sharp distinction between what the summarizer wants and what the model can safely afford. If the original history chunk is still under the model limit, it sends the original version to the summarizer. If not, it falls back to a truncated form. Then it runs a second verification pass telling the model to critique its own <state_snapshot> and regenerate it if specific technical details, file paths, tool results, or user constraints were lost.
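The control flow described here, choosing the original or truncated input and then running a self-critique pass, fits in a few lines. In this sketch `count`, `summarize`, and `critique` are hypothetical caller-supplied callables standing in for a tokenizer and two model calls. The convention that `critique` returns `None` when nothing was lost is an assumption.

```python
def compress_with_self_check(original, truncated, model_limit, count, summarize, critique):
    """Hypothetical sketch of summarize-then-verify compression."""
    # Prefer the untruncated history when it fits under the model limit.
    source = original if count(original) < model_limit else truncated
    snapshot = summarize(source)
    # Second pass: the model reviews its own snapshot against the source and
    # returns a regenerated snapshot, or None if nothing important was lost.
    revised = critique(source, snapshot)
    return revised if revised else snapshot
```

The design choice worth noting is that the critique sees the same source the summarizer saw, so it can check for dropped paths, tool results, or constraints rather than merely judging style.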

That second pass is the important clue. Gemini is not satisfied with “summary generated.” It wants a self-audit before the compressed state becomes the new context foundation.

Meanwhile, the operator surface is becoming explicit too. In packages/cli/src/ui/commands/compressCommand.ts, /compress is a built-in command with aliases for summarize and compact, and the UI blocks duplicate compression requests with a pending-state guard. The active issue backlog pushes in the same direction: #21892 asks for guided compression so a user can say, in effect, “compress this, but keep the exact SQL query we just wrote.”
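A pending-state guard is a small pattern: set a flag before starting, refuse new requests while it is set, and clear it in a `finally` block so a failed compression does not wedge the command. The sketch below is synchronous for simplicity, whereas a real UI handler would be asynchronous. The class name and message text are hypothetical.

```python
class CompressCommand:
    """Hypothetical /compress handler that ignores duplicate requests."""

    def __init__(self, compress):
        self._compress = compress  # callable that performs the actual compression
        self._pending = False

    def run(self):
        if self._pending:
            return "Already compressing; wait for the current request to finish."
        self._pending = True
        try:
            return self._compress()
        finally:
            self._pending = False  # always clear, even if compression raised
```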

That is what a control plane looks like: threshold policy, preservation rules, verification logic, and operator intent all meeting in one subsystem.

OpenClaw is adding safeguards, audits, and memory boundaries around compaction

OpenClaw pushes the pattern further by treating compaction as something that can fail in subtle, structural ways.

In src/agents/compaction.ts, the runtime uses adaptive chunk ratios, safety margins, staged summarization, and fallback paths for oversized messages. It also strips verbose tool-result details before summarization, preserves opaque identifiers like hashes and URLs exactly, and repairs tool-use/tool-result pairing after older chunks are dropped so strict providers do not choke on orphaned tool results.
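The pairing repair matters because providers with strict schemas reject a tool result whose originating tool call is gone. A minimal sketch, assuming a hypothetical flat `Entry` record with a `tool_id` link, drops any `tool_result` that appears without a preceding `tool_use` of the same id. It does not handle the reverse case (a tool call left without a result), which a fuller repair would also need to address.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    kind: str                     # "text", "tool_use", or "tool_result"
    tool_id: Optional[str] = None
    text: str = ""

def drop_orphaned_tool_results(entries):
    """Remove tool_result entries whose matching tool_use was dropped earlier."""
    seen_uses = set()
    repaired = []
    for e in entries:
        if e.kind == "tool_use":
            seen_uses.add(e.tool_id)
        elif e.kind == "tool_result" and e.tool_id not in seen_uses:
            continue  # orphan: its tool_use lived in a chunk that was dropped
        repaired.append(e)
    return repaired
```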

That is already richer than a typical “conversation summary” implementation. But the sharper signal is in src/agents/pi-extensions/compaction-safeguard.ts.

There, OpenClaw requires summaries to contain exact sections for decisions, open TODOs, constraints, pending user asks, and exact identifiers. It can preserve recent turns verbatim, audit whether the latest user ask still overlaps with the summary, check whether identifiers were preserved, and retry the summary when quality checks fail. If new content has crowded out too much history, it can summarize dropped chunks separately and feed that result back as prior summary context.
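The safeguard's checks (required sections, exact identifiers, overlap with the latest ask, retry on failure) can be sketched as an audit function plus a retry loop. Everything here is an illustrative assumption: the `## Section` heading format, the rough regex for URLs and hex hashes, the "shares at least one word longer than three letters" overlap test, and the `summarize(source, feedback)` callable. OpenClaw's real checks may differ.

```python
import re

REQUIRED_SECTIONS = ["Decisions", "Open TODOs", "Constraints",
                     "Pending user asks", "Exact identifiers"]
# Rough patterns for opaque identifiers: URLs and hex strings such as commit hashes.
IDENTIFIER_RE = re.compile(r"https?://\S+|\b[0-9a-f]{7,40}\b")

def audit_summary(summary, source_text, latest_ask):
    problems = []
    for section in REQUIRED_SECTIONS:
        if f"## {section}" not in summary:
            problems.append(f"missing section: {section}")
    for ident in set(IDENTIFIER_RE.findall(source_text)):
        if ident not in summary:
            problems.append(f"identifier lost: {ident}")
    ask_words = {w.lower() for w in re.findall(r"\w+", latest_ask) if len(w) > 3}
    summary_words = {w.lower() for w in re.findall(r"\w+", summary)}
    if ask_words and not ask_words & summary_words:
        problems.append("latest user ask not reflected")
    return problems

def summarize_with_audit(summarize, source_text, latest_ask, max_attempts=3):
    """Retry while the audit finds problems, feeding them back to the summarizer."""
    feedback, summary = [], ""
    for _ in range(max_attempts):
        summary = summarize(source_text, feedback)
        feedback = audit_summary(summary, source_text, latest_ask)
        if not feedback:
            return summary
    return summary  # best effort once retries are exhausted
```

Passing the audit findings back into the next attempt is what turns a blind retry into a targeted one.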

In plain English: OpenClaw is not just compressing. It is governing compression.

The adjacent src/auto-reply/reply/post-compaction-context.ts makes the broader design philosophy clear. After compaction, OpenClaw can re-inject critical AGENTS.md sections like startup rules and safety boundaries so the summary does not quietly erase the operating instructions the agent is supposed to keep following. That is a subtle but important idea: some context should never be left to summarization quality alone.
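The re-injection idea can be sketched as "pull named sections out of AGENTS.md and append them after the summary." The section titles `Startup` and `Safety`, the assumption of flat `## ` headings, and the message ordering are hypothetical choices for illustration, not OpenClaw's actual configuration.

```python
import re

def extract_sections(markdown, wanted):
    """Return the text of '## <title>' sections whose titles are in `wanted`,
    in the order given by `wanted`. Assumes flat '## ' headings."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        m = re.match(r"^##\s+(.*\S)\s*$", line)
        if m:
            current = m.group(1) if m.group(1) in wanted else None
            if current:
                sections[current] = [line]
            continue
        if current:
            sections[current].append(line)
    return ["\n".join(sections[t]).strip() for t in wanted if t in sections]

def post_compaction_messages(summary, agents_md, wanted=("Startup", "Safety")):
    # Critical rules are copied verbatim, never left to summary quality.
    return [summary] + extract_sections(agents_md, wanted)
```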

Even the maintenance layer reflects the same concern. A recent OpenClaw commit truncates the session JSONL file after compaction so the on-disk transcript does not grow without bound. So the compaction story here is not just about model tokens. It is about the whole runtime envelope.
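One plausible shape for post-compaction truncation is to keep the latest compaction record, which carries the summary, plus everything written after it, and rewrite the file in place. The record format (a `"type": "compaction"` field) is a hypothetical assumption, not OpenClaw's session schema. Writing to a temp file and swapping it in with `os.replace` avoids leaving a half-written transcript if the process dies mid-write.

```python
import json
import os
import tempfile

def truncate_session_jsonl(path):
    """Keep only the latest compaction record and the entries after it.
    Returns the number of records removed."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    last = max((i for i, r in enumerate(records) if r.get("type") == "compaction"),
               default=None)
    if last is None:
        return 0  # no compaction yet, so nothing is safe to drop
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".jsonl")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for r in records[last:]:
            f.write(json.dumps(r) + "\n")
    os.replace(tmp, path)  # swap in the rewritten file in one step
    return last
```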

The broader pattern: context is becoming governed, not just enlarged

Put the repos side by side and the convergence is hard to ignore.

Trigger: Codex and Gemini both expose explicit compact/compress paths instead of treating summarization as invisible maintenance.
Preserve: Codex keeps recent user messages under a 20,000-token cap, Gemini preserves the newest slice of history and freshest tool output, OpenClaw can preserve recent turns verbatim.
Compress: all three reduce old history, but with different budgets, split logic, and fallback paths.
Verify: Gemini runs a second-pass self-critique; OpenClaw audits sections, identifiers, and latest-ask overlap; Codex warns when repeated compaction can degrade accuracy.
Reinject: Codex repositions initial context, OpenClaw reattaches critical startup and safety rules, Gemini rebuilds a fresh state snapshot into the live history.
Expose: user-facing commands, warnings, and issue traffic show compaction is now part of the operator experience.
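Read as a pipeline, the stages above can be expressed as one policy object with pluggable steps. This is a hypothetical composition for illustration; none of the three repos is structured exactly this way, and the output order (summary, then re-injected rules, then preserved history) is our assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class CompactionPolicy:
    """Hypothetical policy object mirroring trigger/preserve/compress/verify/reinject/expose."""
    should_trigger: Callable[[List[str]], bool]
    split: Callable[[List[str]], Tuple[List[str], List[str]]]  # -> (older, preserved)
    summarize: Callable[[List[str]], str]
    verify: Callable[[str, List[str]], List[str]]              # -> list of problems
    reinject: Callable[[], List[str]] = lambda: []
    notices: List[str] = field(default_factory=list)

    def run(self, history):
        if not self.should_trigger(history):
            return history
        older, preserved = self.split(history)
        summary = self.summarize(older)
        problems = self.verify(summary, older)
        if problems:
            # Expose: tell the operator that fidelity may have dropped.
            self.notices.append("compaction may have lost: " + "; ".join(problems))
        return [summary] + self.reinject() + preserved
```

Framing it this way makes the article's point visible in code: each field is a policy decision someone has to own.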

Different architectures. Same realization.

The hard problem is no longer merely “How big is the context window?” It is “Who controls the loss function when context has to shrink?”

Which details stay exact? Which tool outputs get demoted to placeholders? Which recent turns stay verbatim? Which safety rules get reattached afterward? Which summaries are trusted enough to become the new memory floor?

Those are no longer edge-case implementation details. They are product choices with user-visible consequences.

Why this matters now

Big context windows are real progress. But they do not remove the need for compaction. They raise the stakes of compaction.

Once an agent can run longer, touch more files, call more tools, and accumulate more hidden state, the eventual compression step matters more, not less. If that step is sloppy, the system forgets the wrong thing. If it is too aggressive, it loses precision. If it is too opaque, users stop trusting long sessions altogether.

That is why compaction is graduating into a competitive layer. The winning systems may not be the ones with the single biggest window. They may be the ones with the clearest compaction policy: what gets preserved, what gets summarized, what gets verified, and what the user is told when fidelity drops.

Open question: as agent sessions get longer and context windows get larger, will users care more about maximum capacity, or about which runtime gives them the clearest guarantees for what survives compaction and why?

Call to action: if you are building or evaluating agent tools, inspect the compaction path like you would inspect a database transaction layer. Look for budgets, safeguards, verification, operator controls, and preserved boundaries. In long-running agent work, compaction policy may end up mattering almost as much as model choice.

Source anchors

openai/codex — codex-rs/core/src/compact.rs; gsio commit anchors 9a948836bf00, 2621ba17e3d1, 50a77dc138f3; GitHub issues #16140, #15889, #13279
google-gemini/gemini-cli — packages/core/src/services/chatCompressionService.ts, packages/cli/src/ui/commands/compressCommand.ts; gsio commit anchors 6ae75c9f32a9, ffcd9963667f, e59c872b3dea; GitHub issue #21892
openclaw/openclaw — src/agents/compaction.ts, src/agents/pi-extensions/compaction-safeguard.ts, src/auto-reply/reply/post-compaction-context.ts; gsio commit anchors 89c4c674d178, 3fdd7c9e0047, e0dfc776bba8; local git commit c6968c39d6
