The old mental model for AI coding agents was simple: give the model some tools, bolt on a sandbox, ask for confirmation before anything scary, and call it a day.
That model is already breaking.
Across four agent repos (Gemini CLI, Codex, Crush, and OpenClaw), the interesting change is not just expanded capability but explicit runtime specialization. One mode researches. Another reviews. Another runs with the gloves off. Another quietly shifts its tool envelope depending on which model is behind the wheel.
In other words: what looked like an approval toggle is starting to behave more like an operating-mode control plane.
Gemini CLI makes planning a first-class state
Google’s Gemini CLI offers the clearest example. In docs/cli/plan-mode.md, Plan Mode is described as a read-only environment for research, design, and execution planning before implementation. The key detail is what happens underneath that label: the mode sharply narrows the allowed tool surface.
The same doc limits Plan Mode to read/search/interaction tools, plus tightly scoped writes only for Markdown plan files in the plans directory. That is not a cosmetic UX flourish. That is a separate behavioral envelope.
Then the configuration layer makes the shift explicit. In packages/cli/src/config/settingsSchema.ts, Gemini exposes defaultApprovalMode values including default, auto_edit, and plan, with plan defined as read-only mode. A setting that once sounded like “how often should I ask permission?” now decides what kind of agent you are talking to.
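To make the "separate behavioral envelope" idea concrete, here is a minimal Python sketch of a mode-gated tool check. It is not Gemini CLI's implementation. The three mode values come from the settings schema described above, but the tool names, the `plans` directory constant, and the `is_allowed` helper are assumptions made up for illustration. The sketch only models which tools are reachable, not whether a confirmation prompt would still appear.

```python
from enum import Enum
from pathlib import PurePosixPath


class ApprovalMode(Enum):
    # Values mirror the defaultApprovalMode options named in the article.
    DEFAULT = "default"
    AUTO_EDIT = "auto_edit"
    PLAN = "plan"


# Hypothetical tool names, invented for this sketch.
READ_ONLY_TOOLS = frozenset({"read_file", "search_text", "ask_user"})
ALL_TOOLS = READ_ONLY_TOOLS | {"write_file", "run_shell"}
PLANS_DIR = PurePosixPath("plans")  # assumed location of plan files


def is_allowed(mode, tool, path=None):
    """Return True if `tool` is inside the envelope for `mode`."""
    if mode is not ApprovalMode.PLAN:
        return tool in ALL_TOOLS
    if tool in READ_ONLY_TOOLS:
        return True
    if tool == "write_file" and path is not None:
        target = PurePosixPath(path)
        # Plan mode: only Markdown files under plans/, with no ".." escapes.
        return (
            target.suffix == ".md"
            and PLANS_DIR in target.parents
            and ".." not in target.parts
        )
    return False
```

Under these assumptions, `is_allowed(ApprovalMode.PLAN, "write_file", "plans/auth.md")` passes while the same call with `"src/main.py"` is refused. The mode removes the tool from the table rather than asking more often.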
Codex turns review into a protocol, not a prompt trick
OpenAI’s Codex takes the same idea in a different direction. In codex-rs/protocol/src/protocol.rs, review is a first-class operation: Op::Review { review_request: ReviewRequest }. The protocol also adds EnteredReviewMode and ExitedReviewMode events, which means review is not just “the model was asked nicely to critique a patch.” It is a runtime state the system knows how to enter, track, and leave.
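A rough Python sketch shows what "a runtime state the system knows how to enter, track, and leave" can look like. Codex's protocol is written in Rust, and this is not a port of it. The `Session` class, its method names, and the event tuples are hypothetical. Only the entered/exited event pairing is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session that tracks review mode as explicit state."""
    mode: str = "default"
    events: list = field(default_factory=list)

    def enter_review(self, review_request):
        # Refuse nested reviews so every enter has exactly one matching exit.
        if self.mode == "review":
            raise RuntimeError("already in review mode")
        self.mode = "review"
        self.events.append(("entered_review_mode", review_request))

    def exit_review(self, review_output=None):
        if self.mode != "review":
            raise RuntimeError("not in review mode")
        self.mode = "default"
        self.events.append(("exited_review_mode", review_output))
```

Because the state lives in the runtime rather than in the prompt, a client can render a review banner, block edits, or log the span of the review without parsing model text.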
The payload shape drives that point home. ReviewOutputEvent includes structured findings, an overall_correctness verdict, an explanation, and a confidence score. Meanwhile codex-rs/core/review_prompt.md gives the agent a dedicated reviewer identity and a strict JSON schema for output.
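An output contract of this kind is easy to enforce on the consuming side. The following Python sketch validates a review payload with the four parts listed above. The key names `findings` and `overall_correctness` follow the article's wording. The `explanation` and `confidence` keys, the 0 to 1 confidence range, and the `parse_review_output` helper are assumptions for illustration, not Codex's actual schema.

```python
import json
from dataclasses import dataclass

# Assumed key names; only "overall_correctness" is quoted from the source.
REQUIRED_KEYS = {"findings", "overall_correctness", "explanation", "confidence"}


@dataclass(frozen=True)
class ReviewOutput:
    findings: tuple
    overall_correctness: str
    explanation: str
    confidence: float


def parse_review_output(raw):
    """Parse and validate a JSON review payload, raising ValueError on drift."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("review output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["findings"], list):
        raise ValueError("findings must be a list")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:  # assumed range
        raise ValueError("confidence must be between 0 and 1")
    return ReviewOutput(
        findings=tuple(data["findings"]),
        overall_correctness=str(data["overall_correctness"]),
        explanation=str(data["explanation"]),
        confidence=confidence,
    )
```

The point of the sketch is the failure path: a reviewer that answers in free-form prose is rejected by the parser, which is what separates a contract from a prompt suggestion.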
That is a big product-level statement. Codex is not merely changing how assertive the model should be. It is creating a specialist mode with separate history, separate instructions, and a distinct output contract optimized for code review.
Crush makes trust posture a mode switch
Charm’s Crush looks simpler on the surface, but it points in the same direction. In commit 2d332fc6c164, Crush adds a --yolo flag for auto-accepting all permissions. In internal/config/config.go, that becomes SkipPermissionsRequests with the comment “Automatically accept all permissions (YOLO mode).” In internal/permission/permission.go, permission requests return true immediately when that skip path or a session auto-approve path is active.
That might sound like a convenience flag. But the effect is architectural: it swaps the runtime from negotiated execution into pre-authorized execution. The agent’s trust posture changes globally. That is mode-like behavior, even if it arrives through a single command-line flag rather than a fully separate prompt or protocol state.
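The short-circuit described above fits in a few lines. This Python sketch is an illustration of the pattern, not Crush's Go code. The `PermissionService` class, its method names, and the `ask` callback are hypothetical. Only the two bypass conditions (a global skip flag and a per-session auto-approve) come from the description of permission.go.

```python
class PermissionService:
    """Hypothetical permission gate with a global skip and per-session approval."""

    def __init__(self, skip_requests=False, ask=None):
        self.skip_requests = skip_requests  # analogous to a --yolo style flag
        self.auto_approved_sessions = set()
        # `ask` stands in for the interactive prompt; default is to deny.
        self._ask = ask or (lambda session_id, action: False)

    def auto_approve_session(self, session_id):
        self.auto_approved_sessions.add(session_id)

    def request(self, session_id, action):
        # Pre-authorized execution: return before the user is ever consulted.
        if self.skip_requests or session_id in self.auto_approved_sessions:
            return True
        # Negotiated execution: defer to the prompt.
        return self._ask(session_id, action)
```

Note that the skip branch sits in front of every action type. Nothing about the individual tool call is inspected, which is why a single flag changes the trust posture of the whole runtime.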
OpenClaw points toward the next step: invisible model-specific policy layers
OpenClaw pushes the idea one layer deeper. Its src/agents/pi-tools.policy.ts resolves tool policy by modelProvider and modelId, then composes global, agent-level, and provider-level policies into one effective envelope. src/agents/pi-tools.ts passes those model identifiers straight into coding-tool creation.
The implication is that the same agent may expose different tool capabilities depending on which model is active, even without presenting that shift as an explicit user-facing mode. Not all runtime states will be branded with friendly names like Plan or Review. Some will look more like hidden policy layers: Anthropic-shaped behavior, Codex-shaped behavior, Gemini-safe behavior, provider-specific tool profiles, and so on.
That is where this trend gets really interesting. Once tool policy starts becoming model policy, the “mode switch” stops being only a visible button and starts becoming part of the runtime fabric.
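One way to picture layered, model-keyed policy is the Python sketch below. It is not OpenClaw's TypeScript, and the merge rule is an assumption: each layer may narrow the allowlist and add denials, and a denial at any layer wins. The `ToolPolicy` type, the `provider/model` key format, and the example provider and tool names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolPolicy:
    allow: Optional[frozenset] = None  # None: this layer does not restrict
    deny: frozenset = frozenset()


def resolve_provider_policy(policies, model_provider, model_id):
    """Pick the most specific provider-level policy (assumed key format)."""
    specific = policies.get(f"{model_provider}/{model_id}")
    if specific is not None:
        return specific
    return policies.get(model_provider, ToolPolicy())


def effective_tools(all_tools, *layers):
    """Compose layers in order: intersect allowlists, subtract denials."""
    allowed = set(all_tools)
    for layer in layers:
        if layer.allow is not None:
            allowed &= layer.allow
        allowed -= layer.deny
    return allowed
```

A usage example under the same assumptions:

```python
ALL = {"read", "write", "exec", "browser"}
global_policy = ToolPolicy(deny=frozenset({"browser"}))
agent_policy = ToolPolicy(allow=frozenset({"read", "write", "exec"}))
provider_policies = {"example-provider/small-model": ToolPolicy(deny=frozenset({"exec"}))}

small = resolve_provider_policy(provider_policies, "example-provider", "small-model")
large = resolve_provider_policy(provider_policies, "example-provider", "large-model")
tools_small = effective_tools(ALL, global_policy, agent_policy, small)
tools_large = effective_tools(ALL, global_policy, agent_policy, large)
```

Here the same agent loses `exec` when the smaller model is active, with no user-facing mode label involved.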
Why this pattern is spreading now
The market context fits. OpenAI’s Codex launch page leans hard on isolated environments, verifiable evidence, terminal logs, and human review before integration. That language is not about maximum autonomy. It is about controllable autonomy. Meanwhile Gemini CLI and Crush both present themselves as general-purpose terminal agents for real developer workflows, not toy demos. As these tools move from novelty to daily driver, one giant agent personality is not enough.
Developers want different gears for different moments:
- a cautious planner that cannot quietly edit the repo,
- a specialized reviewer that speaks in findings and confidence scores,
- a fast executor for trusted automation,
- and model-aware runtime policies that stop weaker or less trusted setups from touching the sharpest tools.
That is the deeper convergence. Approval systems started as seatbelts. They are turning into transmission systems.
The tension: specialization vs flow
There is a tradeoff hiding inside this shift. Every new mode can make an agent safer, more legible, and easier to trust. It can also make the experience more fragmented. A future agent CLI with Plan, Code, Review, Fixup, Triage, Yolo, and provider-shaped submodes could become powerful but exhausting.
The winners will not just add modes. They will make mode transitions feel natural, keep state legible, and explain what changed: which tools disappeared, which prompt rules tightened, which outputs are now expected, and why.
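Explaining "which tools disappeared" can be as simple as diffing two envelopes at transition time. The Python sketch below assumes each mode maps to a set of tool names. The mode names, tool names, and `describe_transition` helper are hypothetical and not taken from any of the repos discussed.

```python
# Hypothetical per-mode tool envelopes, for illustration only.
ENVELOPES = {
    "code": frozenset({"read_file", "search_text", "write_file", "run_shell"}),
    "plan": frozenset({"read_file", "search_text", "ask_user"}),
}


def describe_transition(old_mode, new_mode, envelopes=ENVELOPES):
    """Summarize what a mode switch removes from and adds to the tool set."""
    before, after = envelopes[old_mode], envelopes[new_mode]
    return {
        "removed": sorted(before - after),
        "added": sorted(after - before),
    }
```

Calling `describe_transition("code", "plan")` reports `run_shell` and `write_file` as removed and `ask_user` as added, which is the kind of one-line notice a CLI could print on every switch.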
Right now, the repos suggest that teams are still feeling their way toward that balance. But the direction is clear. The next generation of coding agents will be shaped not only by model quality, but by how cleanly they shift gears between planning, execution, and review.
Open question: if agent trust is increasingly managed through operating modes, will developers prefer a few explicit modes they can reason about, or a more adaptive system that changes policy invisibly based on task and model?
Call to action: if you are building or using one of these agents, inspect the mode layer closely. Look past the label in the UI and trace the actual prompt changes, tool restrictions, event schemas, and policy branches. That is where the real product strategy is showing up.
Source anchors
- google-gemini/gemini-cli: docs/cli/plan-mode.md, packages/cli/src/config/settingsSchema.ts, commit 35ee2a841a7e
- openai/codex: codex-rs/protocol/src/protocol.rs, codex-rs/core/review_prompt.md, commit 90a0fd342f5d
- charmbracelet/crush: internal/config/config.go, internal/permission/permission.go, commit 2d332fc6c164
- openclaw/openclaw: src/agents/pi-tools.policy.ts, src/agents/pi-tools.ts, commit fa8d9b91892f