Agent Runtimes Are Learning to Audit Their Own Tools

Fresh Codex and Gemini CLI changes show agent projects treating tool calls, plugins, MCP servers, and subagents as auditable runtime events instead of invisible helper work.

Evidence: 8 source signals, 5 linked commits across 2 repos (openai/codex and google-gemini/gemini-cli) / May 19, 2026 / Daily Edition

The next serious feature in coding agents may not look like a new button. It may look like a receipt.

Over the past week of tracked changes, OpenAI Codex and Google Gemini CLI both moved in that direction. Codex added a typed lifecycle interface so extensions can observe when host-owned tools start and finish. Gemini CLI added a session-backed path for local subagents, plus a flag to route subagent invocations through the AgentSession protocol. A related Gemini CLI documentation change spells out how sensitive host environment variables are redacted before MCP servers see them.

The pattern is bigger than any one API name. Agent runtimes are starting to answer operational questions that early demos could leave vague: who supplied this tool, what invoked it, did it finish or fail, did a user cancel it, and which secrets crossed the boundary?

Codex makes tool execution observable

The clearest Codex change is PR #23309, committed as c69cde3, which adds a ToolLifecycleContributor trait to the extension API. The new contract gives contributors on_tool_start and on_tool_finish callbacks. The companion tool_lifecycle.rs file defines host-visible inputs such as turn_id, call_id, tool_name, and a source enum that distinguishes direct model tool calls from code-mode nested tool calls.

The outcome enum is the tell. Codex now gives lifecycle observers typed endings: completed, blocked, failed, or aborted. That makes tool execution a runtime event with a beginning, an owner-visible source, and a classified ending.
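To make that shape concrete, here is a minimal TypeScript sketch of an audit-only lifecycle observer. Codex's real trait is written in Rust, so this is an analogy rather than its API: the names `ToolCallInfo`, `AuditLog`, and `observeToolCall`, and the string spellings of the source and outcome values, are hypothetical. It assumes a host that notifies every contributor when a call starts and again with a classified outcome when it ends, consistent with the PR's framing that contributors observe while other layers own policy and execution.

```typescript
// Hypothetical TypeScript analogue of the Rust ToolLifecycleContributor shape
// described for Codex; type names and value spellings are illustrative only.

type ToolSource = "direct" | "code_mode_nested";
type ToolOutcome = "completed" | "blocked" | "failed" | "aborted";

interface ToolCallInfo {
  turnId: string;
  callId: string;
  toolName: string;
  source: ToolSource;
}

interface ToolLifecycleContributor {
  onToolStart(call: ToolCallInfo): void;
  onToolFinish(call: ToolCallInfo, outcome: ToolOutcome): void;
}

// An observer that only records events; it cannot veto or run anything.
class AuditLog implements ToolLifecycleContributor {
  readonly entries: string[] = [];

  onToolStart(call: ToolCallInfo): void {
    this.entries.push(`start ${call.callId} ${call.toolName} (${call.source})`);
  }

  onToolFinish(call: ToolCallInfo, outcome: ToolOutcome): void {
    this.entries.push(`finish ${call.callId} ${outcome}`);
  }
}

// Hypothetical host helper: notify contributors around one call. `run` stands
// in for the execution layer, which decides and reports how the call ended.
function observeToolCall(
  contributors: ToolLifecycleContributor[],
  call: ToolCallInfo,
  run: () => ToolOutcome,
): ToolOutcome {
  for (const c of contributors) c.onToolStart(call);
  let outcome: ToolOutcome;
  try {
    outcome = run();
  } catch {
    outcome = "failed"; // an uncaught error is classified as a failure
  }
  for (const c of contributors) c.onToolFinish(call, outcome);
  return outcome;
}
```

Because the observer returns nothing, it has no way to veto a call, which keeps auditing separate from policy. A blocked call still produces a finish event, so the log records refusals as well as successes.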

Another Codex commit, a66e0e9, adds plugin provenance to MCP tool metadata. Its test, plugin_mcp_tool_call_request_meta_includes_plugin_id, checks that plugin-backed MCP calls carry a plugin id in request metadata. In a world where tools can come from built-ins, user MCP servers, plugins, or connectors, that small field matters. It lets the host say not only "a tool ran," but "this layer supplied it."
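A sketch of where such a field can travel: MCP `tools/call` requests accept an optional `_meta` object inside `params`, so a host can stamp provenance there only when a plugin supplied the tool. This assumes the Codex field rides in that object. The metadata key string and the `ToolOrigin` type below are hypothetical; Codex's real constant is MCP_TOOL_PLUGIN_ID_META_KEY, whose value is not reproduced here.

```typescript
// Hypothetical key: Codex defines MCP_TOOL_PLUGIN_ID_META_KEY, but this
// string is made up for illustration.
const PLUGIN_ID_META_KEY = "example/plugin_id";

// Hypothetical origin type: who supplied the MCP tool being called.
type ToolOrigin =
  | { kind: "user_mcp" }
  | { kind: "plugin"; pluginId: string };

interface CallToolRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
    _meta?: Record<string, unknown>;
  };
}

// Build an MCP tools/call request, attaching plugin provenance only when a
// plugin supplied the tool.
function buildCallToolRequest(
  id: number,
  name: string,
  args: Record<string, unknown>,
  origin: ToolOrigin,
): CallToolRequest {
  const params: CallToolRequest["params"] = { name, arguments: args };
  if (origin.kind === "plugin") {
    params._meta = { [PLUGIN_ID_META_KEY]: origin.pluginId };
  }
  return { jsonrpc: "2.0", id, method: "tools/call", params };
}
```

Calls to a user's own MCP server go out unchanged, so the extra field only appears where it distinguishes one supplier from another.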

Gemini moves subagents into sessions

Gemini CLI's counterpart is PR #26665, committed as 6973b96. It adds LocalSessionInvocation, a new local subagent invocation class built around LocalSubagentSession. The implementation publishes SUBAGENT_ACTIVITY events, streams thought and tool-call activity, tracks running, completed, error, and cancelled states, sanitizes displayed thought/tool/error content, and wires abort signals into the subagent session.

That is not just delegation. It is delegation with a transcript surface. The tests cover thought streaming, tool start and end handling, rejected tool calls, cancellation, config updates before query messages, and cleanup of observers and abort listeners.
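The behaviors listed above fit a small state machine. The TypeScript sketch below is not Gemini CLI's code: `SubagentInvocation`, `sanitizeForDisplay`, and the activity `kind` values are hypothetical, and the sanitizer is a placeholder that only strips control characters and caps length. What it borrows from the article is the shape: activity events tagged SUBAGENT_ACTIVITY, four states, sanitized display text, an abort signal that cancels the run, and listener cleanup once the run ends.

```typescript
// Hypothetical sketch of a session-backed subagent invocation; names are
// invented, only the described behaviors come from the article.

type SubagentState = "running" | "completed" | "error" | "cancelled";
type ActivityKind = "thought" | "tool_start" | "tool_end" | "state";

interface SubagentActivity {
  type: "SUBAGENT_ACTIVITY";
  kind: ActivityKind;
  content: string;
}

// Placeholder sanitizer: replace control characters and cap length for display.
function sanitizeForDisplay(text: string, max = 200): string {
  const cleaned = text.replace(/[\u0000-\u001f\u007f]/g, " ");
  return cleaned.length > max ? cleaned.slice(0, max) + "..." : cleaned;
}

class SubagentInvocation {
  state: SubagentState = "running";
  private readonly publish: (a: SubagentActivity) => void;
  private readonly signal: AbortSignal;
  // Stored once so the same function can be removed during cleanup.
  private readonly onAbort = (): void => this.finish("cancelled");

  constructor(publish: (a: SubagentActivity) => void, signal: AbortSignal) {
    this.publish = publish;
    this.signal = signal;
    if (signal.aborted) this.finish("cancelled");
    else signal.addEventListener("abort", this.onAbort);
  }

  thought(text: string): void {
    this.emitIfRunning("thought", text);
  }

  toolStart(toolName: string): void {
    this.emitIfRunning("tool_start", toolName);
  }

  toolEnd(toolName: string, accepted: boolean): void {
    this.emitIfRunning("tool_end", `${toolName} ${accepted ? "done" : "rejected"}`);
  }

  // Move to a terminal state exactly once, announce it, and detach the listener.
  finish(next: Exclude<SubagentState, "running">): void {
    if (this.state !== "running") return;
    this.state = next;
    this.emit("state", next);
    this.signal.removeEventListener("abort", this.onAbort);
  }

  private emitIfRunning(kind: ActivityKind, content: string): void {
    if (this.state === "running") this.emit(kind, content);
  }

  private emit(kind: ActivityKind, content: string): void {
    this.publish({ type: "SUBAGENT_ACTIVITY", kind, content: sanitizeForDisplay(content) });
  }
}
```

Aborting the controller moves a running invocation to `cancelled`, and any tool events that arrive afterwards are dropped rather than shown as if the subagent were still working.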

PR #26947 makes the direction explicit by adding experimental.adk.agentSessionSubagentEnabled. Gemini's configuration docs describe the flag as routing subagent invocations through the AgentSession protocol instead of legacy executors. That is a migration sentence disguised as a settings entry.
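The routing decision itself is small. The TypeScript sketch below reads the documented key path and picks a path; the `Settings` type, the `chooseSubagentPath` function, and the return labels are hypothetical. It also assumes the flag is off unless explicitly set to `true`, which the evidence here does not confirm.

```typescript
// Hypothetical settings shape mirroring the documented key path
// experimental.adk.agentSessionSubagentEnabled.
interface Settings {
  experimental?: { adk?: { agentSessionSubagentEnabled?: boolean } };
}

type SubagentPath = "agent_session" | "legacy_executor";

// Assumed opt-in: anything other than an explicit true keeps the legacy executor.
function chooseSubagentPath(settings: Settings): SubagentPath {
  return settings.experimental?.adk?.agentSessionSubagentEnabled === true
    ? "agent_session"
    : "legacy_executor";
}
```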

MCP is becoming a boundary, too

The security side appears in Gemini CLI PR #22854. The docs now clarify that the CLI redacts sensitive host environment variables before passing environment data to third-party MCP servers, and that explicit env entries in settings.json or mcp_config.json are the trusted path when a server really needs a token.

This is the same story from another angle. MCP servers are useful because they widen what an agent can do. They are risky for the same reason. The runtime has to decide what crosses from the user's machine into that server process, and the docs now make that boundary legible.
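A minimal TypeScript sketch of that boundary, assuming a name-based filter: inherited host variables that look sensitive are dropped, and explicit entries from the server's configuration are layered on top as the trusted path. The `SENSITIVE_NAME` pattern and `buildMcpServerEnv` function are hypothetical; Gemini CLI's MCP server docs describe its actual redaction rules, which this pattern does not reproduce.

```typescript
// Hypothetical redaction rule for illustration; not Gemini CLI's real list.
const SENSITIVE_NAME = /(TOKEN|SECRET|PASSWORD|API_?KEY|CREDENTIAL)/i;

// Build the environment handed to a third-party MCP server process.
function buildMcpServerEnv(
  hostEnv: Record<string, string | undefined>,
  explicitEnv: Record<string, string> = {},
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const name of Object.keys(hostEnv)) {
    const value = hostEnv[name];
    // Skip unset values and anything whose name looks like a secret.
    if (value === undefined || SENSITIVE_NAME.test(name)) continue;
    env[name] = value;
  }
  // Explicit entries from the server's config are the trusted path and win.
  for (const name of Object.keys(explicitEnv)) {
    env[name] = explicitEnv[name];
  }
  return env;
}
```

In Node, `process.env` fits the `hostEnv` parameter. A server that genuinely needs a token then receives it only through `explicitEnv`, mirroring the settings.json or mcp_config.json entries the docs describe.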

What changed in the story

Early agent coverage often asked whether a system had tools, memory, or subagents. The fresher question is whether the runtime can govern those pieces once they exist. Codex is adding lifecycle and provenance hooks around tool calls. Gemini is moving subagents into a session protocol with activity and cancellation semantics. Gemini's MCP docs are making environment handling explicit.

That does not prove a shared standard is forming. It does show convergent pressure. As coding agents become less like single prompts and more like operating environments, tool execution needs audit trails, not just output text.

Watch the next round of changes for three things: whether lifecycle hooks become public extension contracts, whether session-backed subagents become the default path, and whether MCP provenance becomes visible enough for users to understand which tool actually touched their work.

Primary Evidence

OpenAI Codex commit c69cde3547c87c3423434ff37273dcadbcce8817, Add tool lifecycle extension contributor (#23309): https://github.com/openai/codex/commit/c69cde3547c87c3423434ff37273dcadbcce8817

Shows: adds ToolLifecycleContributor, typed tool start and finish callbacks, source metadata for direct and code-mode calls, and outcomes for completed, blocked, failed, and aborted tool calls.

OpenAI Codex PR #23309, Add tool lifecycle extension contributor: https://github.com/openai/codex/pull/23309

Shows: PR framing that lifecycle contributors observe accepted tool starts and finishes while other runtime layers remain responsible for policy and execution.

OpenAI Codex commit a66e0e9c4b2978121ed1cd4242f7f62dd027423f, Include plugin id in plugin MCP tool metadata (#23353): https://github.com/openai/codex/commit/a66e0e9c4b2978121ed1cd4242f7f62dd027423f

Shows: plugin-backed MCP tool calls now carry plugin provenance in request metadata, with test coverage for MCP_TOOL_PLUGIN_ID_META_KEY.

Google Gemini CLI commit 6973b963aebf3397d53c8c42eb0357b2d9eb5edb, feat(core): add LocalSessionInvocation (#26665): https://github.com/google-gemini/gemini-cli/commit/6973b963aebf3397d53c8c42eb0357b2d9eb5edb

Shows: adds session-backed local subagent invocation, subagent activity events, sanitized activity display, result formatting, cancellation/error states, abort wiring, and parent observer cleanup tests.

Google Gemini CLI commit 5611ff40e7ace8157fbeee97d459178e988862f3, feat(core): add adk.agentSessionSubagentEnabled flag (#26947): https://github.com/google-gemini/gemini-cli/commit/5611ff40e7ace8157fbeee97d459178e988862f3

Shows: adds agentSessionSubagentEnabled configuration support and documents routing subagent invocations through the AgentSession protocol instead of legacy executors.

Google Gemini CLI PR #26947, feat(core): add adk.agentSessionSubagentEnabled flag: https://github.com/google-gemini/gemini-cli/pull/26947

Shows: merged PR and changed files tying the new setting to ADK AgentSession-based subagent execution.

Google Gemini CLI commit 0c0d88d90b2878e34126737ef9218f9e7e3dec3e, docs(extensions): clarify env var sanitization policy for MCP and ext... (#22854): https://github.com/google-gemini/gemini-cli/commit/0c0d88d90b2878e34126737ef9218f9e7e3dec3e

Shows: clarifies that sensitive host environment variables are redacted before third-party MCP servers receive environment data, with explicit configuration as the trusted override path.

Google Gemini CLI MCP server documentation: https://github.com/google-gemini/gemini-cli/blob/main/docs/tools/mcp-server.md

Shows: public documentation for MCP server environment handling and sensitive variable redaction.

Evidence Limits

The article does not claim Codex and Gemini share an implementation, standard, or coordinated roadmap.

The article treats lifecycle contributors, plugin metadata, session-backed subagents, and MCP environment redaction as evidence of convergent runtime-governance pressure, not as proof of feature parity.

The Gemini AgentSession subagent setting is explicit rollout plumbing; this evidence does not prove that AgentSession-backed subagents are the default path for all users.

Commit and PR evidence can show what changed in source, but it does not prove release timing, downstream adoption, or how every end-user installation behaves.
