Evidence Trail

Agent Runtimes Are Learning Where to Say No

May 24, 2026 / Daily Edition / 6 source signals.

Repo: openai/codex (main), commit 5c20513 / 3 repos

Reporter Notes

The May 24 story continues the control-plane beat but tightens the claim. The new evidence is about refusal and isolation: hooks before local function tools, archive checks before plugin bundles cross a boundary, read denials around credential stores, observed Telegram context kept out of replayable user turns, and CI controls around shared live API credentials.

Strongest claim: these projects are turning "do not let that flow there" into source-level runtime machinery.
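
To make the "runtime machinery" claim concrete, here is a minimal Python sketch of a pre/post tool hook that can block a call or rewrite its input. It is not the Codex implementation, which lives in the linked commit. `HookDecision`, `run_tool_with_hooks`, `example_pre_hook`, and the policy itself are hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class HookDecision:
    """Hypothetical pre-hook result; not the Codex hook payload schema."""
    allow: bool = True
    updated_input: Optional[dict] = None  # replaces the tool input when set
    reason: str = ""


def run_tool_with_hooks(
    name: str,
    tool_input: dict,
    tool: Callable[[dict], Any],
    pre_hook: Callable[[str, dict], HookDecision],
    post_hook: Callable[[str, dict, Any], None],
) -> Any:
    """Run a local tool only if the pre-hook allows it, then report the result."""
    decision = pre_hook(name, tool_input)
    if not decision.allow:
        # Refusal happens before the tool body ever runs.
        raise PermissionError(f"{name} blocked: {decision.reason}")
    effective_input = (
        decision.updated_input if decision.updated_input is not None else tool_input
    )
    result = tool(effective_input)
    post_hook(name, effective_input, result)
    return result


def example_pre_hook(name: str, tool_input: dict) -> HookDecision:
    """Illustrative policy (assumed): refuse .env paths, clamp an oversized limit."""
    path = str(tool_input.get("path", ""))
    if path.endswith(".env"):
        return HookDecision(allow=False, reason="credential file")
    if tool_input.get("limit", 0) > 100:
        return HookDecision(updated_input={**tool_input, "limit": 100})
    return HookDecision()
```

In this sketch the tool only ever sees the input the hook approved, and the post-hook receives that same effective input, so an audit trail reflects what actually ran rather than what was requested.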

Weakest claim: grouping LangChain CI infrastructure with agent-runtime boundaries is broader than the Codex/Hermes evidence. It is included because the watched repo change concerns live model/provider credentials and test-run traceability, not generic build cleanup.

Primary Evidence

OpenAI Codex commit 5c20513a1, "Default function tools into tool hooks (#23757)": https://github.com/openai/codex/commit/5c20513a1b3d15898429abd92b3676b76795a892
Evidence used: ordinary local function tools gain default PreToolUse and PostToolUse hook payloads, default updated-input rewriting, and tests for blocking and rewriting local function calls.
OpenAI Codex commit 7d47056e, "fix: plugin bundle archive handling for upload and install (#23983)": https://github.com/openai/codex/commit/7d47056ea42636271ac020b86347fbbef49490aa
Evidence used: plugin bundle packing and unpacking move into a shared helper with checks for manifest presence, size limits, path traversal, link entries, and unsupported archive entry types.
Hermes Agent commit 97e975ed, "fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root": https://github.com/NousResearch/hermes-agent/commit/97e975edd2cd666b09412cca5ee22d3e5ad431de
Evidence used: direct read denials expand to credential and secret stores, while the code comments explicitly describe the mechanism as defense-in-depth and auditability rather than a hard security boundary.
Hermes Agent commit 4a91e364, "fix(gateway): separate observed Telegram group context": https://github.com/NousResearch/hermes-agent/commit/4a91e36495e86e54dfa3ef52bad04b31b084525a
Evidence used: observed Telegram group context is withheld from normal replay history and wrapped as context for the current addressed message instead of being treated as pending user requests.
LangChain commit 33875fde, "ci(infra): serialize integration test shards across runs (#37648)": https://github.com/langchain-ai/langchain/commit/33875fde2acf6ffb717915a895638274a6098ec2
Evidence used: scheduled integration test shards add job-level concurrency keyed by package and Python version so overlapping runs do not hit the same live API credentials at once.
LangChain commit bdd7f71a, "ci(infra): trace scheduled integration tests (#37615)": https://github.com/langchain-ai/langchain/commit/bdd7f71a1b426675a83915dbd68107ceca069fc8
Evidence used: scheduled integration tests attach GitHub workflow, run, SHA, matrix, and Python-version metadata to traces so failures can be linked back to their originating CI run.
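
For readers unfamiliar with the archive checks listed for the Codex plugin-bundle commit, the sketch below shows the general shape of such a validator in Python. It is an illustration under stated assumptions, not the Codex helper: the tar format, the `plugin.json` manifest name, the 10 MiB limit, and the function name `validate_bundle` are all hypothetical.

```python
import io
import tarfile
from pathlib import PurePosixPath

MAX_TOTAL_BYTES = 10 * 1024 * 1024  # assumed limit, for illustration only
MANIFEST_NAME = "plugin.json"       # hypothetical manifest filename


def validate_bundle(data: bytes) -> list[str]:
    """Return the member names of a tar bundle, or raise ValueError."""
    names: list[str] = []
    total = 0
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar:
            path = PurePosixPath(member.name)
            # Path traversal: absolute paths or any ".." component.
            if path.is_absolute() or ".." in path.parts:
                raise ValueError(f"path traversal: {member.name}")
            # Link entries: symlinks and hard links.
            if member.issym() or member.islnk():
                raise ValueError(f"link entry: {member.name}")
            # Anything that is not a regular file or directory (devices, FIFOs).
            if not (member.isreg() or member.isdir()):
                raise ValueError(f"unsupported entry type: {member.name}")
            # Size limit on the declared, uncompressed total.
            total += member.size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("bundle exceeds size limit")
            names.append(member.name)
    if MANIFEST_NAME not in names:
        raise ValueError("missing manifest")
    return names
```

The sketch inspects entries without extracting anything, so a rejected bundle never touches the filesystem. Sharing one validator between the upload and install paths, as the commit describes for its helper, keeps the two sides from drifting apart on what counts as a safe bundle.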

Evidence Limits

These commits show project-level movement toward narrower runtime boundaries; they do not prove a common standard across agent tools.
The Hermes file-safety commit explicitly says the read-deny is not a complete security boundary because shell access can still bypass it.
Commit evidence does not prove release timing, downstream adoption, or how every installation behaves.
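
The limit noted for the Hermes read-deny is easier to see in code. The following is a minimal Python sketch of a direct-read deny check, not Hermes's actual code. The deny sets, `is_read_denied`, and `guarded_read` are hypothetical, loosely modelled on the names in the commit title.

```python
from pathlib import Path

# Hypothetical deny rules; the real list lives in the Hermes commit.
DENIED_FILE_NAMES = {".env"}
DENIED_DIR_NAMES = {"mcp-tokens"}


def is_read_denied(path: str, base: str = ".") -> bool:
    """Return True if the resolved path names a denied file or sits under a denied directory."""
    # resolve() follows symlinks, so an innocently named link to .env is still caught.
    resolved = (Path(base) / path).resolve()
    if resolved.name in DENIED_FILE_NAMES:
        return True
    return any(part in DENIED_DIR_NAMES for part in resolved.parts)


def guarded_read(path: str, audit_log: list[str]) -> str:
    """Read a file through the deny check, recording refusals for later audit."""
    if is_read_denied(path):
        audit_log.append(f"denied read: {path}")
        raise PermissionError(path)
    return Path(path).read_text()
```

A check like this only governs calls that go through `guarded_read`. A shell tool that runs `cat .env` never reaches it, which is why such a mechanism reads as defense-in-depth and auditability rather than a hard boundary.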

Open Questions

Should tomorrow's edition keep following trust boundaries, or shift to a fresher beat if the next scan repeats the same broad control-plane theme?
Are readers finding cross-repo pattern pieces more useful than narrower single-repo mechanism stories?