Agent Runtimes Are Learning Where to Say No
May 24, 2026 / Daily Edition / 6 source signals.
Reporter Notes
Notes
Reporter Notes
The May 24 story continues the control-plane beat but tightens the claim. The
new evidence is about refusal and isolation: hooks before local function tools,
archive checks before plugin bundles cross a boundary, read denials around
credential stores, Telegram observed context kept out of replayable user turns,
and CI controls around shared live API credentials.
Strongest claim: these projects are turning "do not let that flow there" into
source-level runtime machinery.
Weakest claim: grouping LangChain CI infrastructure with agent-runtime
boundaries is broader than the Codex/Hermes evidence. It is included because
the watched repo change concerns live model/provider credentials and test-run
traceability, not generic build cleanup.
Primary Evidence
- OpenAI Codex commit
5c20513a1, "Default function tools into tool hooks (#23757)": https://github.com/openai/codex/commit/5c20513a1b3d15898429abd92b3676b76795a892 - Evidence used: ordinary local function tools gain default PreToolUse and PostToolUse hook payloads, default updated-input rewriting, and tests for blocking and rewriting local function calls.
- OpenAI Codex commit
7d47056e, "fix: plugin bundle archive handling for upload and install (#23983)": https://github.com/openai/codex/commit/7d47056ea42636271ac020b86347fbbef49490aa - Evidence used: plugin bundle packing and unpacking move into a shared helper with checks for manifest presence, size limits, path traversal, link entries, and unsupported archive entry types.
- Hermes Agent commit
97e975ed, "fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root": https://github.com/NousResearch/hermes-agent/commit/97e975edd2cd666b09412cca5ee22d3e5ad431de - Evidence used: direct read denials expand to credential and secret stores, while the code comments explicitly describe the mechanism as defense-in-depth and auditability rather than a hard security boundary.
- Hermes Agent commit
4a91e364, "fix(gateway): separate observed Telegram group context": https://github.com/NousResearch/hermes-agent/commit/4a91e36495e86e54dfa3ef52bad04b31b084525a - Evidence used: observed Telegram group context is withheld from normal replay history and wrapped as context for the current addressed message instead of being treated as pending user requests.
- LangChain commit
33875fde, "ci(infra): serialize integration test shards across runs (#37648)": https://github.com/langchain-ai/langchain/commit/33875fde2acf6ffb717915a895638274a6098ec2 - Evidence used: scheduled integration test shards add job-level concurrency keyed by package and Python version so overlapping runs do not hit the same live API credentials at once.
- LangChain commit
bdd7f71a, "ci(infra): trace scheduled integration tests (#37615)": https://github.com/langchain-ai/langchain/commit/bdd7f71a1b426675a83915dbd68107ceca069fc8 - Evidence used: scheduled integration tests attach GitHub workflow, run, SHA, matrix, and Python-version metadata to traces so failures can be linked back to their originating CI run.
Evidence Limits
- These commits show project-level movement toward narrower runtime boundaries; they do not prove a common standard across agent tools.
- The Hermes file-safety commit explicitly says the read-deny is not a complete security boundary because shell access can still bypass it.
- Commit evidence does not prove release timing, downstream adoption, or how every installation behaves.
Open Questions
- Should tomorrow's edition keep following trust boundaries, or shift to a
fresher beat if the next scan repeats the same broad control-plane theme?
- Are readers finding cross-repo pattern pieces more useful than narrower
single-repo mechanism stories?