AI agents are getting better at saying “here’s what I finished”

For a while, agent UX has had a brutal flaw. The system looks busy. Tools spin. Tokens burn. Then the run ends and the human gets one of two outcomes: a neat answer, or a blank wall.

That wall is starting to crack.

Recent commits in Codex, Gemini CLI, and OpenClaw point to the same shift: agent runtimes are getting much better at preserving continuity. That means not just finishing tasks, but surfacing partial progress while a task is still in flight, or after it fails to land cleanly.

The story is no longer just “can the agent do the work?” It’s “can the agent leave you something useful when the work gets messy?”

Codex: make the live transcript a first-class event

In openai/codex, commit 3431f01 adds a typed realtime event: thread/realtime/transcriptUpdated. The payload is simple — { threadId, role, text } — but the implication is bigger than it looks.

Codex already had realtime surfaces for audio and raw items. What changed here is that transcript deltas now become their own explicit stream. In the event handling code, input transcript deltas are mapped to the user role, output transcript deltas to assistant, and the tests verify those deltas alongside forwarded handoff_request events that carry transcript context.

That means a client doesn’t have to wait for some final stitched artifact to understand what happened during delegation. It can render the conversation as it evolves.
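To make that concrete, here is a minimal sketch of how a client might fold transcript updates into a live view. Only the payload fields (threadId, role, text) come from the commit as described above; the TranscriptUpdate and LiveTranscript names are hypothetical, and the sketch assumes each update carries an incremental delta rather than the full text so far.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TranscriptUpdate:
    # Mirrors the payload shape quoted above: { threadId, role, text }.
    thread_id: str
    role: str  # "user" or "assistant"
    text: str  # assumed to be an incremental delta, not the full transcript


class LiveTranscript:
    """Hypothetical client-side view that grows as updates arrive."""

    def __init__(self):
        # thread_id -> list of [role, text] turns, in arrival order
        self._turns = defaultdict(list)

    def apply(self, update: TranscriptUpdate) -> None:
        turns = self._turns[update.thread_id]
        if turns and turns[-1][0] == update.role:
            # Same speaker is still talking: extend the current turn.
            turns[-1][1] += update.text
        else:
            # Speaker changed (or first update): start a new turn.
            turns.append([update.role, update.text])

    def render(self, thread_id: str) -> str:
        # .get avoids creating empty entries for unknown threads.
        turns = self._turns.get(thread_id, [])
        return "\n".join(f"{role}: {text}" for role, text in turns)
```

A UI could call apply() on every event and re-render immediately, instead of waiting for a final stitched transcript.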

Gemini CLI: show the subagent’s work, and stop charging the clock while humans think

Gemini CLI’s browser-agent changes are even more direct. In commit 759575faa80b, the browser agent starts emitting structured SubagentProgress objects with states like running, completed, error, and cancelled.

That progress object keeps a recentActivity log. It tracks thought chunks, tool-call starts and ends, associates tool work with a callId, and even sanitizes sensitive arguments and error payloads before surfacing them. The tests cover the whole arc: incremental thought text, tool lifecycle, failures, and secret redaction.

That alone would be notable. But Gemini’s follow-up move matters even more. In commit 11951592aaa0, the CLI introduces a DeadlineTimer that can pause() and resume(), then wires it into the scheduler so the agent’s timeout budget pauses while the runtime is waiting for user confirmation.

That’s a subtle but important philosophical change. It treats “the agent is blocked on the human” as different from “the agent is stuck.” In other words: don’t declare failure just because the person took ten seconds to decide.
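A pausable deadline is simple to sketch. The class below borrows the DeadlineTimer name and the pause() and resume() methods from the commit description; everything else, including the remaining() and expired() helpers and the injectable clock, is an assumption for illustration and not Gemini CLI’s actual implementation.

```python
import time


class DeadlineTimer:
    """Timeout budget that only counts time while the agent is working."""

    def __init__(self, budget_seconds: float, clock=time.monotonic):
        self._clock = clock                # injectable so tests can fake time
        self._remaining = budget_seconds   # budget left as of the last pause
        self._started_at = clock()         # None while paused

    def pause(self) -> None:
        # Bank the time used so far, then stop counting.
        if self._started_at is not None:
            self._remaining -= self._clock() - self._started_at
            self._started_at = None

    def resume(self) -> None:
        if self._started_at is None:
            self._started_at = self._clock()

    def remaining(self) -> float:
        if self._started_at is None:
            return self._remaining
        return self._remaining - (self._clock() - self._started_at)

    def expired(self) -> bool:
        return self.remaining() <= 0
```

A scheduler would call pause() just before prompting for confirmation and resume() once the user answers, so only the agent’s own working time is charged against the budget.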

OpenClaw: a timeout should still tell you what got done

OpenClaw pushes the same idea from the other end of the lifecycle: the timeout path.

In commit 598f1826d8b2, the subagent announce flow stops acting like the latest assistant message is the only thing that matters. Instead, it summarizes output history, counts assistant tool calls, preserves deliberate silence like NO_REPLY, and, when needed, emits a distilled fallback such as:

[Partial progress: N tool call(s) executed before timeout]

That line is more important than it looks. It converts a timeout from a dead end into an artifact. The user may not have the final polished answer, but they do have evidence that the run advanced, and sometimes a compact summary of where it got.

The bigger pattern

None of these repos are doing exactly the same thing. Codex is improving realtime transcript continuity. Gemini CLI is making subagent execution observable and fairer under confirmation waits. OpenClaw is preserving partial work when a subagent runs out of time.

But the pattern is clear.

Binary success/failure is giving way to incremental disclosure.
Delegation is becoming easier to inspect, not just initiate.
Timeouts are being treated as reportable states, not silent voids.
Agent UX is getting closer to how good human collaborators work: “I’m not done, but here’s what I’ve learned so far.”

That’s a bigger deal than it sounds. In real workflows, the most frustrating part of agent tooling is rarely raw model intelligence. It’s the loss of context between “started,” “working,” and “finished.” These commits all chip away at that gap.

Why this matters for builders

If you’re building on top of agents, this is a design hint worth stealing. Users don’t just need final answers. They need legible intermediate states: what was tried, what finished, what timed out, what’s waiting on them, and what can still be salvaged.

That’s where trust actually grows — not from perfect autonomy, but from visible progress under imperfect conditions.

So here’s the open question: if agent tools get much better at exposing partial progress, does that change how willing people are to hand them longer, riskier work?

If you’re building one of these systems, or using them in production, watch this layer closely. The next leap in agent UX may not be smarter answers. It may be better receipts.
