How AI Agents Work

Chapter 1 — Birth of an Agent

Most people think AI agents feel different because the models inside them are different. The code tells a messier, more useful story: what users experience as an agent is usually not just a model, but a model embedded in a harness.

Most people have a simple theory for why AI agents feel different.

Codex feels one way because OpenAI’s models are one way. Gemini CLI feels another because Gemini models are another. Swap the model, swap the experience. Better model, better agent.

That theory is not wrong. Models matter enormously. They shape what is possible. They shape fluency, reasoning style, speed, limits, and raw capability.

But once you start reading the source code of real agent systems, that explanation stops being enough.

Because what users meet in practice is almost never the model alone. They meet the model wrapped in software: a harness that decides what it can see, what tools it can touch, what it remembers, how it is routed, what permissions it gets, whether it can delegate, how it asks for approval, and how it returns to the human.

The model is raw capability. The harness turns that capability into conduct.

This is why two agents built on similarly powerful frontier models can still feel radically different in trust, reliability, autonomy, and style. Users often think they are comparing models. In practice, they are often comparing systems.

That is the real starting point for this series.

And that is why the title of this chapter still matters. The birth of an agent is the moment this harness assembles around raw capability and turns it into a bounded actor.

What gets assembled at birth

That assembly can take many forms, but the ingredients repeat across codebases.

At minimum, an agent has to be given some combination of identity, scope, context, permissions, execution environment, and a route back to the human. Without those, a model may be powerful, but it is not yet an operational agent.
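To make that list concrete, here is a minimal sketch of the six ingredients as a single data structure. Everything in it is hypothetical: the `Harness` class, its field names, and the `allows` method are illustrative and not taken from any of the codebases discussed in this chapter. The sketch also treats all six ingredients as mandatory, which is stricter than "some combination".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Harness:
    """Hypothetical bundle of the six ingredients named above."""
    identity: str                    # who the agent is (a name or role)
    scope: str                       # which task or session it is bound to
    context: tuple                   # what it is allowed to see
    permissions: frozenset           # which actions it may take
    environment: str                 # where it executes, e.g. a workspace path
    reply: Callable[[str], None]     # the route back to the human

    def allows(self, action: str) -> bool:
        """Return True only if the action was granted at assembly time."""
        return action in self.permissions


# Assemble one agent; a plain list stands in for the human-facing channel.
outbox = []
agent = Harness(
    identity="reviewer",
    scope="session-1",
    context=("README.md",),
    permissions=frozenset({"read_file"}),
    environment="/tmp/workspace",
    reply=outbox.append,
)
agent.reply("ready")
```

The structure is frozen on purpose: in this sketch, what the agent may do is settled when the harness is assembled, not negotiated later by the model.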

This matters because these are not cosmetic choices. They shape the lived experience of the product. They influence whether the system feels cautious or reckless, narrow or expansive, situated or abstract, trustworthy or uncanny.

In other words: architecture is where intelligence becomes behavior.

OpenClaw
The harness begins with routing and session identity. A message becomes a scoped session, a queue lane, a run, and a workspace.

Codex
The harness begins with roles and orchestration. An agent is born into a collaborative graph with inherited constraints.

Gemini CLI
The harness begins with definitions and execution context. A declared agent is instantiated inside a prepared runtime.

Mistral Vibe
The harness begins with bootstrap and profile. Startup, onboarding, mode, and agent posture are made visible from the first interaction.

OpenClaw: the harness is a live operating context

OpenClaw makes the point vividly because its architecture exposes the operational scaffolding in plain sight. An inbound message is not treated as just text to feed into a model. It is routed. Bound. Scoped.

The session system decides whether that work belongs to a direct chat, a group thread, a cron job, or another run type. The docs and runtime artifacts show the importance of session keys, routing, queueing, and serialization. Before the model answers anything, the system already knows where this work lives and how it is allowed to proceed.
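The routing idea can be sketched in a few lines. This is an illustration of the pattern, not OpenClaw's actual code: the `session_key` format, the message fields, and the `Router` class are all assumptions made for the example. The only point it tries to show is that a message gets a session key and a serialized queue lane before any model is involved.

```python
from collections import defaultdict, deque


def session_key(message: dict) -> str:
    """Derive a hypothetical session key from where the message came from."""
    kind = message["kind"]  # "direct", "group", or "cron" in this sketch
    if kind == "direct":
        return f"direct:{message['sender']}"
    if kind == "group":
        return f"group:{message['channel']}:{message['thread']}"
    if kind == "cron":
        return f"cron:{message['job']}"
    raise ValueError(f"unknown run type: {kind!r}")


class Router:
    """Keeps one FIFO lane per session so runs in a session are serialized."""

    def __init__(self) -> None:
        self.lanes = defaultdict(deque)

    def route(self, message: dict) -> str:
        """Bind the message to its session and enqueue it in that lane."""
        key = session_key(message)
        self.lanes[key].append(message["text"])
        return key

    def next_run(self, key: str):
        """Pop the oldest pending item for a session, or None if idle."""
        lane = self.lanes.get(key)
        return lane.popleft() if lane else None
```

Two messages from the same sender land in the same lane and come out in order, while a group thread gets its own lane. That ordering guarantee is the kind of thing a user perceives as the agent "knowing where it is".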

That has a very direct human consequence: OpenClaw agents feel situated. They do not feel like floating intelligence detached from environment. They feel like participants operating in a specific channel, thread, workspace, and permission boundary.

This is the first big lesson of the harness idea. Identity is not only personality. It is also infrastructure.

Codex: the harness is organizational

Codex shows a different first commitment. Its evolution points strongly toward roles, inherited configuration, sub-agent spawning, persistent threads, and collaborative orchestration. In the code history, agents increasingly look like workers in a structured system rather than solitary responders.

That means the harness here is not primarily about routing a message into a session. It is about deciding what role a new worker will play, what context it inherits, and how it fits into an existing coordination graph.
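A rough sketch of what "born into a graph with inherited constraints" could look like follows. It is not Codex's implementation. The `Worker` class, the `spawn` method, and the rule that a child's tools are the intersection of what it requests and what its parent holds are assumptions chosen to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:
    """Hypothetical node in a coordination graph."""
    role: str
    tools: frozenset
    config: dict
    parent: Optional["Worker"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list, repr=False, compare=False)

    def spawn(self, role: str, requested_tools, overrides: Optional[dict] = None) -> "Worker":
        """Create a sub-agent that can never hold more tools than its parent."""
        child = Worker(
            role=role,
            # Assumed rule: a child only gets tools the parent already has.
            tools=self.tools & frozenset(requested_tools),
            # Inherit the parent's config, then apply role-specific overrides.
            config={**self.config, **(overrides or {})},
            parent=self,
        )
        self.children.append(child)
        return child
```

For example, a lead worker holding `read`, `edit`, and `test` that spawns a tester requesting `test` and `shell` produces a child holding only `test`. The request for `shell` is silently narrowed by the parent's boundary, and the parent's own configuration is left untouched.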

The human consequence is important. This makes Codex-style agents feel specialized. They can feel less like “one assistant with a lot of tricks” and more like a small organization that knows how to divide labor.

Users often experience that difference as intelligence. But the source suggests something more precise: part of what they are feeling is orchestration design.

Gemini CLI: the harness is declarative

Gemini CLI pushes the story in another direction. Here, the important moves are agent definitions, startup context generation, session binding, policy attachment, and execution-scoped context. The system spends effort deciding what an agent is before that agent starts moving.

That gives Gemini’s harness a more declarative flavor. An agent is not merely improvised from a prompt. It is instantiated from a defined structure with a prepared context and an execution model around it.

That matters to users because it changes how the system feels. A declarative harness tends to feel governed. Discoverable. Composed. Legible. It becomes easier to understand why one agent exists, why another is disabled, how one gets attached to policy, and how one execution context differs from another.
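The declarative pattern is easy to show in miniature. The sketch below is illustrative only and does not reproduce Gemini CLI's real schema or API: `AgentDefinition`, `RunningAgent`, `instantiate`, the registry contents, and the startup-context string format are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical declared agent: data first, behavior later."""
    name: str
    tools: tuple
    policy: str = "default"
    enabled: bool = True


@dataclass(frozen=True)
class RunningAgent:
    definition: AgentDefinition
    session_id: str
    startup_context: str


def instantiate(registry: dict, name: str, session_id: str, cwd: str) -> RunningAgent:
    """Turn a declaration into a running agent bound to one session."""
    definition = registry[name]  # raises KeyError for an undeclared agent
    if not definition.enabled:
        raise PermissionError(f"agent {name!r} is declared but disabled")
    # The context is generated from the definition, not improvised from a prompt.
    startup_context = (
        f"cwd={cwd}; policy={definition.policy}; tools={','.join(definition.tools)}"
    )
    return RunningAgent(definition, session_id, startup_context)


REGISTRY = {
    "docs": AgentDefinition(name="docs", tools=("read_file",), policy="read-only"),
    "deploy": AgentDefinition(name="deploy", tools=("shell",), enabled=False),
}
```

Because the definitions are plain data, questions such as "why is this agent unavailable?" have an inspectable answer: in this sketch, `deploy` exists in the registry but is marked disabled, so instantiating it fails with an explicit error.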

Again, the user may describe that as product quality or model behavior. But underneath, much of it is architecture doing the work.

Mistral Vibe: the harness is productized

Mistral Vibe adds another important variation. In its source, startup is explicit and highly visible: bootstrap config files, run onboarding if setup is missing, choose an initial agent profile, resume a session if one exists, then enter either programmatic or interactive mode.
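That startup order can be written down as a small planning function. This is a sketch of the sequence described above, not Mistral Vibe's code: the function name, the step labels, and the rule that a supplied prompt selects programmatic mode are assumptions made for illustration.

```python
from typing import Optional


def plan_startup(
    config_exists: bool,
    saved_session: Optional[str],
    prompt: Optional[str],
    profile: str = "default",
) -> list:
    """Return the ordered startup steps for one launch (hypothetical labels)."""
    steps = ["bootstrap_config"]
    if not config_exists:
        # Setup is missing, so onboarding runs before anything else.
        steps.append("run_onboarding")
    steps.append(f"select_profile:{profile}")
    if saved_session is not None:
        steps.append(f"resume_session:{saved_session}")
    # Assumed rule: a prompt passed at launch means a non-interactive run.
    mode = "programmatic" if prompt is not None else "interactive"
    steps.append(f"enter_{mode}_mode")
    return steps
```

A first launch with no configuration and no prompt yields bootstrap, onboarding, profile selection, then interactive mode. A later launch with a saved session and a prompt skips onboarding, resumes the session, and enters programmatic mode.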

That means the harness is not hidden deep in runtime machinery. It is part of the product ceremony. The system teaches both the software and the user what kind of collaboration is about to happen.

The built-in profiles matter here. They are not just presets. They encode posture. One mode is cautious, another exploratory, another editing-forward, another more permissive. So the birth of the agent is also the moment the product reveals its stance toward action, trust, and control.
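One way to picture "profiles encode posture" is as a table from profile to per-action decisions. The profile names below simply reuse the adjectives from the paragraph above, and the actions and decisions are invented for the example. None of this reflects the actual built-in profiles or their settings.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"   # act without asking
    ASK = "ask"       # pause for human approval
    DENY = "deny"     # refuse outright


# Hypothetical postures: the same three actions, four different stances.
PROFILES = {
    "cautious": {"read": Decision.ASK, "edit": Decision.ASK, "shell": Decision.DENY},
    "exploratory": {"read": Decision.ALLOW, "edit": Decision.DENY, "shell": Decision.DENY},
    "editing_forward": {"read": Decision.ALLOW, "edit": Decision.ALLOW, "shell": Decision.ASK},
    "permissive": {"read": Decision.ALLOW, "edit": Decision.ALLOW, "shell": Decision.ALLOW},
}


def decide(profile: str, action: str) -> Decision:
    """Look up the stance for an action; unknown actions fall back to asking."""
    return PROFILES[profile].get(action, Decision.ASK)
```

The model behind all four profiles could be identical. What changes is the table, and with it how the agent behaves when it wants to edit a file or run a command.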

The human consequence is immediate: the agent feels less like an opaque intelligence and more like a collaborator with an explicit operating mode. That clarity is part of the harness too.

Why users should care

This is not just an architectural curiosity for developers. It affects what people actually live through when they work with agent products.

The harness shapes whether an agent remembers carefully or forgets abruptly; whether it asks permission or acts freely; whether it feels grounded in your project or strangely detached; whether it behaves like a solo helper or a coordinated team; and whether it comes across as trustworthy, flaky, cautious, or bold.

And this is exactly why model-only explanations keep failing in practice. Two products can sit on roughly comparable model power and still feel nothing alike, because the user is interacting with a whole system of decisions around that model.

Users do not meet the model naked. They meet it in a suit.

That suit includes routing, context, permissions, memory, tools, startup posture, and orchestration. That suit is what turns possibility into experience.

Birth is the first visible edge of a larger anatomy

So the real lesson of this chapter is not merely that agents begin before the prompt. It is that an agent is best understood as a model embedded in a harness, and birth is the moment that harness comes together.

That gives us a stronger way to read the rest of the series.

If the model is raw capability, and the harness turns that capability into behavior, then the next question becomes unavoidable:

What world does the harness allow the agent to perceive?

Because once an actor exists, what matters next is not only who it is, but what enters its field of view: files, memory, logs, branch state, skills, tools, threads, project documents, and live environmental signals.

The harness does not only shape the actor. It also shapes the world the actor is allowed to know.

That is where Chapter 2 begins.