Implementation Delta
The patterns in this guide were derived from theory: studying what platforms ship, extracting the common architecture, and naming the recurring structures. But building an agent harness reveals patterns that theory misses — patterns that only surface when you write the code, handle the edge cases, and watch things break at runtime.
This page documents ten patterns discovered by analyzing real open-source agent harness implementations (notably claw-code, a 30K+ LOC Rust reimplementation of a production agent harness). Each pattern is load-bearing in practice but absent from the existing taxonomy.
1. Prompt Compilation
The pattern: The system prompt is not a file. It is a runtime artifact assembled from a dependency graph of inputs — project context files, tool descriptions, permission state, plugin manifests, session history, hook declarations, and MCP server capabilities.
Why theory misses it: The Instruction Files pattern describes static markdown loaded at session start. In practice, the instruction file is one input to a build process that produces the final prompt.
What it looks like in code:
```
prompt = []
prompt += load_instruction_files(walk_up_tree(cwd))
prompt += render_tool_descriptions(active_tools)
prompt += render_permission_context(current_mode)
prompt += render_mcp_capabilities(connected_servers)
prompt += render_plugin_manifests(loaded_plugins)
prompt += render_session_summary(compacted_history)
prompt += render_hook_declarations(active_hooks)
```
Why it matters: Getting prompt compilation wrong — wrong ordering, missing context, exceeding the token budget — silently degrades every downstream behavior. The model doesn’t error; it just gets worse. This makes prompt compilation the most consequential code in the harness, yet it has no dedicated pattern.
Design guidance:
- Treat prompt assembly as a build pipeline with explicit phases: discover, resolve, budget, emit.
- Assign token budgets per phase. If tool descriptions consume 40% of the instruction budget, you have a tool sprawl problem, not a context problem.
- Cache the stable portions (tool schemas rarely change mid-session) and recompile only the dynamic portions (conversation summary, permission state).
- Log the compiled prompt hash per turn so you can correlate behavior changes to prompt changes during debugging.
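The guidance above can be sketched as a small budgeted pipeline. This is a minimal illustration, not the harness's actual code: the phase names, the 4-characters-per-token estimate, and the truncation policy are all assumptions.

```python
# Sketch: prompt compilation as a budgeted pipeline. A real harness would
# use the model's tokenizer instead of a character-count heuristic.

def estimate_tokens(text):
    """Crude token estimate (assumed ~4 chars/token for illustration)."""
    return len(text) // 4

def compile_prompt(phases):
    """Each phase is (name, rendered_text, token_budget).
    Returns the assembled prompt plus a per-phase usage report,
    so budget overruns are visible instead of silent."""
    sections, report = [], {}
    for name, text, budget in phases:
        used = estimate_tokens(text)
        if used > budget:
            # Truncate rather than silently blowing the budget.
            text = text[: budget * 4]
            used = budget
        sections.append(text)
        report[name] = {"tokens": used, "budget": budget}
    return "\n\n".join(sections), report

prompt, report = compile_prompt([
    ("instructions", "Follow the project conventions in CLAUDE.md.", 200),
    ("tools", "bash: run shell commands\nread: read a file", 400),
    ("session_summary", "Earlier we refactored the auth module.", 100),
])
```

Hashing the returned `prompt` per turn (as the last bullet suggests) then gives you a stable key for correlating behavior changes with prompt changes.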
2. Streaming as an Architectural Constraint
The pattern: Streaming (SSE/WebSocket) is not a UX feature layered on top. It is a structural constraint that changes error handling, cancellation, rendering, tool execution, and backpressure throughout the entire stack.
Why theory misses it: The existing patterns assume request-response semantics. Real harnesses stream tokens as they arrive, which means every component — from the API client to the terminal renderer — must handle partial, in-flight state.
What it forces you to solve:
| Problem | Non-streaming | Streaming |
|---|---|---|
| Error handling | Check response status | Stream can break mid-token, mid-tool-call, or mid-markdown block |
| Cancellation | Don’t send the request | User hits Ctrl+C during a file write — must abort cleanly, discard partial output, and leave the conversation in a valid state |
| Tool calls | Parse complete JSON | Tool call JSON arrives incrementally — must buffer, detect boundaries, and dispatch |
| Rendering | Render complete markdown | Render partial markdown that may have unclosed blocks, incomplete tables, or half-written code fences |
| Backpressure | N/A | Model produces tokens faster than the terminal renders — must buffer without unbounded memory growth |
Design guidance:
- Define a `StreamEvent` enum that every layer speaks: `TokenDelta`, `ToolCallStart`, `ToolCallDelta`, `ToolCallEnd`, `Error`, `Done`.
- Make cancellation a first-class event, not an afterthought. Every async operation must respond to a cancellation signal within one tick.
- Test with slow terminals. If your rendering can’t keep up with the model, you’ll discover backpressure bugs in production.
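One way to see why a shared event vocabulary matters is a consumer that must survive a stream breaking mid-tool-call. The sketch below is illustrative only: the variant set follows the guidance above, but the field names and renderer behavior are assumptions.

```python
# Sketch: a stream-event vocabulary and a consumer that tolerates
# a stream that dies while a tool call is still open.
from dataclasses import dataclass

@dataclass
class TokenDelta:
    text: str

@dataclass
class ToolCallStart:
    tool: str

@dataclass
class ToolCallEnd:
    tool: str

@dataclass
class Error:
    message: str

@dataclass
class Done:
    pass

def render(events):
    out, open_tool = [], None
    for ev in events:
        if isinstance(ev, TokenDelta):
            out.append(ev.text)
        elif isinstance(ev, ToolCallStart):
            open_tool = ev.tool
        elif isinstance(ev, ToolCallEnd):
            open_tool = None
        elif isinstance(ev, Error):
            # Stream broke mid-flight; report any dangling tool call
            # instead of leaving the conversation in an ambiguous state.
            out.append(f"[error: {ev.message}; dangling tool: {open_tool}]")
            break
    return "".join(out)
```

Because every layer speaks the same enum, the same dangling-tool check works whether the consumer is a terminal renderer or an IDE extension.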
3. Session Persistence and Resumption
The pattern: Users stop working and come back. The agent must serialize its full conversation state to disk and rehydrate it later, handling the reality that the world changed in between.
Why theory misses it: Context Management addresses within-session concerns (compaction, budgeting). It says nothing about the across-session lifecycle: save, quit, resume, and the stale-context problem that follows.
What goes wrong on resume:
- The codebase changed (someone else pushed commits). The agent’s compacted history references files that no longer exist or have different content.
- Token archaeology: the resumed session carries summarized history from compaction. The agent “remembers” decisions it can’t fully reconstruct.
- OAuth tokens expired during the gap. The first API call after resume fails with a 401 that the agent doesn’t expect.
Design guidance:
- Serialize conversation state as a checkpoint: messages, tool results, compaction summaries, and a manifest of referenced files with their hashes.
- On resume, diff the file manifest against current state. Surface changes to the user: “3 files referenced in this session have changed since you last worked.”
- Refresh credentials before replaying the first message, not after the first failure.
- Consider a “soft resume” that loads the checkpoint summary as context for a new session, rather than replaying the full history. This avoids stale-context problems at the cost of losing fine-grained history.
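The checkpoint-plus-manifest idea can be sketched in a few lines. The on-disk layout below is an assumption for illustration, not a documented format:

```python
# Sketch: checkpoint a session with a manifest of referenced files and
# their hashes; on resume, diff the manifest against the current tree.
import hashlib
import json
import os
import tempfile

def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def save_checkpoint(path, messages, referenced_files):
    manifest = {p: file_hash(p) for p in referenced_files}
    with open(path, "w") as f:
        json.dump({"messages": messages, "manifest": manifest}, f)

def resume_checkpoint(path):
    with open(path) as f:
        ckpt = json.load(f)
    # Files that vanished or changed since the checkpoint was written.
    stale = [p for p, h in ckpt["manifest"].items()
             if not os.path.exists(p) or file_hash(p) != h]
    return ckpt["messages"], stale

# Demo: the world changes between save and resume.
workdir = tempfile.mkdtemp()
tracked = os.path.join(workdir, "auth.py")
with open(tracked, "w") as f:
    f.write("def login(): ...")
ckpt_path = os.path.join(workdir, "session.json")
save_checkpoint(ckpt_path, [{"role": "user", "content": "fix login"}], [tracked])
with open(tracked, "w") as f:  # someone else edits the file
    f.write("def login(user): ...")
messages, stale = resume_checkpoint(ckpt_path)
```

The `stale` list is exactly what you would surface to the user as "N files referenced in this session have changed since you last worked."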
4. Tool Failure and Recovery
The pattern: Tools fail. bash returns exit code 1. File writes fail because the path doesn’t exist. MCP servers crash mid-call. The harness must treat tool failures as structured data the model reasons about, not exceptions that crash the loop.
Why theory misses it: The Lifecycle Hooks pattern covers pre/post-tool automation. The Tool Protocols pattern covers tool discovery and invocation. Neither addresses what happens when a tool call fails at runtime.
The failure taxonomy:
| Failure type | Example | Correct handling |
|---|---|---|
| Expected error | bash exits with code 1 | Return stderr as tool result. Model adapts. |
| Transient failure | Network timeout on web_fetch | Retry with backoff. Include attempt count in result. |
| Permanent failure | MCP server process died | Mark server as unavailable. Remove its tools from the active registry. Inform the model. |
| Partial result | Streaming tool output cut short | Return what was received with a truncation marker. |
| Permission denial | User rejected the tool call | Return denial as a structured result. Model must not retry the same call. |
Design guidance:
- Every tool result should be an envelope: `{ status: "ok" | "error" | "denied" | "timeout", content: ..., metadata: { duration_ms, exit_code, ... } }`.
- Never let a tool failure crash the conversation loop. The model is remarkably good at recovering from errors — if you give it the error message.
- Track consecutive failures per tool. If `bash` fails 3 times in a row, the model is likely stuck in a retry loop. Surface this to the user rather than burning tokens.
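The envelope and the consecutive-failure counter fit together naturally. A minimal sketch, with field names following the envelope above and the threshold of 3 taken from the guidance:

```python
# Sketch: structured tool-result envelopes plus per-tool streak tracking.
from collections import defaultdict

def envelope(status, content, **metadata):
    assert status in {"ok", "error", "denied", "timeout"}
    return {"status": status, "content": content, "metadata": metadata}

class FailureTracker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streaks = defaultdict(int)

    def record(self, tool, result):
        """Returns True when the tool has failed `threshold` times in a row,
        i.e. the model is probably stuck and the user should be told."""
        if result["status"] == "ok":
            self.streaks[tool] = 0
        else:
            self.streaks[tool] += 1
        return self.streaks[tool] >= self.threshold

tracker = FailureTracker()
stuck = False
for _ in range(3):
    stuck = tracker.record("bash", envelope("error", "exit 1", exit_code=1))
```

A success resets the streak, so intermittent failures never trip the alarm; only a genuine retry loop does.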
5. Credential Lifecycle Management
The pattern: The agent needs API keys, OAuth tokens, and service credentials. These expire, need rotation, and must be stored securely. The harness must manage the full credential lifecycle: acquire, store, refresh, and revoke.
Why theory misses it: Sandboxing & Permissions discusses the two-phase runtime (secrets available during setup, removed during execution). It doesn’t address how the harness itself authenticates to the services it depends on — the Anthropic API, MCP servers, OAuth providers.
What a real implementation requires:
- OAuth with PKCE: Browser-based login flow, authorization code exchange, token storage, and silent refresh.
- Token storage: Credentials persisted to `~/.agent/oauth/` (or platform equivalent) with appropriate filesystem permissions.
- Refresh before expiry: Proactive token refresh, not reactive 401 handling. A failed refresh mid-conversation is a terrible user experience.
- Credential isolation: Each MCP server may need its own credentials. The harness must manage a credential store indexed by server identity.
- Logout/revoke: Users must be able to explicitly clear stored credentials.
Design guidance:
- Implement credential refresh as a background task that runs before each API call, not as error handling after a 401.
- Store tokens with creation timestamp and TTL. Refresh at 80% of TTL, not on expiry.
- Separate credential storage from configuration. Credentials are secrets; config is not. They have different security requirements and different lifecycles.
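The refresh-at-80%-of-TTL rule is easy to implement once tokens carry a creation timestamp. A sketch, with the store shape and identity names invented for illustration:

```python
# Sketch: a token store that decides refresh eligibility at 80% of TTL.
import time

class TokenStore:
    def __init__(self, refresh_fraction=0.8):
        self.refresh_fraction = refresh_fraction
        self.tokens = {}  # identity -> {token, created_at, ttl}

    def put(self, identity, token, ttl, now=None):
        self.tokens[identity] = {
            "token": token,
            "created_at": now if now is not None else time.time(),
            "ttl": ttl,
        }

    def needs_refresh(self, identity, now=None):
        entry = self.tokens[identity]
        now = now if now is not None else time.time()
        age = now - entry["created_at"]
        # Refresh proactively, well before the token actually expires.
        return age >= entry["ttl"] * self.refresh_fraction

store = TokenStore()
store.put("example-provider", "tok_placeholder", ttl=3600, now=0)
```

The background task before each API call then reduces to: if `needs_refresh`, refresh now; the 401 path becomes a rare fallback rather than the normal flow.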
6. Project Context Discovery
The pattern: The agent doesn’t just load a file at a fixed path. It walks up the directory tree from the current working directory, discovers all relevant context files, and merges them by proximity (closer files win).
Why theory misses it: Instruction Files describes a four-layer precedence model (managed > user > project > local). In practice, the “project” layer is itself hierarchical — a monorepo may have a root CLAUDE.md and subdirectory-specific overrides, and the agent must discover and merge all of them.
The discovery algorithm:
```
context_files = []
dir = cwd
while dir != filesystem_root:
    for pattern in ["CLAUDE.md", ".claude/settings.json", ...]:
        if exists(dir / pattern):
            context_files.prepend(dir / pattern)
    dir = parent(dir)
```
Files closer to cwd override files further up. This handles:
- Monorepos: `/repo/CLAUDE.md` sets global conventions; `/repo/services/api/CLAUDE.md` adds API-specific rules.
- Nested projects: A git submodule with its own context file.
- Subdirectory invocation: The user runs the agent from `src/components/` and still gets the root project context.
Design guidance:
- Walk up, not down. Searching downward is unbounded and slow. Searching upward is O(depth) and hits the root quickly.
- Stop at the filesystem root or the first `.git` boundary (configurable).
- Merge by key, not by concatenation. If the root says `"permissionMode": "ask"` and the subdirectory says `"permissionMode": "dontAsk"`, the subdirectory wins — don't append both.
7. LSP as a Semantic Tool Layer
The pattern: Language Server Protocol gives the agent structured code understanding — go-to-definition, find-references, type checking, diagnostics — that is orders of magnitude more reliable than text search.
Why theory misses it: Tool Protocols documents MCP, A2A, and WebMCP. LSP is absent from the protocol stack despite being a mature, widely-deployed standard that every major language supports.
What LSP provides that text tools don’t:
| Capability | Text tools (grep, glob) | LSP |
|---|---|---|
| "What calls this function?" | Regex search, high false-positive rate | `textDocument/references` — precise, semantic |
| "What type does this return?" | Heuristic parsing | `textDocument/hover` — compiler-accurate |
| "Is this code valid?" | Run the compiler (slow, noisy) | `textDocument/diagnostic` — incremental, real-time |
| "Rename this symbol everywhere" | Find-and-replace (breaks strings, comments) | `textDocument/rename` — semantic, safe |
Design guidance:
- Expose LSP capabilities as tools in the agent's tool registry: `lsp_references`, `lsp_hover`, `lsp_diagnostics`, `lsp_rename`.
- Start the language server lazily on first use, not at session start. Most conversations don't need it.
- LSP servers are stateful and memory-intensive. Kill them when the session ends or when context compaction removes the files they were tracking.
- Fall back gracefully. If no LSP server is available for the current language, the agent should use text tools without error.
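Lazy startup and graceful fallback combine into one small wrapper. The sketch below is illustrative: `start_server`, `FakeServer`, and the grep-style fallback are stand-ins, not a real LSP client.

```python
# Sketch: start the language server on first use; if none exists for
# this language, degrade to a text-search fallback without erroring.
class LspToolLayer:
    def __init__(self, start_server, text_fallback):
        self.start_server = start_server   # may raise if no server exists
        self.text_fallback = text_fallback
        self.server = None
        self.unavailable = False

    def references(self, symbol):
        if self.server is None and not self.unavailable:
            try:
                self.server = self.start_server()  # lazy: first use only
            except OSError:
                self.unavailable = True            # remember; don't retry forever
        if self.unavailable:
            return self.text_fallback(symbol)      # degrade to text search
        return self.server.references(symbol)

class FakeServer:
    def references(self, symbol):
        return [f"src/lib.rs:{symbol}"]

def no_server():
    raise OSError("no LSP server for this language")

semantic = LspToolLayer(lambda: FakeServer(), lambda s: [f"grep:{s}"])
degraded = LspToolLayer(no_server, lambda s: [f"grep:{s}"])
```

The same `lsp_references` tool entry backs both paths, so the model never has to know whether it got semantic or textual results beyond what the result itself says.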
8. Editor and IDE Compatibility
The pattern: The same agent core must work as a CLI REPL, a VS Code extension, a JetBrains plugin, and a web interface. The harness must separate the core (conversation loop, tools, permissions) from the interface (terminal rendering, JSON-RPC, HTTP).
Why theory misses it: The existing patterns describe what the agent does, not how it presents itself. In practice, the interface layer is a major engineering surface with its own constraints.
The compatibility matrix:
| Interface | Transport | Input | Output | Constraints |
|---|---|---|---|---|
| CLI REPL | stdin/stdout | Line editing (rustyline) | Streaming markdown + syntax highlighting | Terminal width, color support, signal handling |
| VS Code | JSON-RPC over stdio | Extension API messages | Webview panels, editor decorations | Extension host lifecycle, webview security |
| JetBrains | HTTP/WebSocket | Plugin API messages | Tool windows, editor annotations | JVM process model, Kotlin/Java interop |
| Web | HTTP/SSE | REST API | JSON events | CORS, authentication, no filesystem access |
Design guidance:
- Extract the core as a library crate/package. The CLI, extension, and web server are thin shells that translate between their interface and the core API.
- Define a `HarnessEvent` stream that all interfaces consume: `MessageStart`, `TokenDelta`, `ToolCallRequest`, `ToolCallResult`, `PermissionPrompt`, `SessionEnd`.
- The CLI renders `PermissionPrompt` as an interactive terminal prompt. The IDE renders it as a dialog. The web server returns it as a JSON event the frontend handles. Same event, different presentation.
- Test the core without any interface. If your tests need a terminal or a browser, your abstraction is leaking.
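"Same event, different presentation" can be shown with one handler per shell. The event shape and handler names below are assumptions; the point is that the core emits data and each interface owns the rendering:

```python
# Sketch: one PermissionPrompt event, three presentations.
def handle_permission_prompt(event, interface):
    if interface == "cli":
        # Terminal shell: interactive y/n prompt text.
        return f"Allow {event['tool']}? [y/n]"
    if interface == "ide":
        # Editor shell: a dialog description the extension renders.
        return {"dialog": {"title": "Permission", "body": event["tool"]}}
    if interface == "web":
        # Web shell: pass the event through as JSON for the frontend.
        return {"type": "PermissionPrompt", "tool": event["tool"]}
    raise ValueError(f"unknown interface: {interface}")

event = {"type": "PermissionPrompt", "tool": "bash"}
```

Because the core only ever produces `event`, the core's tests can assert on events directly, with no terminal or browser in the loop.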
9. Model Abstraction and Aliasing
The pattern: The harness maintains a model router that resolves human-friendly aliases (e.g., claude-opus-4-6) to actual API model IDs, supports multiple providers, and insulates the prompt logic from model identity changes.
Why theory misses it: Cost Management mentions model routing for cost optimization (cheap model for simple tasks). It doesn’t address the mechanical pattern of how the harness abstracts model identity.
What the abstraction must handle:
- Alias resolution: `opus` resolves to `claude-opus-4-6-20260401`. The underlying ID changes with each release; the alias doesn't.
- Provider routing: The same harness supports Anthropic's API and OpenAI-compatible endpoints (xAI, Together, local models). Each provider has different auth, different endpoints, and different response formats.
- Capability detection: Not all models support tool use, streaming, or vision. The harness must know what the current model can do before constructing the prompt.
- Fallback chains: If the primary model is unavailable (rate limit, outage), fall back to an alternative without losing the conversation.
Design guidance:
- Define a `ModelSpec` struct: `{ id, provider, max_tokens, supports_tools, supports_vision, supports_streaming, cost_per_input_token, cost_per_output_token }`.
- Resolve aliases at session start and pin the resolved ID for the session. Don't re-resolve mid-conversation.
- Abstract the API client behind a `Provider` trait/interface. Adding a new provider should require implementing one interface, not modifying the conversation loop.
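Alias resolution, pinning, and fallback chains can be sketched together. The model IDs below are invented placeholders, and the trimmed-down `ModelSpec` carries only a few of the fields listed above:

```python
# Sketch: resolve an alias through a fallback chain, then pin the result.
from dataclasses import dataclass

@dataclass
class ModelSpec:
    id: str
    provider: str
    supports_tools: bool

class ModelRouter:
    def __init__(self, aliases, specs):
        self.aliases = aliases  # alias -> concrete model id
        self.specs = specs      # concrete id -> ModelSpec

    def resolve(self, name, fallbacks=(), available=lambda spec: True):
        """Try the requested name, then each fallback; return the first
        spec that exists and passes the availability check."""
        for candidate in (name, *fallbacks):
            concrete = self.aliases.get(candidate, candidate)
            spec = self.specs.get(concrete)
            if spec and available(spec):
                return spec  # caller pins this for the whole session
        raise LookupError(f"no available model for {name!r}")

router = ModelRouter(
    aliases={"big": "example-big-001", "small": "example-small-001"},
    specs={
        "example-big-001": ModelSpec("example-big-001", "anthropic", True),
        "example-small-001": ModelSpec("example-small-001", "anthropic", True),
    },
)
pinned = router.resolve("big", fallbacks=("small",))
```

The `available` predicate is where rate-limit and outage signals plug in, so a fallback happens inside resolution instead of leaking into the conversation loop.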
10. The Plugin Trust Boundary
The pattern: Plugins extend the agent with new tools, commands, and hooks. But the trust model for plugins is fundamentally different from MCP servers, and this difference is rarely made explicit.
Why theory misses it: Tool Protocols documents MCP’s transport-level isolation (stdio/WebSocket). Sandboxing documents OS-level enforcement. Neither addresses the middle ground: locally installed plugins that run in-process with the harness.
The trust spectrum:
More isolated ◄──────────────────────────────────► Less isolated
Cloud sandbox MCP server Plugin (sandboxed) Plugin (in-process)
(Codex) (stdio/WebSocket) (WASM, Deno) (dynamic linking)
- MCP servers are process-isolated. They communicate over a transport protocol. A malicious MCP server can return bad data but can’t read the harness’s memory or steal credentials.
- In-process plugins share the harness’s address space. A malicious plugin can do anything the harness can do: read credentials, modify tool results, exfiltrate conversation history.
Design guidance:
- Make the trust model explicit. If plugins run in-process, document that installing a plugin is equivalent to granting it full access to everything the agent can reach.
- Consider WASM or subprocess isolation for untrusted plugins. The performance cost is real but the security boundary is worth it.
- At minimum, require plugins to declare their capabilities: which tools they register, which hooks they intercept, which config keys they read. Audit this declaration at install time.
- Sign trusted plugins. The plugin registry should distinguish between verified and unverified extensions.
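The install-time capability audit can be sketched as a deny-by-default check of the declared manifest against a policy. The manifest and policy keys below are assumptions for illustration:

```python
# Sketch: audit a plugin's declared capabilities against an allow-list
# policy at install time; anything undeclared-but-requested is a violation.
def audit_plugin(manifest, policy):
    violations = []
    for hook in manifest.get("hooks", []):
        if hook not in policy["allowed_hooks"]:
            violations.append(f"hook:{hook}")
    for key in manifest.get("config_keys", []):
        if key not in policy["allowed_config_keys"]:
            violations.append(f"config:{key}")
    return violations

manifest = {
    "tools": ["format_sql"],
    "hooks": ["pre_tool"],
    "config_keys": ["oauth_tokens"],  # suspicious: plugins shouldn't read secrets
}
policy = {
    "allowed_hooks": ["pre_tool", "post_tool"],
    "allowed_config_keys": ["editor", "theme"],
}
violations = audit_plugin(manifest, policy)
```

A declaration audit doesn't enforce anything at runtime, which is why it is the minimum, not a substitute for WASM or subprocess isolation; but it makes a plugin that asks for credential keys visible before install rather than after compromise.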
Summary: Theory vs. Implementation
| Existing pattern | What theory says | What implementation adds |
|---|---|---|
| Instruction Files | Static markdown at fixed paths | Runtime prompt compilation from a dependency graph |
| Tool Protocols | MCP/A2A discovery and invocation | Tool failure taxonomy and structured error propagation |
| Context Management | Budget bands and compaction triggers | Session persistence, resumption, and stale-context handling |
| Lifecycle Hooks | Pre/post-tool automation | Streaming as a cross-cutting architectural constraint |
| Sandboxing | OS-level isolation and permission models | Plugin trust boundaries and in-process vs. transport isolation |
| Settings Architecture | Four-tier config scope | Project context discovery via directory tree walking |
| Cost Management | Model routing for cost optimization | Model abstraction, aliasing, and provider-agnostic architecture |
| Not covered | — | LSP as a semantic tool layer |
| Not covered | — | Editor/IDE compatibility and interface abstraction |
| Not covered | — | Credential lifecycle management (OAuth, token refresh, storage) |
These patterns are not speculative. They are extracted from running code that handles real conversations, real tool calls, and real failures. The gap between the patterns we document and the patterns we build is the gap between architecture and engineering — and closing it is what turns a framework diagram into a working system.