Implementation Delta
The patterns in this guide were derived from theory: studying what platforms ship, extracting the common architecture, and naming the recurring structures. But building an agent harness reveals patterns that theory misses — patterns that only surface when you write the code, handle the edge cases, and watch things break at runtime.
This page documents ten patterns discovered by analyzing real open-source agent harness implementations (notably claw-code, a 30K+ LOC Rust reimplementation of a production agent harness). Each pattern is load-bearing in practice but absent from the existing taxonomy.
1. Prompt Compilation
The pattern: The system prompt is not a file. It is a runtime artifact assembled from a dependency graph of inputs — project context files, tool descriptions, permission state, plugin manifests, session history, hook declarations, and MCP server capabilities.
Why theory misses it: The Instruction Files pattern describes static markdown loaded at session start. In practice, the instruction file is one input to a build process that produces the final prompt.
What it looks like in code:
```
prompt = []
prompt += load_instruction_files(walk_up_tree(cwd))
prompt += render_tool_descriptions(active_tools)
prompt += render_permission_context(current_mode)
prompt += render_mcp_capabilities(connected_servers)
prompt += render_plugin_manifests(loaded_plugins)
prompt += render_session_summary(compacted_history)
prompt += render_hook_declarations(active_hooks)
```
Why it matters: Getting prompt compilation wrong — wrong ordering, missing context, exceeding the token budget — silently degrades every downstream behavior. The model doesn’t error; it just gets worse. This makes prompt compilation the most consequential code in the harness, yet it has no dedicated pattern.
Design guidance:
- Treat prompt assembly as a build pipeline with explicit phases: discover, resolve, budget, emit.
- Assign token budgets per phase. If tool descriptions consume 40% of the instruction budget, you have a tool sprawl problem, not a context problem.
- Cache the stable portions (tool schemas rarely change mid-session) and recompile only the dynamic portions (conversation summary, permission state).
- Log the compiled prompt hash per turn so you can correlate behavior changes to prompt changes during debugging.
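The guidance above can be sketched as a small budgeted pipeline. This is a minimal illustration, not the harness's actual code: the phase names, the 4-characters-per-token estimate, and the truncation policy are all assumptions.

```python
# Sketch: prompt compilation as a budgeted pipeline. A real harness would
# use the model's tokenizer instead of a character-count heuristic.

def estimate_tokens(text):
    """Crude token estimate (assumed ~4 chars/token for illustration)."""
    return len(text) // 4

def compile_prompt(phases):
    """Each phase is (name, rendered_text, token_budget).
    Returns the assembled prompt plus a per-phase usage report,
    so budget overruns are visible instead of silent."""
    sections, report = [], {}
    for name, text, budget in phases:
        used = estimate_tokens(text)
        if used > budget:
            # Truncate rather than silently blowing the budget.
            text = text[: budget * 4]
            used = budget
        sections.append(text)
        report[name] = {"tokens": used, "budget": budget}
    return "\n\n".join(sections), report

prompt, report = compile_prompt([
    ("instructions", "Follow the project conventions in CLAUDE.md.", 200),
    ("tools", "bash: run shell commands\nread: read a file", 400),
    ("session_summary", "Earlier we refactored the auth module.", 100),
])
```

Hashing the returned `prompt` per turn (as the last bullet suggests) then gives you a stable key for correlating behavior changes with prompt changes.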
2. Streaming as an Architectural Constraint
The pattern: Streaming (SSE/WebSocket) is not a UX feature layered on top. It is a structural constraint that changes error handling, cancellation, rendering, tool execution, and backpressure throughout the entire stack.
Why theory misses it: The existing patterns assume request-response semantics. Real harnesses stream tokens as they arrive, which means every component — from the API client to the terminal renderer — must handle partial, in-flight state.
What it forces you to solve:
| Problem | Non-streaming | Streaming |
|---|---|---|
| Error handling | Check response status | Stream can break mid-token, mid-tool-call, or mid-markdown block |
| Cancellation | Don’t send the request | User hits Ctrl+C during a file write — must abort cleanly, discard partial output, and leave the conversation in a valid state |
| Tool calls | Parse complete JSON | Tool call JSON arrives incrementally — must buffer, detect boundaries, and dispatch |
| Rendering | Render complete markdown | Render partial markdown that may have unclosed blocks, incomplete tables, or half-written code fences |
| Backpressure | N/A | Model produces tokens faster than the terminal renders — must buffer without unbounded memory growth |
Design guidance:
- Define a `StreamEvent` enum that every layer speaks: `TokenDelta`, `ToolCallStart`, `ToolCallDelta`, `ToolCallEnd`, `Error`, `Done`.
- Make cancellation a first-class event, not an afterthought. Every async operation must respond to a cancellation signal within one tick.
- Test with slow terminals. If your rendering can’t keep up with the model, you’ll discover backpressure bugs in production.
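One way to see why a shared event vocabulary matters is a consumer that must survive a stream breaking mid-tool-call. The sketch below is illustrative only: the variant set follows the guidance above, but the field names and renderer behavior are assumptions.

```python
# Sketch: a stream-event vocabulary and a consumer that tolerates
# a stream that dies while a tool call is still open.
from dataclasses import dataclass

@dataclass
class TokenDelta:
    text: str

@dataclass
class ToolCallStart:
    tool: str

@dataclass
class ToolCallEnd:
    tool: str

@dataclass
class Error:
    message: str

@dataclass
class Done:
    pass

def render(events):
    out, open_tool = [], None
    for ev in events:
        if isinstance(ev, TokenDelta):
            out.append(ev.text)
        elif isinstance(ev, ToolCallStart):
            open_tool = ev.tool
        elif isinstance(ev, ToolCallEnd):
            open_tool = None
        elif isinstance(ev, Error):
            # Stream broke mid-flight; report any dangling tool call
            # instead of leaving the conversation in an ambiguous state.
            out.append(f"[error: {ev.message}; dangling tool: {open_tool}]")
            break
    return "".join(out)
```

Because every layer speaks the same enum, the same dangling-tool check works whether the consumer is a terminal renderer or an IDE extension.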
3. Session Persistence and Resumption
The pattern: Users stop working and come back. The agent must serialize its full conversation state to disk and rehydrate it later, handling the reality that the world changed in between.
Why theory misses it: Context Management addresses within-session concerns (compaction, budgeting). It says nothing about the across-session lifecycle: save, quit, resume, and the stale-context problem that follows.
What goes wrong on resume:
- The codebase changed (someone else pushed commits). The agent’s compacted history references files that no longer exist or have different content.
- Token archaeology: the resumed session carries summarized history from compaction. The agent “remembers” decisions it can’t fully reconstruct.
- OAuth tokens expired during the gap. The first API call after resume fails with a 401 that the agent doesn’t expect.
Design guidance:
- Serialize conversation state as a checkpoint: messages, tool results, compaction summaries, and a manifest of referenced files with their hashes.
- On resume, diff the file manifest against current state. Surface changes to the user: “3 files referenced in this session have changed since you last worked.”
- Refresh credentials before replaying the first message, not after the first failure.
- Consider a “soft resume” that loads the checkpoint summary as context for a new session, rather than replaying the full history. This avoids stale-context problems at the cost of losing fine-grained history.
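The checkpoint-plus-manifest idea can be sketched in a few lines. The on-disk layout below is an assumption for illustration, not a documented format:

```python
# Sketch: checkpoint a session with a manifest of referenced files and
# their hashes; on resume, diff the manifest against the current tree.
import hashlib
import json
import os
import tempfile

def file_hash(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def save_checkpoint(path, messages, referenced_files):
    manifest = {p: file_hash(p) for p in referenced_files}
    with open(path, "w") as f:
        json.dump({"messages": messages, "manifest": manifest}, f)

def resume_checkpoint(path):
    with open(path) as f:
        ckpt = json.load(f)
    # Files that vanished or changed since the checkpoint was written.
    stale = [p for p, h in ckpt["manifest"].items()
             if not os.path.exists(p) or file_hash(p) != h]
    return ckpt["messages"], stale

# Demo: the world changes between save and resume.
workdir = tempfile.mkdtemp()
tracked = os.path.join(workdir, "auth.py")
with open(tracked, "w") as f:
    f.write("def login(): ...")
ckpt_path = os.path.join(workdir, "session.json")
save_checkpoint(ckpt_path, [{"role": "user", "content": "fix login"}], [tracked])
with open(tracked, "w") as f:  # someone else edits the file
    f.write("def login(user): ...")
messages, stale = resume_checkpoint(ckpt_path)
```

The `stale` list is exactly what you would surface to the user as "N files referenced in this session have changed since you last worked."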
4. Tool Failure and Recovery
The pattern: Tools fail. bash returns exit code 1. File writes fail because the path doesn’t exist. MCP servers crash mid-call. The harness must treat tool failures as structured data the model reasons about, not exceptions that crash the loop.
Why theory misses it: The Lifecycle Hooks pattern covers pre/post-tool automation. The Tool Protocols pattern covers tool discovery and invocation. Neither addresses what happens when a tool call fails at runtime.
The failure taxonomy:
| Failure type | Example | Correct handling |
|---|---|---|
| Expected error | bash exits with code 1 | Return stderr as tool result. Model adapts. |
| Transient failure | Network timeout on web_fetch | Retry with backoff. Include attempt count in result. |
| Permanent failure | MCP server process died | Mark server as unavailable. Remove its tools from the active registry. Inform the model. |
| Partial result | Streaming tool output cut short | Return what was received with a truncation marker. |
| Permission denial | User rejected the tool call | Return denial as a structured result. Model must not retry the same call. |
Design guidance:
- Every tool result should be an envelope: `{ status: "ok" | "error" | "denied" | "timeout", content: ..., metadata: { duration_ms, exit_code, ... } }`.
- Never let a tool failure crash the conversation loop. The model is remarkably good at recovering from errors — if you give it the error message.
- Track consecutive failures per tool. If `bash` fails 3 times in a row, the model is likely stuck in a retry loop. Surface this to the user rather than burning tokens.
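The envelope and the consecutive-failure counter fit together naturally. A minimal sketch, with field names following the envelope above and the threshold of 3 taken from the guidance:

```python
# Sketch: structured tool-result envelopes plus per-tool streak tracking.
from collections import defaultdict

def envelope(status, content, **metadata):
    assert status in {"ok", "error", "denied", "timeout"}
    return {"status": status, "content": content, "metadata": metadata}

class FailureTracker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.streaks = defaultdict(int)

    def record(self, tool, result):
        """Returns True when the tool has failed `threshold` times in a row,
        i.e. the model is probably stuck and the user should be told."""
        if result["status"] == "ok":
            self.streaks[tool] = 0
        else:
            self.streaks[tool] += 1
        return self.streaks[tool] >= self.threshold

tracker = FailureTracker()
stuck = False
for _ in range(3):
    stuck = tracker.record("bash", envelope("error", "exit 1", exit_code=1))
```

A success resets the streak, so intermittent failures never trip the alarm; only a genuine retry loop does.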
5. Credential Lifecycle Management
The pattern: The agent needs API keys, OAuth tokens, and service credentials. These expire, need rotation, and must be stored securely. The harness must manage the full credential lifecycle: acquire, store, refresh, and revoke.
Why theory misses it: Sandboxing & Permissions discusses the two-phase runtime (secrets available during setup, removed during execution). It doesn’t address how the harness itself authenticates to the services it depends on — the Anthropic API, MCP servers, OAuth providers.
What a real implementation requires:
- OAuth with PKCE: Browser-based login flow, authorization code exchange, token storage, and silent refresh.
- Token storage: Credentials persisted to `~/.agent/oauth/` (or platform equivalent) with appropriate filesystem permissions.
- Refresh before expiry: Proactive token refresh, not reactive 401 handling. A failed refresh mid-conversation is a terrible user experience.
- Credential isolation: Each MCP server may need its own credentials. The harness must manage a credential store indexed by server identity.
- Logout/revoke: Users must be able to explicitly clear stored credentials.
Design guidance:
- Implement credential refresh as a background task that runs before each API call, not as error handling after a 401.
- Store tokens with creation timestamp and TTL. Refresh at 80% of TTL, not on expiry.
- Separate credential storage from configuration. Credentials are secrets; config is not. They have different security requirements and different lifecycles.
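The refresh-at-80%-of-TTL rule is easy to implement once tokens carry a creation timestamp. A sketch, with the store shape and identity names invented for illustration:

```python
# Sketch: a token store that decides refresh eligibility at 80% of TTL.
import time

class TokenStore:
    def __init__(self, refresh_fraction=0.8):
        self.refresh_fraction = refresh_fraction
        self.tokens = {}  # identity -> {token, created_at, ttl}

    def put(self, identity, token, ttl, now=None):
        self.tokens[identity] = {
            "token": token,
            "created_at": now if now is not None else time.time(),
            "ttl": ttl,
        }

    def needs_refresh(self, identity, now=None):
        entry = self.tokens[identity]
        now = now if now is not None else time.time()
        age = now - entry["created_at"]
        # Refresh proactively, well before the token actually expires.
        return age >= entry["ttl"] * self.refresh_fraction

store = TokenStore()
store.put("example-provider", "tok_placeholder", ttl=3600, now=0)
```

The background task before each API call then reduces to: if `needs_refresh`, refresh now; the 401 path becomes a rare fallback rather than the normal flow.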
6. Project Context Discovery
The pattern: The agent doesn’t just load a file at a fixed path. It walks up the directory tree from the current working directory, discovers all relevant context files, and merges them by proximity (closer files win).
Why theory misses it: Instruction Files describes a four-layer precedence model (managed > user > project > local). In practice, the “project” layer is itself hierarchical — a monorepo may have a root CLAUDE.md and subdirectory-specific overrides, and the agent must discover and merge all of them.
The discovery algorithm:
```
context_files = []
dir = cwd
while dir != filesystem_root:
    for pattern in ["CLAUDE.md", ".claude/settings.json", ...]:
        if exists(dir / pattern):
            context_files.prepend(dir / pattern)
    dir = parent(dir)
```
Files closer to cwd override files further up. This handles:
- Monorepos: `/repo/CLAUDE.md` sets global conventions; `/repo/services/api/CLAUDE.md` adds API-specific rules.
- Nested projects: A git submodule with its own context file.
- Subdirectory invocation: The user runs the agent from `src/components/` and still gets the root project context.
Design guidance:
- Walk up, not down. Searching downward is unbounded and slow. Searching upward is O(depth) and hits the root quickly.
- Stop at the filesystem root or the first `.git` boundary (configurable).
- Merge by key, not by concatenation. If the root says `"permissionMode": "ask"` and the subdirectory says `"permissionMode": "dontAsk"`, the subdirectory wins — don't append both.
7. LSP as a Semantic Tool Layer
The pattern: Language Server Protocol gives the agent structured code understanding — go-to-definition, find-references, type checking, diagnostics — that is orders of magnitude more reliable than text search.
Why theory misses it: Tool Protocols documents MCP, A2A, and WebMCP. LSP is absent from the protocol stack despite being a mature, widely-deployed standard that every major language supports.
What LSP provides that text tools don’t:
| Capability | Text tools (grep, glob) | LSP |
|---|---|---|
| "What calls this function?" | Regex search, high false-positive rate | `textDocument/references` — precise, semantic |
| "What type does this return?" | Heuristic parsing | `textDocument/hover` — compiler-accurate |
| "Is this code valid?" | Run the compiler (slow, noisy) | `textDocument/diagnostic` — incremental, real-time |
| "Rename this symbol everywhere" | Find-and-replace (breaks strings, comments) | `textDocument/rename` — semantic, safe |
Design guidance:
- Expose LSP capabilities as tools in the agent's tool registry: `lsp_references`, `lsp_hover`, `lsp_diagnostics`, `lsp_rename`.
- Start the language server lazily on first use, not at session start. Most conversations don't need it.
- LSP servers are stateful and memory-intensive. Kill them when the session ends or when context compaction removes the files they were tracking.
- Fall back gracefully. If no LSP server is available for the current language, the agent should use text tools without error.
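Lazy startup and graceful fallback combine into one small wrapper. The sketch below is illustrative: `start_server`, `FakeServer`, and the grep-style fallback are stand-ins, not a real LSP client.

```python
# Sketch: start the language server on first use; if none exists for
# this language, degrade to a text-search fallback without erroring.
class LspToolLayer:
    def __init__(self, start_server, text_fallback):
        self.start_server = start_server   # may raise if no server exists
        self.text_fallback = text_fallback
        self.server = None
        self.unavailable = False

    def references(self, symbol):
        if self.server is None and not self.unavailable:
            try:
                self.server = self.start_server()  # lazy: first use only
            except OSError:
                self.unavailable = True            # remember; don't retry forever
        if self.unavailable:
            return self.text_fallback(symbol)      # degrade to text search
        return self.server.references(symbol)

class FakeServer:
    def references(self, symbol):
        return [f"src/lib.rs:{symbol}"]

def no_server():
    raise OSError("no LSP server for this language")

semantic = LspToolLayer(lambda: FakeServer(), lambda s: [f"grep:{s}"])
degraded = LspToolLayer(no_server, lambda s: [f"grep:{s}"])
```

The same `lsp_references` tool entry backs both paths, so the model never has to know whether it got semantic or textual results beyond what the result itself says.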
8. Editor and IDE Compatibility
The pattern: The same agent core must work as a CLI REPL, a VS Code extension, a JetBrains plugin, and a web interface. The harness must separate the core (conversation loop, tools, permissions) from the interface (terminal rendering, JSON-RPC, HTTP).
Why theory misses it: The existing patterns describe what the agent does, not how it presents itself. In practice, the interface layer is a major engineering surface with its own constraints.
The compatibility matrix:
| Interface | Transport | Input | Output | Constraints |
|---|---|---|---|---|
| CLI REPL | stdin/stdout | Line editing (rustyline) | Streaming markdown + syntax highlighting | Terminal width, color support, signal handling |
| VS Code | JSON-RPC over stdio | Extension API messages | Webview panels, editor decorations | Extension host lifecycle, webview security |
| JetBrains | HTTP/WebSocket | Plugin API messages | Tool windows, editor annotations | JVM process model, Kotlin/Java interop |
| Web | HTTP/SSE | REST API | JSON events | CORS, authentication, no filesystem access |
Design guidance:
- Extract the core as a library crate/package. The CLI, extension, and web server are thin shells that translate between their interface and the core API.
- Define a `HarnessEvent` stream that all interfaces consume: `MessageStart`, `TokenDelta`, `ToolCallRequest`, `ToolCallResult`, `PermissionPrompt`, `SessionEnd`.
- The CLI renders `PermissionPrompt` as an interactive terminal prompt. The IDE renders it as a dialog. The web server returns it as a JSON event the frontend handles. Same event, different presentation.
- Test the core without any interface. If your tests need a terminal or a browser, your abstraction is leaking.
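"Same event, different presentation" can be shown with one handler per shell. The event shape and handler names below are assumptions; the point is that the core emits data and each interface owns the rendering:

```python
# Sketch: one PermissionPrompt event, three presentations.
def handle_permission_prompt(event, interface):
    if interface == "cli":
        # Terminal shell: interactive y/n prompt text.
        return f"Allow {event['tool']}? [y/n]"
    if interface == "ide":
        # Editor shell: a dialog description the extension renders.
        return {"dialog": {"title": "Permission", "body": event["tool"]}}
    if interface == "web":
        # Web shell: pass the event through as JSON for the frontend.
        return {"type": "PermissionPrompt", "tool": event["tool"]}
    raise ValueError(f"unknown interface: {interface}")

event = {"type": "PermissionPrompt", "tool": "bash"}
```

Because the core only ever produces `event`, the core's tests can assert on events directly, with no terminal or browser in the loop.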
9. Model Abstraction and Aliasing
The pattern: The harness maintains a model router that resolves human-friendly aliases (e.g., claude-opus-4-6) to actual API model IDs, supports multiple providers, and insulates the prompt logic from model identity changes.
Why theory misses it: Cost Management mentions model routing for cost optimization (cheap model for simple tasks). It doesn’t address the mechanical pattern of how the harness abstracts model identity.
What the abstraction must handle:
- Alias resolution: `opus` resolves to `claude-opus-4-6-20260401`. The underlying ID changes with each release; the alias doesn't.
- Provider routing: The same harness supports Anthropic's API and OpenAI-compatible endpoints (xAI, Together, local models). Each provider has different auth, different endpoints, and different response formats.
- Capability detection: Not all models support tool use, streaming, or vision. The harness must know what the current model can do before constructing the prompt.
- Fallback chains: If the primary model is unavailable (rate limit, outage), fall back to an alternative without losing the conversation.
Design guidance:
- Define a `ModelSpec` struct: `{ id, provider, max_tokens, supports_tools, supports_vision, supports_streaming, cost_per_input_token, cost_per_output_token }`.
- Resolve aliases at session start and pin the resolved ID for the session. Don't re-resolve mid-conversation.
- Abstract the API client behind a `Provider` trait/interface. Adding a new provider should require implementing one interface, not modifying the conversation loop.
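Alias resolution, pinning, and fallback chains can be sketched together. The model IDs below are invented placeholders, and the trimmed-down `ModelSpec` carries only a few of the fields listed above:

```python
# Sketch: resolve an alias through a fallback chain, then pin the result.
from dataclasses import dataclass

@dataclass
class ModelSpec:
    id: str
    provider: str
    supports_tools: bool

class ModelRouter:
    def __init__(self, aliases, specs):
        self.aliases = aliases  # alias -> concrete model id
        self.specs = specs      # concrete id -> ModelSpec

    def resolve(self, name, fallbacks=(), available=lambda spec: True):
        """Try the requested name, then each fallback; return the first
        spec that exists and passes the availability check."""
        for candidate in (name, *fallbacks):
            concrete = self.aliases.get(candidate, candidate)
            spec = self.specs.get(concrete)
            if spec and available(spec):
                return spec  # caller pins this for the whole session
        raise LookupError(f"no available model for {name!r}")

router = ModelRouter(
    aliases={"big": "example-big-001", "small": "example-small-001"},
    specs={
        "example-big-001": ModelSpec("example-big-001", "anthropic", True),
        "example-small-001": ModelSpec("example-small-001", "anthropic", True),
    },
)
pinned = router.resolve("big", fallbacks=("small",))
```

The `available` predicate is where rate-limit and outage signals plug in, so a fallback happens inside resolution instead of leaking into the conversation loop.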
10. The Plugin Trust Boundary
The pattern: Plugins extend the agent with new tools, commands, and hooks. But the trust model for plugins is fundamentally different from MCP servers, and this difference is rarely made explicit.
Why theory misses it: Tool Protocols documents MCP’s transport-level isolation (stdio/WebSocket). Sandboxing documents OS-level enforcement. Neither addresses the middle ground: locally installed plugins that run in-process with the harness.
The trust spectrum:
More isolated ◄──────────────────────────────────► Less isolated
Cloud sandbox MCP server Plugin (sandboxed) Plugin (in-process)
(Codex) (stdio/WebSocket) (WASM, Deno) (dynamic linking)
- MCP servers are process-isolated. They communicate over a transport protocol. A malicious MCP server can return bad data but can’t read the harness’s memory or steal credentials.
- In-process plugins share the harness’s address space. A malicious plugin can do anything the harness can do: read credentials, modify tool results, exfiltrate conversation history.
Design guidance:
- Make the trust model explicit. If plugins run in-process, document that installing a plugin is equivalent to granting it full access to everything the agent can reach.
- Consider WASM or subprocess isolation for untrusted plugins. The performance cost is real but the security boundary is worth it.
- At minimum, require plugins to declare their capabilities: which tools they register, which hooks they intercept, which config keys they read. Audit this declaration at install time.
- Sign trusted plugins. The plugin registry should distinguish between verified and unverified extensions.
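The install-time capability audit can be sketched as a deny-by-default check of the declared manifest against a policy. The manifest and policy keys below are assumptions for illustration:

```python
# Sketch: audit a plugin's declared capabilities against an allow-list
# policy at install time; anything undeclared-but-requested is a violation.
def audit_plugin(manifest, policy):
    violations = []
    for hook in manifest.get("hooks", []):
        if hook not in policy["allowed_hooks"]:
            violations.append(f"hook:{hook}")
    for key in manifest.get("config_keys", []):
        if key not in policy["allowed_config_keys"]:
            violations.append(f"config:{key}")
    return violations

manifest = {
    "tools": ["format_sql"],
    "hooks": ["pre_tool"],
    "config_keys": ["oauth_tokens"],  # suspicious: plugins shouldn't read secrets
}
policy = {
    "allowed_hooks": ["pre_tool", "post_tool"],
    "allowed_config_keys": ["editor", "theme"],
}
violations = audit_plugin(manifest, policy)
```

A declaration audit doesn't enforce anything at runtime, which is why it is the minimum, not a substitute for WASM or subprocess isolation; but it makes a plugin that asks for credential keys visible before install rather than after compromise.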
Summary: Theory vs. Implementation
| Existing pattern | What theory says | What implementation adds |
|---|---|---|
| Instruction Files | Static markdown at fixed paths | Runtime prompt compilation from a dependency graph |
| Tool Protocols | MCP/A2A discovery and invocation | Tool failure taxonomy and structured error propagation |
| Context Management | Budget bands and compaction triggers | Session persistence, resumption, and stale-context handling |
| Lifecycle Hooks | Pre/post-tool automation | Streaming as a cross-cutting architectural constraint |
| Sandboxing | OS-level isolation and permission models | Plugin trust boundaries and in-process vs. transport isolation |
| Settings Architecture | Four-tier config scope | Project context discovery via directory tree walking |
| Cost Management | Model routing for cost optimization | Model abstraction, aliasing, and provider-agnostic architecture |
| Not covered | — | LSP as a semantic tool layer |
| Not covered | — | Editor/IDE compatibility and interface abstraction |
| Not covered | — | Credential lifecycle management (OAuth, token refresh, storage) |
These patterns are not speculative. They are extracted from running code that handles real conversations, real tool calls, and real failures. The gap between the patterns we document and the patterns we build is the gap between architecture and engineering — and closing it is what turns a framework diagram into a working system.