Architecture
System Overview
Agent Harness is a compositional execution-state management system for long-horizon agentic code generation loops. It enables agents to generate, execute, and iteratively refine Python code through a structured feedback loop. The system maintains a directed acyclic graph (DAG) of execution steps, tracks variable dependencies across steps, diagnoses failures through automated feedback interpretation, and generates targeted patches to fix broken steps. This architecture supports selective re-execution of affected downstream steps while preserving the outputs of unaffected upstream computations, enabling efficient refinement cycles in code generation workflows.
Module Dependency Graph
```mermaid
graph TB
    subgraph "Core State Management"
        ESG["esg.ExecutionStateGraph"]
    end
    subgraph "Analysis & Tracking"
        DA["dependency_analyzer<br/>get_reads, get_writes<br/>infer_edges"]
    end
    subgraph "Execution & Diagnosis"
        EXE["executor.Executor<br/>execute, namespace"]
        FI["feedback_interpreter<br/>diagnose"]
    end
    subgraph "Patch Generation"
        PG["patch_generator<br/>generate_patch"]
    end
    DA -->|infers variable deps| ESG
    EXE -->|populates execution state| ESG
    FI -->|analyzes step failures| ESG
    PG -->|generates fixes| ESG
    FI -->|consumes StepResult| EXE
    PG -->|consumes DiagnosisReport| FI
    style ESG fill:#e1f5ff
    style DA fill:#f3e5f5
    style EXE fill:#e8f5e9
    style FI fill:#fff3e0
    style PG fill:#fce4ec
```
Module Descriptions
src/agent_harness/esg.py – Execution State Graph
The core state management module that models the execution history as a directed acyclic graph of computation steps.
Key Abstractions:
- StepStatus (Enum): Tracks step lifecycle (PENDING, EXECUTED, STALE, FAILED)
- ExecutionStateGraph: Main orchestrator class maintaining:
  - Step metadata (source code, status, outputs)
  - DAG topology with variable dependencies
  - Execution namespace and memoized results
Key Methods:
- add_step(step_id, source) – Register a new step with its source code
- add_edge(src_step_id, dst_step_id) – Record explicit dependency between steps
- record_output(step_id, var_name, value) – Store computed variable outputs
- get_ancestors(step_id, variable_names) – Find all upstream steps that compute required variables (dependency backtracking)
- get_subgraph_from(step_id) – List all downstream steps affected by a given step
- mark_stale(step_id) – Propagate staleness to all dependent steps
- replay_from(step_id) – Return list of steps to re-execute after a fix
- graph() – Return underlying networkx.DiGraph for inspection
Role in System: Serves as the single source of truth for execution state, enabling dependency-driven selective re-execution and failure diagnosis.
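To make the staleness and replay semantics concrete, here is a minimal, dependency-free sketch. `MiniESG` is a hypothetical stand-in, not the real `ExecutionStateGraph`: it keeps a plain adjacency dict instead of a `networkx.DiGraph`, and it assumes edges always point from an earlier-added step to a later one, so insertion order doubles as a topological order. `replay_from` marks the patched step and everything downstream as `STALE` and returns exactly those steps, leaving unrelated upstream steps alone.

```python
from collections import deque
from enum import Enum


class StepStatus(Enum):
    # Lifecycle states listed above; the string values are illustrative.
    PENDING = "pending"
    EXECUTED = "executed"
    STALE = "stale"
    FAILED = "failed"


class MiniESG:
    """Hypothetical, dependency-free stand-in for ExecutionStateGraph."""

    def __init__(self) -> None:
        self.order: list[str] = []               # steps in the order they were added
        self.source: dict[str, str] = {}
        self.status: dict[str, StepStatus] = {}
        self.children: dict[str, set[str]] = {}  # adjacency list: step -> direct dependents

    def add_step(self, step_id: str, source: str) -> None:
        self.order.append(step_id)
        self.source[step_id] = source
        self.status[step_id] = StepStatus.PENDING
        self.children[step_id] = set()

    def add_edge(self, src_step_id: str, dst_step_id: str) -> None:
        # Assumption: edges only point forward in time, which keeps the graph acyclic
        # and makes insertion order a valid topological order.
        if self.order.index(src_step_id) >= self.order.index(dst_step_id):
            raise ValueError("edges must point from an earlier step to a later one")
        self.children[src_step_id].add(dst_step_id)

    def get_subgraph_from(self, step_id: str) -> list[str]:
        # Breadth-first search over outgoing edges collects the step and all descendants.
        seen = {step_id}
        queue = deque([step_id])
        while queue:
            for child in self.children[queue.popleft()]:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        # Report them in insertion (= topological) order so parents replay before children.
        return [s for s in self.order if s in seen]

    def mark_stale(self, step_id: str) -> None:
        for s in self.get_subgraph_from(step_id):
            self.status[s] = StepStatus.STALE

    def replay_from(self, step_id: str) -> list[str]:
        # Everything downstream of a patched step is invalidated and must re-run.
        self.mark_stale(step_id)
        return self.get_subgraph_from(step_id)


esg = MiniESG()
for sid, src in [("load", "x = 1"), ("feat", "y = x + 1"),
                 ("plot", "z = 5"), ("report", "print(y, z)")]:
    esg.add_step(sid, src)
esg.add_edge("load", "feat")
esg.add_edge("feat", "report")
esg.add_edge("plot", "report")
print(esg.replay_from("feat"))  # ['feat', 'report']; 'load' and 'plot' keep their status
```

Patching `feat` invalidates only `feat` and `report`; the outputs of `load` and `plot` would be reused as-is on the next pass.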
src/agent_harness/dependency_analyzer.py – Dependency Analysis
Static analysis module for extracting variable read/write dependencies from Python source code.
Key Functions:
- get_reads(source: str) -> set[str] – Extract all variable names read by the code (using AST analysis)
- get_writes(source: str) -> set[str] – Extract all variable names written/assigned by the code
- infer_edges(steps: list[tuple[str, str]]) -> list[tuple[str, str]] – Given a list of (step_id, source) pairs, infer implicit variable-based edges between steps and return them as an edge list
Role in System: Powers automatic dependency inference when explicit edges are not provided. Enables the system to determine which steps must re-execute based on variable flow without user annotation.
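A minimal sketch of how AST-based read/write extraction and edge inference can work, using only the standard-library `ast` module. This is an illustration, not the module's actual implementation: it tracks only plain names (`ast.Name` nodes), so imports, `def`/`class` bindings and attribute writes are ignored, and augmented assignments such as `x += 1` count only as writes. `infer_edges` here assumes the steps arrive in execution order and links each read to the most recent earlier writer of that name.

```python
import ast


def get_reads(source: str) -> set[str]:
    # Names loaded anywhere in the code, e.g. `x` in `y = x + 1`.
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


def get_writes(source: str) -> set[str]:
    # Names bound by assignment targets, e.g. `y` in `y = x + 1`.
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


def infer_edges(steps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    last_writer: dict[str, str] = {}  # variable name -> step that most recently assigned it
    edges: list[tuple[str, str]] = []
    for step_id, source in steps:
        # Resolve reads before recording this step's writes, so `x = x + 1`
        # depends on the previous writer of `x`, not on itself.
        for name in sorted(get_reads(source)):
            writer = last_writer.get(name)  # builtins like `sum` have no writer
            if writer is not None and (writer, step_id) not in edges:
                edges.append((writer, step_id))
        for name in get_writes(source):
            last_writer[name] = step_id
    return edges


steps = [
    ("load", "data = [3, 1, 2]"),
    ("sort", "ordered = sorted(data)"),
    ("total", "total = sum(data)"),
    ("report", "print(ordered, total)"),
]
print(infer_edges(steps))
# [('load', 'sort'), ('load', 'total'), ('sort', 'report'), ('total', 'report')]
```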
src/agent_harness/executor.py – Execution Engine
Executes individual steps in the current execution namespace and captures results.
Key Abstractions:
- StepResult (Pydantic BaseModel): Immutable result container with:
  - step_id: str – Identifier of executed step
  - success: bool – Whether execution succeeded
  - output: dict[str, Any] – Computed variables
  - error: str | None – Exception message if failed
  - traceback: str | None – Full stack trace if failed
- Executor: Stateful executor class
Key Methods:
- namespace() -> dict[str, Any] – Return accumulated variable namespace from all prior successful executions
- execute(step_id, source) -> StepResult – Execute source code in current namespace, capture output, return result
Role in System: Isolates execution logic and provides structured result representation for downstream diagnosis and patching.
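The execute/namespace contract can be sketched with Python's built-in `exec()` and a single dict used as the step's globals. This is a hypothetical illustration: `MiniExecutor` is not the real `Executor`, `StepResult` is modelled as a plain dataclass instead of a Pydantic model, and the copy-and-commit behaviour (a failed step leaves the namespace untouched) is a design choice of the sketch rather than documented behaviour.

```python
import traceback as tb
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepResult:
    # Plain dataclass standing in for the Pydantic model described above.
    step_id: str
    success: bool
    output: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None
    traceback: Optional[str] = None


class MiniExecutor:
    """Hypothetical executor: every step runs via exec() in one shared namespace."""

    def __init__(self) -> None:
        self._ns: dict[str, Any] = {}

    def namespace(self) -> dict[str, Any]:
        # Shallow copy, so callers cannot rebind names in the executor's state.
        return dict(self._ns)

    def execute(self, step_id: str, source: str) -> StepResult:
        # Run against a shallow copy so a failing step leaves no half-written names
        # behind (in-place mutation of shared objects is NOT rolled back).
        scratch = dict(self._ns)
        try:
            exec(source, scratch)
        except Exception as exc:
            return StepResult(step_id, False, error=str(exc), traceback=tb.format_exc())
        scratch.pop("__builtins__", None)  # exec() injects this key; it is not step output
        # Output = names this step created or rebound to a different object.
        output = {k: v for k, v in scratch.items()
                  if k not in self._ns or self._ns[k] is not v}
        self._ns = scratch  # commit only after success
        return StepResult(step_id, True, output=output)


ex = MiniExecutor()
ok = ex.execute("s1", "x = 2\ny = x * 21")
bad = ex.execute("s2", "z = w + 1")
print(ok.output)               # {'x': 2, 'y': 42}
print(bad.success, bad.error)  # False name 'w' is not defined
```

The copy-on-execute approach is a trade-off: it keeps failed steps from polluting the namespace, but functions defined in an earlier step keep resolving globals against the older dictionary they were created in.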
src/agent_harness/feedback_interpreter.py – Failure Diagnosis
Analyzes execution failures and produces structured diagnostic reports for patch generation.
Key Abstractions:
- DiagnosisReport (Pydantic BaseModel): Structured failure analysis with:
  - step_id: str – Failed step ID
  - error_type: str – Category of error (e.g., NameError, TypeError, RuntimeError)
  - error_message: str – Exception message
  - root_cause: str – Natural language analysis of likely root cause
  - affected_variables: list[str] – Variables the step tried to compute
  - missing_dependencies: list[str] – Variables referenced but not in namespace
  - suggestions: list[str] – Actionable fix suggestions
Key Functions:
- diagnose(step_result: StepResult, esg: ExecutionStateGraph) -> DiagnosisReport – Analyze a failed StepResult in context of execution history and produce diagnostic report
Role in System: Bridges execution failures and patch generation by providing structured, actionable feedback that patch generators consume.
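A rough sketch of what a rule-based `diagnose` could compute, assuming the ExecutionStateGraph would supply the failed step's source and the current namespace; to stay self-contained, the hypothetical `diagnose_sketch` below takes those two directly instead of an `esg`, and uses dataclasses in place of the Pydantic models. Missing dependencies are names the step reads that neither it, the namespace, nor Python's builtins define; suggestions use `difflib.get_close_matches` as an illustrative typo heuristic, which is an assumption, not the documented strategy.

```python
import ast
import builtins
import difflib
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepResult:
    # Minimal stand-in for the executor's result model.
    step_id: str
    success: bool
    output: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None
    traceback: Optional[str] = None


@dataclass
class DiagnosisReport:
    step_id: str
    error_type: str
    error_message: str
    root_cause: str
    affected_variables: list[str]
    missing_dependencies: list[str]
    suggestions: list[str]


def _names(source: str, ctx: type) -> set[str]:
    # Plain names used in `ctx` (ast.Load = read, ast.Store = write).
    return {n.id for n in ast.walk(ast.parse(source))
            if isinstance(n, ast.Name) and isinstance(n.ctx, ctx)}


def diagnose_sketch(result: StepResult, source: str, namespace: dict[str, Any]) -> DiagnosisReport:
    # The exception type is the text before the first ':' on the traceback's last line.
    lines = (result.traceback or "").strip().splitlines()
    error_type = lines[-1].split(":", 1)[0] if lines else "UnknownError"
    writes = _names(source, ast.Store)
    # Reads that neither this step, the namespace, nor Python's builtins can satisfy.
    missing = sorted(_names(source, ast.Load) - writes - set(namespace) - set(dir(builtins)))
    suggestions = []
    for name in missing:
        close = difflib.get_close_matches(name, list(namespace), n=1)
        if close:
            suggestions.append(f"Did you mean {close[0]!r}? It is defined by an upstream step.")
        else:
            suggestions.append(f"Add or re-run an upstream step that defines {name!r}.")
    if missing:
        root_cause = f"Step reads {', '.join(missing)} before any step defines them."
    else:
        root_cause = f"{error_type} raised by the step itself; every name it reads is available."
    return DiagnosisReport(result.step_id, error_type, result.error or "", root_cause,
                           sorted(writes), missing, suggestions)


failed = StepResult(
    "s2", False, error="name 'record' is not defined",
    traceback="Traceback (most recent call last):\n  ...\nNameError: name 'record' is not defined\n",
)
report = diagnose_sketch(failed, "total = sum(record)", {"records": [1, 2, 3]})
print(report.error_type, report.missing_dependencies)  # NameError ['record']
print(report.suggestions[0])
```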
src/agent_harness/patch_generator.py – Patch Generation
Generates corrected source code based on diagnosis reports.
Key Abstractions:
- PatchResult (Pydantic BaseModel): Patch proposal with:
  - step_id: str – Step being patched
  - original_source: str – Original (broken) code
  - patched_source: str – Corrected code
  - explanation: str – Why the patch was generated
  - confidence: float – Confidence score [0, 1]
Key Functions:
- generate_patch(diagnosis_report: DiagnosisReport, esg: ExecutionStateGraph) -> PatchResult – Given a diagnostic report, generate a corrected version of the step's source code
Role in System: Consumes structured diagnostics and synthesizes improved code using code generation strategies (prompting, rule-based rewrites, etc.).
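As one illustration of a rule-based rewrite, the hypothetical `rename_patch` below fixes a misspelled dependency by renaming each missing name to its closest match in the namespace. It takes the missing-dependency list and namespace directly rather than a `DiagnosisReport` and `ExecutionStateGraph`, rewrites reads with `ast.NodeTransformer`, and regenerates code with `ast.unparse` (Python 3.9+, which drops comments and normalizes formatting). Using the lowest `difflib.SequenceMatcher` ratio as the confidence score is an assumption of this sketch.

```python
import ast
import difflib
from dataclasses import dataclass
from typing import Any


@dataclass
class PatchResult:
    step_id: str
    original_source: str
    patched_source: str
    explanation: str
    confidence: float


class _RenameReads(ast.NodeTransformer):
    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Only rewrite reads; assignments to the same name are left alone.
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            return ast.copy_location(ast.Name(id=self.mapping[node.id], ctx=node.ctx), node)
        return node


def rename_patch(step_id: str, source: str, missing: list[str],
                 namespace: dict[str, Any]) -> PatchResult:
    mapping: dict[str, str] = {}
    scores: list[float] = []
    for name in missing:
        match = difflib.get_close_matches(name, list(namespace), n=1)
        if match:
            mapping[name] = match[0]
            scores.append(difflib.SequenceMatcher(None, name, match[0]).ratio())
    if not missing or len(mapping) < len(missing):
        # Give up (confidence 0) unless every missing name has a plausible replacement.
        return PatchResult(step_id, source, source, "No confident rename for every missing name", 0.0)
    tree = _RenameReads(mapping).visit(ast.parse(source))
    patched = ast.unparse(ast.fix_missing_locations(tree))
    renames = ", ".join(f"{old!r} -> {new!r}" for old, new in mapping.items())
    return PatchResult(step_id, source, patched, f"Renamed {renames}", min(scores))


patch = rename_patch("s2", "total = sum(record)", ["record"], {"records": [1, 2, 3]})
print(patch.patched_source)  # total = sum(records)
print(patch.explanation)     # Renamed 'record' -> 'records'
```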
src/agent_harness/__init__.py – Package Exports
Provides top-level API for importing all public classes and functions.
Exports:
- ExecutionStateGraph, StepStatus
- Executor, StepResult
- DiagnosisReport, diagnose
- PatchResult, generate_patch
- get_reads, get_writes, infer_edges
tests/ – Test Suite
- tests/__init__.py: Test package marker
- tests/conftest.py: Pytest fixtures and shared test utilities (e.g., sample code snippets, mock Executor instances, pre-built ExecutionStateGraph fixtures)
Data Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│                           EXECUTION LOOP                            │
└─────────────────────────────────────────────────────────────────────┘
- STEP REGISTRATION Agent generates step source code ↓ esg.