Architecture

System Overview

Agent Harness is a compositional execution-state management system for long-horizon agentic code generation loops. It enables agents to generate, execute, and iteratively refine Python code through a structured feedback loop. The system maintains a directed acyclic graph (DAG) of execution steps, tracks variable dependencies across steps, diagnoses failures through automated feedback interpretation, and generates targeted patches to fix broken steps. This architecture supports selective re-execution of affected downstream steps while preserving the outputs of unaffected upstream computations, enabling efficient refinement cycles in code generation workflows.

Module Dependency Graph

graph TB
    subgraph "Core State Management"
        ESG["esg.ExecutionStateGraph"]
    end

    subgraph "Analysis & Tracking"
        DA["dependency_analyzer<br/>get_reads, get_writes<br/>infer_edges"]
    end

    subgraph "Execution & Diagnosis"
        EXE["executor.Executor<br/>execute, namespace"]
        FI["feedback_interpreter<br/>diagnose"]
    end

    subgraph "Patch Generation"
        PG["patch_generator<br/>generate_patch"]
    end

    DA -->|infers variable deps| ESG
    EXE -->|populates execution state| ESG
    FI -->|analyzes step failures| ESG
    PG -->|generates fixes| ESG
    FI -->|consumes StepResult| EXE
    PG -->|consumes DiagnosisReport| FI

    style ESG fill:#e1f5ff
    style DA fill:#f3e5f5
    style EXE fill:#e8f5e9
    style FI fill:#fff3e0
    style PG fill:#fce4ec

Module Descriptions

src/agent_harness/esg.py – Execution State Graph

The core state management module that models the execution history as a directed acyclic graph of computation steps.

Key Abstractions:

  • StepStatus (Enum): Tracks step lifecycle (PENDING, EXECUTED, STALE, FAILED)
  • ExecutionStateGraph: Main orchestrator class maintaining:
      • Step metadata (source code, status, outputs)
      • DAG topology with variable dependencies
      • Execution namespace and memoized results

Key Methods:

  • add_step(step_id, source) – Register a new step with its source code
  • add_edge(src_step_id, dst_step_id) – Record an explicit dependency between steps
  • record_output(step_id, var_name, value) – Store computed variable outputs
  • get_ancestors(step_id, variable_names) – Find all upstream steps that compute required variables (dependency backtracking)
  • get_subgraph_from(step_id) – List all downstream steps affected by a given step
  • mark_stale(step_id) – Propagate staleness to all dependent steps
  • replay_from(step_id) – Return the list of steps to re-execute after a fix
  • graph() – Return the underlying networkx.DiGraph for inspection

Role in System: Serves as the single source of truth for execution state, enabling dependency-driven selective re-execution and failure diagnosis.
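
The staleness propagation described above can be sketched with plain dictionaries. This is a minimal illustration, not the module's implementation: `TinyStateGraph` and its `downstream` helper are hypothetical names, the real class is documented as exposing a networkx.DiGraph, and marking the starting step itself as stale is an assumption made here.

```python
from collections import deque
from enum import Enum


class StepStatus(Enum):
    PENDING = "pending"
    EXECUTED = "executed"
    STALE = "stale"
    FAILED = "failed"


class TinyStateGraph:
    """Hypothetical, simplified stand-in for ExecutionStateGraph."""

    def __init__(self):
        self.status = {}    # step_id -> StepStatus
        self.children = {}  # step_id -> direct downstream step_ids

    def add_step(self, step_id):
        self.status[step_id] = StepStatus.PENDING
        self.children[step_id] = []

    def add_edge(self, src, dst):
        self.children[src].append(dst)

    def downstream(self, step_id):
        """Breadth-first list of every step reachable from step_id."""
        seen, order, queue = {step_id}, [], deque([step_id])
        while queue:
            for child in self.children[queue.popleft()]:
                if child not in seen:
                    seen.add(child)
                    order.append(child)
                    queue.append(child)
        return order

    def mark_stale(self, step_id):
        """Assumption: the changed step and all its dependents become STALE."""
        for sid in [step_id, *self.downstream(step_id)]:
            self.status[sid] = StepStatus.STALE


# Usage: "clean" is patched, so "clean" and "plot" go stale.
g = TinyStateGraph()
for sid in ("load", "clean", "plot", "config"):
    g.add_step(sid)
    g.status[sid] = StepStatus.EXECUTED
g.add_edge("load", "clean")
g.add_edge("clean", "plot")
g.mark_stale("clean")
```

In this sketch the upstream step `load` and the unrelated step `config` keep their EXECUTED status, which is the property that lets unaffected outputs be preserved.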


src/agent_harness/dependency_analyzer.py – Dependency Analysis

Static analysis module for extracting variable read/write dependencies from Python source code.

Key Functions:

  • get_reads(source: str) -> set[str] – Extract all variable names read by the code (using AST analysis)
  • get_writes(source: str) -> set[str] – Extract all variable names written/assigned by the code
  • infer_edges(steps: list[tuple[str, str]]) -> list[tuple[str, str]] – Given (step ID, source code) pairs, infer implicit variable-based edges and return the edge list

Role in System: Powers automatic dependency inference when explicit edges are not provided. Enables the system to determine which steps must re-execute based on variable flow without user annotation.
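
One way such an analysis could work is sketched below with the standard-library `ast` module. The function names and signatures follow the list above, but the bodies are assumptions: the "most recent writer wins" rule in `infer_edges` is one plausible policy, and the sketch ignores imports, function definitions, augmented assignment, and builtins.

```python
import ast


def get_reads(source: str) -> set[str]:
    """Names that appear in a load (read) context anywhere in the code."""
    tree = ast.parse(source)
    return {node.id for node in ast.walk(tree)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)}


def get_writes(source: str) -> set[str]:
    """Names that appear in a store (assignment) context anywhere in the code."""
    tree = ast.parse(source)
    return {node.id for node in ast.walk(tree)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)}


def infer_edges(steps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Link each step to the latest earlier step that wrote a name it reads."""
    last_writer: dict[str, str] = {}
    edges: list[tuple[str, str]] = []
    for step_id, source in steps:
        producers = {last_writer[name]
                     for name in get_reads(source) if name in last_writer}
        edges.extend((src, step_id) for src in sorted(producers))
        for name in get_writes(source):
            last_writer[name] = step_id
    return edges


steps = [
    ("s1", "x = 1"),
    ("s2", "y = x + 1"),
    ("s3", "z = x * y"),
    ("s4", "w = 5"),
]
edges = infer_edges(steps)
```

Here `s4` reads nothing produced earlier, so it receives no incoming edge and would not need to re-run when `s1` changes. A step that overwrites a name before reading it would still get an edge in this sketch, since reads and writes are collected without regard to order within the step.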


src/agent_harness/executor.py – Execution Engine

Executes individual steps in the current execution namespace and captures results.

Key Abstractions:

  • StepResult (Pydantic BaseModel): Immutable result container with:
      • step_id: str – Identifier of executed step
      • success: bool – Whether execution succeeded
      • output: dict[str, Any] – Computed variables
      • error: str | None – Exception message if failed
      • traceback: str | None – Full stack trace if failed
  • Executor: Stateful executor class

Key Methods:

  • namespace() -> dict[str, Any] – Return accumulated variable namespace from all prior successful executions
  • execute(step_id, source) -> StepResult – Execute source code in current namespace, capture output, return result

Role in System: Isolates execution logic and provides structured result representation for downstream diagnosis and patching.


src/agent_harness/feedback_interpreter.py – Failure Diagnosis

Analyzes execution failures and produces structured diagnostic reports for patch generation.

Key Abstractions:

  • DiagnosisReport (Pydantic BaseModel): Structured failure analysis with:
      • step_id: str – Failed step ID
      • error_type: str – Category of error (e.g., NameError, TypeError, RuntimeError)
      • error_message: str – Exception message
      • root_cause: str – Natural language analysis of likely root cause
      • affected_variables: list[str] – Variables the step tried to compute
      • missing_dependencies: list[str] – Variables referenced but not in namespace
      • suggestions: list[str] – Actionable fix suggestions

Key Functions:

  • diagnose(step_result: StepResult, esg: ExecutionStateGraph) -> DiagnosisReport – Analyze a failed StepResult in the context of the execution history and produce a diagnostic report

Role in System: Bridges execution failures and patch generation by providing structured, actionable feedback that patch generators consume.
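
As an illustration of turning a raw failure into a structured report, the sketch below classifies one error category (NameError) from traceback text. `diagnose_sketch` is a hypothetical helper with a simpler signature than `diagnose`; it takes no graph, omits `affected_variables`, and its regex-based rule is an assumption, not the module's actual logic.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DiagnosisReport:
    """Dataclass stand-in with a subset of the fields listed above."""
    step_id: str
    error_type: str
    error_message: str
    root_cause: str
    missing_dependencies: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)


def diagnose_sketch(step_id: str, traceback_text: str) -> DiagnosisReport:
    # The last traceback line has the form "ErrorType: message".
    last_line = traceback_text.strip().splitlines()[-1]
    error_type, _, message = last_line.partition(": ")
    report = DiagnosisReport(step_id, error_type, message,
                             root_cause="unclassified failure")

    # Hypothetical rule: a NameError points at a missing upstream variable.
    match = re.search(r"name '(\w+)' is not defined", message)
    if error_type == "NameError" and match:
        name = match.group(1)
        report.missing_dependencies.append(name)
        report.root_cause = f"'{name}' is read before any step defines it"
        report.suggestions.append(
            f"define '{name}' in this step or add an upstream step that produces it"
        )
    return report


tb_text = (
    "Traceback (most recent call last):\n"
    '  File "<string>", line 1, in <module>\n'
    "NameError: name 'df' is not defined"
)
report = diagnose_sketch("s3", tb_text)
other = diagnose_sketch("s4", "ZeroDivisionError: division by zero")
```

A fuller version would presumably consult the graph to find which upstream step, if any, was expected to produce the missing name.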


src/agent_harness/patch_generator.py – Patch Generation

Generates corrected source code based on diagnosis reports.

Key Abstractions:

  • PatchResult (Pydantic BaseModel): Patch proposal with:
      • step_id: str – Step being patched
      • original_source: str – Original (broken) code
      • patched_source: str – Corrected code
      • explanation: str – Why the patch was generated
      • confidence: float – Confidence score in [0, 1]

Key Functions:

  • generate_patch(diagnosis_report: DiagnosisReport, esg: ExecutionStateGraph) -> PatchResult – Given a diagnostic report, generate a corrected version of the step's source code

Role in System: Consumes structured diagnostics and synthesizes improved code using code generation strategies (prompting, rule-based rewrites, etc.).
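
To make the "rule-based rewrites" strategy concrete, here is a hedged sketch of a single rule: if a missing name closely resembles a known variable, rename it. `patch_misspelled_name` is a hypothetical helper, not part of the documented API, and using the string similarity ratio as the confidence score is an assumption for illustration only.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Dataclass stand-in for the Pydantic model described above."""
    step_id: str
    original_source: str
    patched_source: str
    explanation: str
    confidence: float


def patch_misspelled_name(step_id: str, source: str, missing_name: str,
                          known_names: set[str]) -> PatchResult:
    # Look for one known variable that is a close match for the missing name.
    candidates = difflib.get_close_matches(
        missing_name, sorted(known_names), n=1, cutoff=0.8
    )
    if not candidates:
        # No rule applies: return the source unchanged with zero confidence.
        return PatchResult(step_id, source, source, "no close match found", 0.0)

    replacement = candidates[0]
    # Whole-word textual replacement; this also touches strings and comments.
    patched = re.sub(rf"\b{re.escape(missing_name)}\b", replacement, source)
    score = difflib.SequenceMatcher(None, missing_name, replacement).ratio()
    return PatchResult(step_id, source, patched,
                       f"renamed '{missing_name}' to '{replacement}'", score)


patch = patch_misspelled_name("s2", "mean = totl / n", "totl", {"total", "n"})
no_patch = patch_misspelled_name("s3", "print(df)", "df", {"total", "n"})
```

A textual substitution like this is deliberately naive; an AST-based rewrite would avoid altering string literals, and a prompting-based strategy would handle failures that no fixed rule covers.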


src/agent_harness/__init__.py – Package Exports

Provides top-level API for importing all public classes and functions.

Exports:

  • ExecutionStateGraph, StepStatus
  • Executor, StepResult
  • DiagnosisReport, diagnose
  • PatchResult, generate_patch
  • get_reads, get_writes, infer_edges


tests/ – Test Suite

  • tests/__init__.py: Test package marker
  • tests/conftest.py: Pytest fixtures and shared test utilities (e.g., sample code snippets, mock Executor instances, pre-built ExecutionStateGraph fixtures)

Data Flow

``` ┌─────────────────────────────────────────────────────────────────────┐ │ EXECUTION LOOP │ └─────────────────────────────────────────────────────────────────────┘

  1. STEP REGISTRATION Agent generates step source code ↓ esg.