Blueprint Documentation

Specification-Driven Development for AI Coding Agents

Overview

Blueprint is a Claude Code plugin for specification-driven development. Instead of prompting an AI agent and hoping for the best, Blueprint introduces a specification layer between your intent and the code. You describe what you want. The system decomposes it into domain blueprints with numbered requirements and testable acceptance criteria. Then it builds from those blueprints — not from memory, not from vibes — in an automated loop that validates every step.

Blueprint is for developers who use AI coding agents and want reliable, traceable results. The blueprints are the source of truth. Agents read them, build from them, and validate against them. When something breaks, the system traces the failure back to the blueprint — not the code.

The DABI Lifecycle

Blueprint follows four phases — each driven by a slash command inside Claude Code:

Phase | Command | What It Does | What It Produces
Draft | /bp:draft | Decompose requirements into domain blueprints | Blueprints with R-numbered requirements
Architect | /bp:architect | Break into tasks, map dependencies, organize into tiers | Tiered build site + dependency graph
Build | /bp:build | Auto-parallel build with validation at every step | Working software, committed tier by tier
Inspect | /bp:inspect | Gap analysis + peer review against blueprints | Findings report traced to specs

An optional Research phase grounds the design in real evidence before blueprints are written. For a deeper explanation, see the DABI Lifecycle section under Methodology.

Quick Start

There are two paths, depending on whether you are starting fresh or adding Blueprint to an existing codebase.

Greenfield

Starting a new project from scratch. Blueprint decomposes your idea into blueprints, plans the build order, then implements everything automatically.

> /bp:draft
# Blueprint asks what you're building
What are you building?

> A REST API for task management. Users, projects, tasks with priorities
  and due dates, assignments. PostgreSQL.

# Blueprint decomposes into domain blueprints with numbered requirements
Created 4 blueprints (22 requirements, 69 acceptance criteria)
Next: /bp:architect

> /bp:architect
# Reads blueprints, breaks into tasks, maps dependencies into tiers
Generated build site: 34 tasks, 5 tiers
Next: /bp:build

> /bp:build
# The Ralph Loop — implements, validates, commits, repeats
Loop activated — 34 tasks, 20 max iterations.
...
All tasks done. Build passes. Tests pass.
BLUEPRINT COMPLETE — 34 tasks in 18 iterations.

Every line of code traces to a requirement. Every requirement has acceptance criteria. See /bp:draft, /bp:architect, and /bp:build for full command details.

Brownfield

Adding Blueprint to an existing codebase. Use --from-code to reverse-engineer blueprints from your code, then scope the build to the gaps.

> /bp:draft --from-code
# Blueprint explores your codebase and reverse-engineers specs
Exploring codebase... Next.js 14, Prisma, NextAuth.
Created 6 blueprints — 4 requirements are gaps (not yet implemented).

> /bp:architect --filter collaboration
# Only plan the subset you want to build now
Generated build site: 8 tasks, 3 tiers

> /bp:build
# Builds only the filtered tasks
Loop activated — 8 tasks.
...
BLUEPRINT COMPLETE — 8 tasks in 8 iterations.

The --from-code flag tells Blueprint to read your existing code and produce specs that match it, while highlighting gaps. The --filter flag on /bp:architect scopes the build to a specific domain.

Commands

All Blueprint commands are Claude Code slash commands. They follow the DABI lifecycle phases, plus utility commands for inspection and maintenance.

/bp:draft Draft

Decompose requirements into domain blueprints with numbered requirements and testable acceptance criteria. Each blueprint is stack-independent and human-readable.

> /bp:draft
What are you building?
> A REST API for task management with users, projects, and tasks.
Created 4 blueprints (22 requirements, 69 acceptance criteria)

When the project would benefit from it, the draft phase offers to run deep research before design Q&A. After the internal reviewer approves, blueprints are sent to Codex for a design challenge.

Flag | Description
--from-code | Reverse-engineer blueprints from an existing codebase and identify gaps

Related: /bp:architect, /bp:research

/bp:architect Architect

Read all blueprints, break requirements into tasks, map dependencies, and organize everything into a tiered build site. Tier 0 has no dependencies, Tier 1 depends only on Tier 0, and so on.

> /bp:architect
Generated build site: 34 tasks, 5 tiers
Next: /bp:build

Flag | Description
--filter <domain> | Scope the build site to a specific domain or subset of blueprints

Related: /bp:draft, /bp:build

/bp:build Build

The Ralph Loop. Automatically implements tasks from the build site, validates each against acceptance criteria, commits on pass, diagnoses and fixes on fail. Parallelizes independent tasks across subagents and progresses through tiers autonomously.

> /bp:build
Loop activated — 5 tasks, 20 max iterations.

═══ Wave 1 ═══
3 task(s) ready:
  T-001: Database schema (tier 0, deps: none)
  T-002: Auth middleware (tier 0, deps: none)
  T-003: Config loader (tier 0, deps: none)

Dispatching 2 grouped subagents...
All 3 tasks complete. Merging...

═══ Wave 2 ═══
2 task(s) ready...
...
═══ BUILD COMPLETE ═══
Waves: 2 | Tasks: 5/5

At every tier boundary, Codex adversarial review gates advancement — P0/P1 findings must be fixed before the next tier starts. Circuit breakers prevent infinite loops: three test failures mark a task BLOCKED.

Related: /bp:architect, /bp:progress, /bp:inspect

/bp:inspect Inspect

Gap analysis compares what was built against what was specified. Peer review checks for bugs, security issues, and missed requirements. Everything is traced back to blueprint requirements.

> /bp:inspect
Gap analysis: 22/22 requirements covered
Peer review: 1 finding (P2 — input validation on PATCH endpoint)
Traced to: blueprint-api.md R14

Related: /bp:gap-analysis, /bp:build

/bp:research Research

Dispatch 2-8 parallel subagents to explore the codebase and search the web for current best practices, library landscape, reference implementations, and common pitfalls. A synthesizer agent cross-validates findings and produces a research brief.

> /bp:research "build a Verse compiler targeting WASM"
Dispatching 4 research agents...
Synthesizing findings...
Research brief saved to context/refs/research-brief-verse-wasm.md

Research is also offered inline during /bp:draft when the project involves unfamiliar technology or architectural decisions with multiple viable approaches.

Related: /bp:draft

/bp:progress Utility

Check build site progress — tasks done, in progress, blocked, and remaining.

> /bp:progress
Build site: build-site-api.md
Done: 28/34 | In progress: 2 | Blocked: 1 | Remaining: 3

Related: /bp:build, /bp:gap-analysis

/bp:gap-analysis Utility

Compare what was built against what was intended. Lists requirements that are fully covered, partially covered, or missing implementation.

> /bp:gap-analysis
Scanning 4 blueprints against codebase...
Covered: 20/22 | Partial: 1 | Missing: 1
  R14 (input validation): partial — PATCH endpoint missing
  R18 (rate limiting): missing

Related: /bp:inspect, /bp:revise

/bp:revise Utility

Trace manual fixes back into blueprints. When you fix a bug or change code by hand, /bp:revise updates the blueprints and context files so the specification stays in sync with reality.

> /bp:revise
Detected 3 manual changes since last build.
Updated blueprint-api.md: R14 acceptance criteria refined.
Updated impl-api.md: manual fix logged.

Related: /bp:draft, /bp:gap-analysis

/bp:codex-review Utility

Run a standalone Codex adversarial review on the current diff. Useful outside of the build loop when you want a second-model review on demand.

> /bp:codex-review
Reviewing diff against worktree base...
2 findings: 1 P1 (nil pointer in error path), 1 P3 (unused import)

Requires Codex to be installed. See Codex Integration for details.

Related: Tier Gate, Design Challenge

/bp:help Utility

Show the Blueprint usage guide — lists all commands, their phases, and a brief description of the DABI workflow.

> /bp:help
Blueprint v2.1.0 — Specification-Driven Development

Commands:
  /bp:draft       Draft blueprints from requirements
  /bp:architect   Generate tiered build site
  /bp:build       Auto-parallel build loop
  /bp:inspect     Gap analysis + peer review
  ...

Methodology

Blueprint is built on a simple observation: LLMs are non-deterministic, but software engineering doesn't have to be. By applying structured methodology, we extract reliable outcomes from a stochastic process.

The DABI Lifecycle

Every Blueprint project follows four phases. Each phase has a clear input, a clear output, and a validation gate before the next phase begins.

Draft — define the what. You describe what you're building in natural language. Blueprint decomposes it into domain blueprints — structured documents with numbered requirements (R1, R2, ...) and testable acceptance criteria. Each blueprint is stack-independent and human-readable. An optional research phase can ground the design in real evidence before blueprints are written. After the internal reviewer approves, blueprints are sent to Codex for a design challenge that catches decomposition flaws, missing requirements, and ambiguous criteria.

Architect — plan the order. Reads all blueprints, breaks requirements into tasks, maps dependencies, and organizes everything into a tiered build site — a dependency graph where Tier 0 has no dependencies, Tier 1 depends only on Tier 0, and so on. This is what the build loop consumes.
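Tiered ordering is, in effect, a topological layering of the task dependency graph. A minimal sketch of the idea (illustrative Python, not Blueprint's actual implementation; the task IDs and dependency sets are invented):

```python
def assign_tiers(tasks):
    """tasks: dict mapping task id -> set of dependency task ids."""
    tiers = {}
    remaining = dict(tasks)
    tier = 0
    while remaining:
        # A task is ready when every one of its deps already has a tier.
        ready = [t for t, deps in remaining.items()
                 if all(d in tiers for d in deps)]
        if not ready:
            raise ValueError("dependency cycle detected")
        for t in ready:
            tiers[t] = tier
            del remaining[t]
        tier += 1
    return tiers

tasks = {
    "T-001": set(),             # e.g. database schema
    "T-002": set(),             # e.g. auth middleware
    "T-003": {"T-001"},         # depends on the schema
    "T-004": {"T-001", "T-002"},
}
print(assign_tiers(tasks))
# → {'T-001': 0, 'T-002': 0, 'T-003': 1, 'T-004': 1}
```

Tasks with no dependencies land in Tier 0; everything else lands one tier above its deepest dependency, which is exactly the property the build loop relies on.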

Build — run the loop. The Ralph Loop. Each iteration reads the build site, finds the next unblocked task, loads the relevant blueprint and acceptance criteria, implements the task, then validates (build + tests + acceptance criteria). Pass means commit, mark done, move to the next task. Fail means diagnose, fix, revalidate. At every tier boundary, Codex adversarial review gates advancement. The loop runs until all tasks are done or the iteration limit is reached.
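The loop's control flow can be modeled roughly as follows (a toy Python sketch, not Blueprint's internals: validation is simulated by `passes_on_attempt`, and all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Task:
    tid: str
    passes_on_attempt: int   # simulated: attempt on which validation passes
    attempts: int = 0
    status: str = "pending"

def ralph_loop(tasks, max_iterations=20, max_failures=3):
    for _ in range(max_iterations):
        pending = [t for t in tasks if t.status == "pending"]
        if not pending:
            return "COMPLETE"
        task = pending[0]
        task.attempts += 1
        if task.attempts >= task.passes_on_attempt:
            task.status = "done"          # validate passed: commit, mark done
        elif task.attempts >= max_failures:
            task.status = "BLOCKED"       # circuit breaker trips
        # else: diagnose, fix, revalidate on a later iteration
    return "ITERATION_LIMIT"

tasks = [Task("T-001", passes_on_attempt=1),
         Task("T-002", passes_on_attempt=2),
         Task("T-003", passes_on_attempt=5)]
result = ralph_loop(tasks)
# result == "COMPLETE"; T-003 ends up BLOCKED after three failed attempts
```

The circuit breaker is the key design choice: a task that keeps failing validation gets marked BLOCKED rather than consuming the remaining iteration budget.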

Inspect — verify the result. Gap analysis compares what was built against what was specified. Peer review checks for bugs, security issues, and missed requirements. Everything is traced back to blueprint requirements so you know exactly where any issue originated.

Blueprints as Source of Truth

Most AI coding tools treat the agent as a black box — you prompt, it generates, you hope. Blueprint inverts this. The specification is the product. The code is a derivative. When the spec is clear, the code follows. When the code is wrong, the spec tells you why.

This matters because AI agents are getting better every month, but the fundamental problem remains: without a specification, there is nothing to validate against. Blueprint gives every agent — current and future — a contract to build from and a standard to meet.

Blueprints persist across build cycles. They are not ephemeral prompts — they are versioned, structured documents that drive every downstream decision. When a bug is found, /bp:revise traces it back to the blueprint and updates the specification so the iteration loop can reproduce the fix automatically.

With Codex adversarial review, Blueprint goes further: a second model with different training and different blind spots reviews both the specification and the implementation. Two models disagreeing is a signal. Two models agreeing is confidence.

Scientific Method Applied

Blueprint applies the scientific method — hypothesize, test, observe, refine — to extract reliable outcomes from a stochastic process. Each concept in the methodology maps directly to a step in the scientific method:

Scientific Method | Blueprint Concept | Role
Hypothesis | Blueprints | What you expect the software to do — numbered requirements with testable criteria
Controlled conditions | Validation gates | Build, tests, and acceptance criteria that must pass before advancing
Repeated trials | Convergence loops | The Ralph Loop — iterate until stable, with circuit breakers to prevent infinite loops
Lab notebook | Implementation tracking | What was tried, what worked, what failed — living records in context/impl/
Update the hypothesis | Revision | Trace bugs back to blueprints via /bp:revise and fix at the source

The key insight is that non-determinism is not a problem if you have a validation loop. A single LLM pass produces a rough draft. But repeated passes with clear acceptance criteria converge toward correctness — the same way repeated experiments converge toward truth.

The plugin ships with 8 specialized agents, a multi-agent research system, and 13 deep-dive skills covering the full methodology. When Codex is installed, the system operates as a dual-model architecture — Claude builds and Codex reviews — catching classes of errors that single-model self-review cannot detect.

Codex Integration

Blueprint uses Codex (OpenAI's coding agent) as an adversarial reviewer — a second model with a fundamentally different perspective that catches blind spots Claude cannot see in its own output. This dual-model approach operates at three levels: design, build, and command safety.

Design Challenge

What it does: An adversarial review of your blueprints focused exclusively on architecture-level concerns. A second model challenges the design before any code is written.

When it triggers: Automatically after Claude drafts blueprints and the internal reviewer approves them during /bp:draft.

How it works: The entire blueprint set is sent to Codex for a design challenge. Codex checks for domain decomposition quality, missing requirements, ambiguous acceptance criteria, implicit assumptions, and cross-domain coherence.

# Flow: draft → internal review → Codex challenge → user review
Claude drafts blueprints
  → Blueprint reviewer approves
  → Codex challenges the design
  → User reviews blueprints + findings

Codex returns structured findings categorized as critical (must fix before building) or advisory (worth considering). Critical findings trigger an auto-fix loop — Claude addresses them, Codex re-challenges, up to 2 cycles. Advisory findings are presented alongside blueprints at the user review gate.

The design challenge is purpose-built to prohibit implementation feedback. No framework suggestions, no file path opinions — only design-level concerns that would cause real problems during the build phase.

Configuration: Controlled by the codex_review setting. Set to off to skip design challenges.

Tier Gate

What it does: A severity-based code review gate that blocks tier advancement when critical defects are found. Codex reviews the diff of every completed tier during /bp:build.

When it triggers: Automatically at every tier boundary during the build loop — after all tasks in a tier complete and before the next tier starts.

How it works: Codex reviews the combined diff of all tasks completed in the tier. Each finding is classified by severity:

Severity | Behavior
P0 (critical) | Blocks tier advancement. Fix task generated automatically.
P1 (high) | Blocks tier advancement. Fix task generated automatically.
P2 (medium) | Deferred. Logged but does not block.
P3 (low) | Deferred. Logged but does not block.

═══ Tier 0 Complete ═══
Codex reviews diff (T-001, T-002, T-003) ...
Review: 2 findings (1 P0, 1 P3)
Gate: BLOCKED → fix cycle 1/2
Fixing P0: nil pointer in auth middleware ...
Re-review ...
Gate: PROCEED
═══ Tier 1 starting ═══

The review-fix cycle runs up to 2 iterations per tier. After that, the build advances with a warning — the system never deadlocks.

Gate modes are configurable via the tier_gate_mode setting:

Mode | Behavior
severity (default) | P0/P1 findings block advancement; P2/P3 are logged only
strict | All findings block advancement
permissive | Nothing blocks — all findings are logged only
off | Tier gate disabled entirely
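The gating decision itself reduces to a severity filter per mode. A small sketch of that logic (illustrative Python; the real behavior lives in codex-gate.sh):

```python
# Which severities block advancement in each gate mode.
BLOCKING = {
    "severity": {"P0", "P1"},
    "strict": {"P0", "P1", "P2", "P3"},
    "permissive": set(),
}

def gate_verdict(findings, mode="severity"):
    """findings: list of (severity, description) pairs from the review."""
    if mode == "off":
        return "PROCEED"
    blockers = [f for f in findings if f[0] in BLOCKING[mode]]
    return "BLOCKED" if blockers else "PROCEED"

gate_verdict([("P0", "nil pointer"), ("P3", "unused import")])  # "BLOCKED"
gate_verdict([("P2", "style nit")])                             # "PROCEED"
```

A BLOCKED verdict triggers the fix cycle; deferred findings are logged to the findings ledger either way.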

Speculative Review

What it does: Runs the Codex tier review in the background while Claude builds the current tier, eliminating gate latency.

When it triggers: Automatically when a tier completes and speculative_review is on (the default).

How it works: When a tier finishes, Blueprint starts the Codex review in a background process. Meanwhile, Claude begins building the next tier immediately. When that next tier finishes and the gate checks for the previous tier's review, the results are already available — cutting tier gate latency to near-zero.

# Speculative review timeline
Tier 0 complete ──────────────────────► Tier 1 complete
     │                                       │
     └── Codex reviews Tier 0 (background) ──►│

                         Results ready ◄──────┘
                         before gate runs

If the background review is not done when the gate runs, the system waits up to the configured timeout (speculative_review_timeout, default 300 seconds) and then falls back to a synchronous review.
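The overlap-and-fallback pattern can be sketched with futures (illustrative Python only; Blueprint's actual pipeline runs Codex as a background shell process, and the two stand-in functions here just simulate work):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time

def review_tier(tier):            # stand-in for the Codex review call
    time.sleep(0.1)
    return f"review of tier {tier}"

def build_tier(tier):             # stand-in for building the next tier
    time.sleep(0.2)

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(review_tier, 0)   # review tier 0 in the background
    build_tier(1)                           # build tier 1 meanwhile
    try:
        # Gate: wait up to the timeout for the background result...
        result = pending.result(timeout=5)
    except FutureTimeout:
        # ...then fall back to a synchronous review.
        result = review_tier(0)
```

Because building a tier normally takes longer than reviewing the previous one, the `result()` call usually returns immediately, which is where the near-zero gate latency comes from.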

Configuration: Toggle with speculative_review (on/off). Adjust wait time with speculative_review_timeout.

Command Safety Gate

What it does: A PreToolUse hook that intercepts every Bash command before execution and classifies its safety, preventing destructive operations.

When it triggers: On every Bash command execution during a Blueprint session (configurable via command_gate).

How it works: Commands are checked through a three-stage pipeline:

# Command safety pipeline
Agent runs bash command
     │
     ▼
Fast-path check ──► allowlist (50+ safe commands) → approve
     │          └─► blocklist (rm -rf, force push, ...) → block
     │
     ▼ (ambiguous)
Codex classifies ──► safe  → approve
     │           ├─► warn  → approve + log
     │           └─► block → prevent execution
     │
     ▼ (cached)
Verdict cache ──► normalized pattern match → reuse verdict

The gate integrates with Claude Code's permission system — commands already allowed or blocked in settings bypass the gate entirely. Verdicts are cached by normalized command pattern within the session to avoid redundant API calls.
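The fast path and verdict cache can be sketched like this (illustrative Python; the real allowlist, blocklist, and normalization rules live in command-gate.sh, so the entries shown here are assumptions):

```python
import re

# Hypothetical fast-path lists -- the real ones are larger.
ALLOWLIST = {"ls", "cat", "git status", "git diff"}
BLOCKLIST_PATTERNS = [r"\brm\s+-rf\b", r"\bgit\s+push\b.*--force"]
_verdict_cache = {}

def normalize(cmd):
    """Collapse whitespace so equivalent commands share one cache key."""
    return re.sub(r"\s+", " ", cmd.strip())

def classify(cmd):
    key = normalize(cmd)
    if key in _verdict_cache:                 # cached verdict: reuse it
        return _verdict_cache[key]
    if key in ALLOWLIST:
        verdict = "approve"
    elif any(re.search(p, key) for p in BLOCKLIST_PATTERNS):
        verdict = "block"
    else:
        verdict = "escalate"                  # ambiguous: hand off to Codex
    _verdict_cache[key] = verdict
    return verdict
```

Only the `"escalate"` case costs an API call, and even that is paid once per normalized pattern per session.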

Configuration: Set command_gate to all (default), interactive, or off. Adjust Codex classification timeout with command_gate_timeout (default 3000ms).

Graceful Degradation

What it does: Ensures Blueprint works fully without Codex installed. All Codex features are additive — they enhance the system but are never required.

When it triggers: Automatically when Codex is not detected on the system.

How it works: When Codex is not installed, the following behavior changes apply:

Feature | Behavior Without Codex
Design Challenge | Skipped — the internal blueprint reviewer still runs
Tier Gate | Skipped — the build loop proceeds without review pauses
Command Gate | Falls back to static allowlist/blocklist only (no Codex classification)
Install Nudge | A one-time tip appears: Tip: Install Codex for adversarial code review

Blueprint works the same as before Codex integration was added. Codex makes it harder to ship bad blueprints and bad code, but the core DABI lifecycle is fully functional without it.

Skills Reference

Blueprint ships with 13 deep-dive skills covering the full methodology. Each skill is a focused knowledge module that teaches a specific aspect of specification-driven development.

Blueprint Writing

How to write blueprints that AI agents can consume effectively. Covers implementation-agnostic blueprint design, testable acceptance criteria, and R-numbered requirement structure.

Use when: Starting a new project or adding a major feature. Any time you need to define what to build before building it.

Convergence Monitoring

Detecting when agent iterations are converging toward a stable solution or hitting a ceiling. Covers convergence signals, ceiling detection, and non-convergence remediation.

Use when: The build loop is running many iterations without completing, or you suspect the agent is stuck in a fix-break cycle.

Peer Review

Six modes for using a second AI agent or model to challenge the primary builder agent's work. Covers Diff Critique, Design Challenge, Threshold Review, and more.

Use when: You want cross-model review of code or specifications, or need to set up adversarial review patterns beyond the built-in Codex integration.

Validation-First Design

Every requirement must be automatically verifiable. Covers the 6-gate validation pipeline, phase gates, and how to write acceptance criteria that agents can test.

Use when: Writing acceptance criteria for blueprints, designing test strategies, or ensuring that every requirement has a clear pass/fail condition.

Context Architecture

Progressive disclosure architecture for organizing project context as a DAG (directed acyclic graph). Agents enter at the root and traverse only the subgraph they need.

Use when: Setting up a project's context/ directory, organizing blueprints and implementation tracking, or debugging context loading issues.

Revision

Tracing bugs and manual fixes back to blueprints and prompts, then fixing at the source so the iteration loop can reproduce the fix automatically.

Use when: You manually fixed a bug and want to update the blueprint so the same fix would be produced by the build loop in a clean run.

Brownfield Adoption

Step-by-step process for adopting Blueprint on an existing codebase. Covers the 6-step brownfield process, bootstrap prompt design, and spec validation against existing code.

Use when: Adding Blueprint to a project that already has code. Use /bp:draft --from-code to reverse-engineer specifications from existing implementations.

Speculative Pipeline

A pipeline execution strategy where downstream stages start before upstream stages finish, using staggered timing with configurable delays. The leader begins work while the reviewer catches up.

Use when: Optimizing build throughput, understanding how speculative review works, or designing custom overlapping phase pipelines.

Prompt Pipeline

How to design the numbered prompt pipeline that drives DABI phases. Covers greenfield 3-prompt patterns, rewrite 6-9 prompt patterns, and shared prompt architecture.

Use when: Customizing the prompts that drive each Blueprint phase, or understanding how the internal prompt chain produces blueprints and build plans.

Implementation Tracking

Living records of what was built, what is pending, what failed, and what dead ends were explored. The lab notebook of the build process.

Use when: Reviewing build progress, understanding why a task was marked BLOCKED, or resuming a build after interruption.

Documentation Inversion

Inverts the traditional documentation flow from code-to-wiki-for-humans (which rots) into code-to-CLAUDE.md-to-skills-for-agents (which stays current).

Use when: Writing documentation that AI agents will consume, or converting existing human-facing docs into agent-optimized formats.

Peer Review Loop

Combines the Ralph Loop with true cross-model peer review using Codex. Claude builds from specs; Codex reviews the output. Two models disagreeing is a signal. Two models agreeing is confidence.

Use when: Running high-stakes builds where you want the strongest possible review coverage, or understanding how the tier gate review cycle works internally.

Core Methodology

The full DABI lifecycle — the master skill that teaches the specification-driven development methodology and routes to all sub-skills. Covers the Specify Before Building principle and the scientific method applied to AI-assisted development.

Use when: Learning Blueprint from scratch, onboarding a new team member, or needing a comprehensive overview of the entire methodology.

Configuration

All Blueprint settings live in .blueprint/config. Most settings control the Codex integration layer.

Settings Reference

Setting | Values | Default | Purpose
codex_review | auto, off | auto | Enable or disable Codex adversarial reviews (design challenge + tier gate)
codex_model | model string | (Codex default) | Which model Codex uses for review calls
tier_gate_mode | severity, strict, permissive, off | severity | How findings gate tier advancement (see Tier Gate)
command_gate | all, interactive, off | all | Which sessions get command safety gating (see Command Safety Gate)
command_gate_timeout | milliseconds | 3000 | Timeout for Codex safety classification of ambiguous commands
speculative_review | on, off | on | Run background review of previous tier while building current tier (see Speculative Review)
speculative_review_timeout | seconds | 300 | Maximum wait for speculative review results before falling back to synchronous
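As an illustration, a .blueprint/config that tightens gating and shortens the speculative-review wait might look like this — the key=value layout shown here is an assumption about the file format; only the setting names and values come from the reference above:

```
codex_review=auto
tier_gate_mode=strict
command_gate=all
command_gate_timeout=3000
speculative_review=on
speculative_review_timeout=120
```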

File Structure

Blueprint organizes all project context under the context/ directory. Each subdirectory serves a specific role in the DABI lifecycle:

context/
├── blueprints/               # Domain blueprints (persist across cycles)
│   ├── blueprint-overview.md
│   └── blueprint-{domain}.md
├── sites/                    # Build sites (one per plan)
│   ├── build-site-*.md
│   └── archive/
├── impl/                     # Implementation tracking
│   ├── impl-{domain}.md
│   ├── impl-review-findings.md   # Codex review findings ledger
│   ├── impl-speculative-log.md   # Speculative review timing data
│   ├── loop-log.md
│   └── archive/
└── refs/                     # Reference materials (PRDs, API docs)
    ├── research-brief-{topic}.md   # Synthesized research brief
    └── research-{topic}/           # Raw findings + findings board

Directory | Purpose
context/blueprints/ | Domain blueprints with R-numbered requirements and acceptance criteria. These are the source of truth for what to build.
context/sites/ | Build sites generated by /bp:architect. Each site is a tiered task dependency graph consumed by /bp:build.
context/impl/ | Implementation tracking documents — living records of what was built, what failed, review findings, and speculative review timing.
context/refs/ | Reference materials including research briefs produced by /bp:research, PRDs, and API documentation.

The scripts/ directory contains the Codex integration shell scripts:

Script | Purpose
codex-detect.sh | Codex binary and plugin detection
codex-config.sh | Blueprint configuration (all settings)
codex-review.sh | Adversarial code review invocation
codex-findings.sh | Structured finding management
codex-gate.sh | Severity-based tier gating + fix cycle
codex-design-challenge.sh | Design challenge for blueprint drafts
codex-speculative.sh | Background speculative review pipeline
command-gate.sh | PreToolUse command safety gate