Blueprint Documentation
Specification-Driven Development for AI Coding Agents
Overview
Blueprint is a Claude Code plugin for specification-driven development. Instead of prompting an AI agent and hoping for the best, Blueprint introduces a specification layer between your intent and the code. You describe what you want. The system decomposes it into domain blueprints with numbered requirements and testable acceptance criteria. Then it builds from those blueprints — not from memory, not from vibes — in an automated loop that validates every step.
Blueprint is for developers who use AI coding agents and want reliable, traceable results. The blueprints are the source of truth. Agents read them, build from them, and validate against them. When something breaks, the system traces the failure back to the blueprint — not the code.
The DABI Lifecycle
Blueprint follows four phases — each driven by a slash command inside Claude Code:
| Phase | Command | What It Does | What It Produces |
|---|---|---|---|
| Draft | `/bp:draft` | Decompose requirements into domain blueprints | Blueprints with R-numbered requirements |
| Architect | `/bp:architect` | Break into tasks, map dependencies, organize into tiers | Tiered build site + dependency graph |
| Build | `/bp:build` | Auto-parallel build with validation at every step | Working software, committed tier by tier |
| Inspect | `/bp:inspect` | Gap analysis + peer review against blueprints | Findings report traced to specs |
An optional Research phase grounds the design in real evidence before blueprints are written. For a deeper explanation, see the DABI Lifecycle section under Methodology.
Quick Start
There are two paths, depending on whether you are starting fresh or adding Blueprint to an existing codebase.
Greenfield
Starting a new project from scratch. Blueprint decomposes your idea into blueprints, plans the build order, then implements everything automatically.
```
> /bp:draft

# Blueprint asks what you're building
What are you building?
> A REST API for task management. Users, projects, tasks with priorities
  and due dates, assignments. PostgreSQL.

# Blueprint decomposes into domain blueprints with numbered requirements
Created 4 blueprints (22 requirements, 69 acceptance criteria)
Next: /bp:architect

> /bp:architect

# Reads blueprints, breaks into tasks, maps dependencies into tiers
Generated build site: 34 tasks, 5 tiers
Next: /bp:build

> /bp:build

# The Ralph Loop — implements, validates, commits, repeats
Loop activated — 34 tasks, 20 max iterations.
...
All tasks done. Build passes. Tests pass.
BLUEPRINT COMPLETE — 34 tasks in 18 iterations.
```
Every line of code traces to a requirement. Every requirement has acceptance criteria. See /bp:draft, /bp:architect, and /bp:build for full command details.
Brownfield
Adding Blueprint to an existing codebase. Use --from-code to reverse-engineer blueprints from your code, then scope the build to the gaps.
```
> /bp:draft --from-code

# Blueprint explores your codebase and reverse-engineers specs
Exploring codebase... Next.js 14, Prisma, NextAuth.
Created 6 blueprints — 4 requirements are gaps (not yet implemented).

> /bp:architect --filter collaboration

# Only plan the subset you want to build now
Generated build site: 8 tasks, 3 tiers

> /bp:build

# Builds only the filtered tasks
Loop activated — 8 tasks.
...
BLUEPRINT COMPLETE — 8 tasks in 8 iterations.
```
The --from-code flag tells Blueprint to read your existing code and produce specs that match it, while highlighting gaps. The --filter flag on /bp:architect scopes the build to a specific domain.
Commands
All Blueprint commands are Claude Code slash commands. Four map to the DABI lifecycle phases; the rest are utility commands for inspection and maintenance.
/bp:draft Draft
Decompose requirements into domain blueprints with numbered requirements and testable acceptance criteria. Each blueprint is stack-independent and human-readable.
```
> /bp:draft

What are you building?
> A REST API for task management with users, projects, and tasks.

Created 4 blueprints (22 requirements, 69 acceptance criteria)
```
When the project would benefit from it, the draft phase offers to run deep research before design Q&A. After the internal reviewer approves, blueprints are sent to Codex for a design challenge.
| Flag | Description |
|---|---|
| `--from-code` | Reverse-engineer blueprints from an existing codebase and identify gaps |
Related: /bp:architect, /bp:research
/bp:architect Architect
Read all blueprints, break requirements into tasks, map dependencies, and organize everything into a tiered build site. Tier 0 has no dependencies, Tier 1 depends only on Tier 0, and so on.
```
> /bp:architect

Generated build site: 34 tasks, 5 tiers
Next: /bp:build
```
| Flag | Description |
|---|---|
| `--filter <domain>` | Scope the build site to a specific domain or subset of blueprints |
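The tiering rule (Tier 0 has no dependencies, each task's tier is one above its deepest dependency) can be sketched as a small computation. This is an illustrative sketch, not Blueprint's internal data model; the task names and the `assign_tiers` function are hypothetical.

```python
# Assign each task a tier: 0 if it has no dependencies, otherwise one
# more than the highest tier among its dependencies.
def assign_tiers(deps: dict[str, list[str]]) -> dict[str, int]:
    tiers: dict[str, int] = {}

    def tier_of(task: str) -> int:
        if task not in tiers:
            ds = deps.get(task, [])
            tiers[task] = 0 if not ds else 1 + max(tier_of(d) for d in ds)
        return tiers[task]

    for task in deps:
        tier_of(task)
    return tiers

# Example: schema has no deps; api needs schema; ui needs api and schema.
tiers = assign_tiers({"schema": [], "api": ["schema"], "ui": ["api", "schema"]})
print(tiers)  # {'schema': 0, 'api': 1, 'ui': 2}
```

Tasks that land in the same tier have no dependency path between them, which is what lets the build loop run them in parallel.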
/bp:build Build
The Ralph Loop. Automatically implements tasks from the build site, validates each against acceptance criteria, commits on pass, diagnoses and fixes on fail. Parallelizes independent tasks across subagents and progresses through tiers autonomously.
```
> /bp:build

Loop activated — 5 tasks, 20 max iterations.

═══ Wave 1 ═══
3 task(s) ready:
  T-001: Database schema (tier 0, deps: none)
  T-002: Auth middleware (tier 0, deps: none)
  T-003: Config loader (tier 0, deps: none)
Dispatching 2 grouped subagents...
All 3 tasks complete. Merging...

═══ Wave 2 ═══
2 task(s) ready...
...

═══ BUILD COMPLETE ═══
Waves: 2 | Tasks: 5/5
```
At every tier boundary, Codex adversarial review gates advancement — P0/P1 findings must be fixed before the next tier starts. Circuit breakers prevent infinite loops: three test failures mark a task BLOCKED.
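One iteration of the loop can be sketched in pseudocode. The `implement`, `validate`, and `commit` callables stand in for the real agent actions and are hypothetical names, but the control flow (commit on pass, retry on fail, BLOCK after repeated failures) follows the description above.

```python
MAX_FAILURES = 3  # circuit breaker: repeated failures mark a task BLOCKED

def ralph_loop(tasks, implement, validate, commit, max_iterations=20):
    failures = {t: 0 for t in tasks}
    done, blocked = set(), set()
    for _ in range(max_iterations):
        ready = [t for t in tasks if t not in done | blocked]
        if not ready:
            break                       # all tasks done or blocked
        task = ready[0]
        implement(task)
        if validate(task):              # build + tests + acceptance criteria
            commit(task)
            done.add(task)
        else:
            failures[task] += 1         # real loop diagnoses and fixes here
            if failures[task] >= MAX_FAILURES:
                blocked.add(task)
    return done, blocked
```

The real loop also groups independent tasks into waves for parallel subagents; the sketch processes them one at a time for clarity.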
Related: /bp:architect, /bp:progress, /bp:inspect
/bp:inspect Inspect
Gap analysis compares what was built against what was specified. Peer review checks for bugs, security issues, and missed requirements. Everything is traced back to blueprint requirements.
```
> /bp:inspect

Gap analysis: 22/22 requirements covered
Peer review: 1 finding (P2 — input validation on PATCH endpoint)
Traced to: blueprint-api.md R14
```
Related: /bp:gap-analysis, /bp:build
/bp:research Research
Dispatch 2-8 parallel subagents to explore the codebase and search the web for current best practices, library landscape, reference implementations, and common pitfalls. A synthesizer agent cross-validates findings and produces a research brief.
```
> /bp:research "build a Verse compiler targeting WASM"

Dispatching 4 research agents...
Synthesizing findings...
Research brief saved to context/refs/research-brief-verse-wasm.md
```
Research is also offered inline during /bp:draft when the project involves unfamiliar technology or architectural decisions with multiple viable approaches.
Related: /bp:draft
/bp:progress Utility
Check build site progress — tasks done, in progress, blocked, and remaining.
```
> /bp:progress

Build site: build-site-api.md
Done: 28/34 | In progress: 2 | Blocked: 1 | Remaining: 3
```
Related: /bp:build, /bp:gap-analysis
/bp:gap-analysis Utility
Compare what was built against what was intended. Lists requirements that are fully covered, partially covered, or missing implementation.
```
> /bp:gap-analysis

Scanning 4 blueprints against codebase...
Covered: 20/22 | Partial: 1 | Missing: 1
R14 (input validation): partial — PATCH endpoint missing
R18 (rate limiting): missing
```
Related: /bp:inspect, /bp:revise
/bp:revise Utility
Trace manual fixes back into blueprints. When you fix a bug or change code by hand, /bp:revise updates the blueprints and context files so the specification stays in sync with reality.
```
> /bp:revise

Detected 3 manual changes since last build.
Updated blueprint-api.md: R14 acceptance criteria refined.
Updated impl-api.md: manual fix logged.
```
Related: /bp:draft, /bp:gap-analysis
/bp:codex-review Utility
Run a standalone Codex adversarial review on the current diff. Useful outside of the build loop when you want a second-model review on demand.
```
> /bp:codex-review

Reviewing diff against worktree base...
2 findings: 1 P1 (nil pointer in error path), 1 P3 (unused import)
```
Requires Codex to be installed. See Codex Integration for details.
Related: Tier Gate, Design Challenge
/bp:help Utility
Show the Blueprint usage guide — lists all commands, their phases, and a brief description of the DABI workflow.
```
> /bp:help

Blueprint v2.1.0 — Specification-Driven Development

Commands:
  /bp:draft      Draft blueprints from requirements
  /bp:architect  Generate tiered build site
  /bp:build      Auto-parallel build loop
  /bp:inspect    Gap analysis + peer review
  ...
```
Methodology
Blueprint is built on a simple observation: LLMs are non-deterministic, but software engineering doesn't have to be. By applying structured methodology, we extract reliable outcomes from a stochastic process.
The DABI Lifecycle
Every Blueprint project follows four phases. Each phase has a clear input, a clear output, and a validation gate before the next phase begins.
Draft — define the what. You describe what you're building in natural language. Blueprint decomposes it into domain blueprints — structured documents with numbered requirements (R1, R2, ...) and testable acceptance criteria. Each blueprint is stack-independent and human-readable. An optional research phase can ground the design in real evidence before blueprints are written. After the internal reviewer approves, blueprints are sent to Codex for a design challenge that catches decomposition flaws, missing requirements, and ambiguous criteria.
Architect — plan the order. Reads all blueprints, breaks requirements into tasks, maps dependencies, and organizes everything into a tiered build site — a dependency graph where Tier 0 has no dependencies, Tier 1 depends only on Tier 0, and so on. This is what the build loop consumes.
Build — run the loop. The Ralph Loop. Each iteration reads the build site, finds the next unblocked task, loads the relevant blueprint and acceptance criteria, implements the task, then validates (build + tests + acceptance criteria). Pass means commit, mark done, move to the next task. Fail means diagnose, fix, revalidate. At every tier boundary, Codex adversarial review gates advancement. The loop runs until all tasks are done or the iteration limit is reached.
Inspect — verify the result. Gap analysis compares what was built against what was specified. Peer review checks for bugs, security issues, and missed requirements. Everything is traced back to blueprint requirements so you know exactly where any issue originated.
Blueprints as Source of Truth
Most AI coding tools treat the agent as a black box — you prompt, it generates, you hope. Blueprint inverts this. The specification is the product. The code is a derivative. When the spec is clear, the code follows. When the code is wrong, the spec tells you why.
This matters because AI agents are getting better every month, but the fundamental problem remains: without a specification, there is nothing to validate against. Blueprint gives every agent — current and future — a contract to build from and a standard to meet.
Blueprints persist across build cycles. They are not ephemeral prompts — they are versioned, structured documents that drive every downstream decision. When a bug is found, /bp:revise traces it back to the blueprint and updates the specification so the iteration loop can reproduce the fix automatically.
With Codex adversarial review, Blueprint goes further: a second model with different training and different blind spots reviews both the specification and the implementation. Two models disagreeing is a signal. Two models agreeing is confidence.
Scientific Method Applied
Blueprint applies the scientific method — hypothesize, test, observe, refine — to extract reliable outcomes from a stochastic process. Each concept in the methodology maps directly to a step in the scientific method:
| Scientific Method | Blueprint Concept | Role |
|---|---|---|
| Hypothesis | Blueprints | What you expect the software to do — numbered requirements with testable criteria |
| Controlled conditions | Validation gates | Build, tests, and acceptance criteria that must pass before advancing |
| Repeated trials | Convergence loops | The Ralph Loop — iterate until stable, with circuit breakers to prevent infinite loops |
| Lab notebook | Implementation tracking | What was tried, what worked, what failed — living records in context/impl/ |
| Update the hypothesis | Revision | Trace bugs back to blueprints via /bp:revise and fix at the source |
The key insight is that non-determinism is not a problem if you have a validation loop. A single LLM pass produces a rough draft. But repeated passes with clear acceptance criteria converge toward correctness — the same way repeated experiments converge toward truth.
The plugin ships with 8 specialized agents, a multi-agent research system, and 13 deep-dive skills covering the full methodology. When Codex is installed, the system operates as a dual-model architecture — Claude builds and Codex reviews — catching classes of errors that single-model self-review cannot detect.
Codex Integration
Blueprint uses Codex (OpenAI's coding agent) as an adversarial reviewer — a second model with a fundamentally different perspective that catches blind spots Claude cannot see in its own output. This dual-model approach operates at three levels: design, build, and command safety.
Design Challenge
What it does: An adversarial review of your blueprints focused exclusively on architecture-level concerns. A second model challenges the design before any code is written.
When it triggers: Automatically after Claude drafts blueprints and the internal reviewer approves them during /bp:draft.
How it works: The entire blueprint set is sent to Codex for a design challenge. Codex checks for domain decomposition quality, missing requirements, ambiguous acceptance criteria, implicit assumptions, and cross-domain coherence.
```
# Flow: draft → internal review → Codex challenge → user review
Claude drafts blueprints
  → Blueprint reviewer approves
  → Codex challenges the design
  → User reviews blueprints + findings
```
Codex returns structured findings categorized as critical (must fix before building) or advisory (worth considering). Critical findings trigger an auto-fix loop — Claude addresses them, Codex re-challenges, up to 2 cycles. Advisory findings are presented alongside blueprints at the user review gate.
The design challenge is purpose-built to prohibit implementation feedback. No framework suggestions, no file path opinions — only design-level concerns that would cause real problems during the build phase.
Configuration: Controlled by the codex_review setting. Set to off to skip design challenges.
Tier Gate
What it does: A severity-based code review gate that blocks tier advancement when critical defects are found. Codex reviews the diff of every completed tier during /bp:build.
When it triggers: Automatically at every tier boundary during the build loop — after all tasks in a tier complete and before the next tier starts.
How it works: Codex reviews the combined diff of all tasks completed in the tier. Each finding is classified by severity:
| Severity | Behavior |
|---|---|
| P0 (critical) | Blocks tier advancement. Fix task generated automatically. |
| P1 (high) | Blocks tier advancement. Fix task generated automatically. |
| P2 (medium) | Deferred. Logged but does not block. |
| P3 (low) | Deferred. Logged but does not block. |
```
═══ Tier 0 Complete ═══
Codex reviews diff (T-001, T-002, T-003) ...
Review: 2 findings (1 P0, 1 P3)
Gate: BLOCKED → fix cycle 1/2
Fixing P0: nil pointer in auth middleware ...
Re-review ...
Gate: PROCEED
═══ Tier 1 starting ═══
```
The review-fix cycle runs up to 2 iterations per tier. After that, the build advances with a warning — the system never deadlocks.
Gate modes are configurable via the tier_gate_mode setting:
| Mode | Behavior |
|---|---|
| `severity` (default) | P0/P1 findings block advancement; P2/P3 are logged only |
| `strict` | All findings block advancement |
| `permissive` | Nothing blocks — all findings are logged only |
| `off` | Tier gate disabled entirely |
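The gating rule reduces to a small predicate over findings and the configured mode. This is an illustrative sketch; the finding structure (a dict with a `severity` key) and the `gate_verdict` function are hypothetical, not Blueprint's internal representation.

```python
# Which severities block tier advancement in each tier_gate_mode.
BLOCKING = {
    "severity": {"P0", "P1"},
    "strict": {"P0", "P1", "P2", "P3"},
    "permissive": set(),
}

def gate_verdict(findings, mode="severity"):
    """Return 'PROCEED' or 'BLOCKED' for a tier's review findings."""
    if mode == "off":
        return "PROCEED"
    blocking = [f for f in findings if f["severity"] in BLOCKING[mode]]
    return "BLOCKED" if blocking else "PROCEED"

print(gate_verdict([{"severity": "P0"}, {"severity": "P3"}]))  # BLOCKED
print(gate_verdict([{"severity": "P2"}, {"severity": "P3"}]))  # PROCEED
```

A BLOCKED verdict triggers the fix cycle described above; after two cycles the build advances with a warning regardless.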
Speculative Review
What it does: Runs the Codex tier review in the background while Claude builds the current tier, eliminating gate latency.
When it triggers: Automatically when a tier completes and speculative_review is on (the default).
How it works: When a tier finishes, Blueprint starts the Codex review in a background process. Meanwhile, Claude begins building the next tier immediately. When that next tier finishes and the gate checks for the previous tier's review, the results are already available — cutting tier gate latency to near-zero.
```
# Speculative review timeline
Tier 0 complete ──────────────────────► Tier 1 complete
     │                                        │
     └── Codex reviews Tier 0 (background) ──►│
                                              │
                         Results ready ◄──────┘
                         before gate runs
```
If the background review is not done when the gate runs, the system waits up to the configured timeout (speculative_review_timeout, default 300 seconds) and then falls back to a synchronous review.
Configuration: Toggle with speculative_review (on/off). Adjust wait time with speculative_review_timeout.
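The overlap can be sketched with Python threads standing in for the background review process. Everything here is illustrative: `run_tiers`, `build_tier`, and `review_tier` are hypothetical names, and the real system runs the review as a separate OS process rather than a thread.

```python
import threading

def run_tiers(tiers, build_tier, review_tier, timeout=300.0):
    """Build each tier while the previous tier's review runs in the background."""
    verdicts, pending = {}, None

    def settle(tier, thread, result):
        thread.join(timeout)                       # wait up to speculative_review_timeout
        if "verdict" not in result:                # review not done in time:
            result["verdict"] = review_tier(tier)  # fall back to a synchronous review
        verdicts[tier] = result["verdict"]

    for tier in tiers:
        build_tier(tier)
        if pending:
            settle(*pending)   # gate for the previous tier; usually already finished
        result = {}
        thread = threading.Thread(
            target=lambda t=tier, r=result: r.__setitem__("verdict", review_tier(t)))
        thread.start()         # speculative review of the tier that just finished
        pending = (tier, thread, result)
    if pending:
        settle(*pending)       # final tier has no next tier, so settle it directly
    return verdicts
```

The key property is that `build_tier` for tier N+1 and `review_tier` for tier N run concurrently, so the gate check at the top of the loop usually finds results already waiting.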
Command Safety Gate
What it does: A PreToolUse hook that intercepts every Bash command before execution and classifies its safety, preventing destructive operations.
When it triggers: On every Bash command execution during a Blueprint session (configurable via command_gate).
How it works: Commands are checked through a three-stage pipeline:
```
# Command safety pipeline
Agent runs bash command
        │
        ▼
Fast-path check ──► allowlist (50+ safe commands) → approve
        │       └─► blocklist (rm -rf, force push, ...) → block
        │
        ▼ (ambiguous)
Codex classifies ──► safe  → approve
        │        └─► warn  → approve + log
        │        └─► block → prevent execution
        │
        ▼ (cached)
Verdict cache ──► normalized pattern match → reuse verdict
```
The gate integrates with Claude Code's permission system — commands already allowed or blocked in settings bypass the gate entirely. Verdicts are cached by normalized command pattern within the session to avoid redundant API calls.
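The pipeline can be sketched as follows. The allowlist/blocklist entries, the `normalize` rule, and the `gate` function are all hypothetical simplifications; the real gate's lists are much larger and its normalization more sophisticated.

```python
import re

ALLOWLIST = {"ls", "cat", "git status", "git diff"}   # illustrative entries only
BLOCKLIST = ("rm -rf /", "git push --force")           # illustrative entries only

def normalize(cmd: str) -> str:
    """Collapse whitespace so similar commands share one cached verdict."""
    return re.sub(r"\s+", " ", cmd).strip()

cache: dict[str, str] = {}   # normalized pattern -> verdict, per session

def gate(cmd: str, classify) -> str:
    """Return 'approve' or 'block'. `classify` stands in for the Codex call."""
    norm = normalize(cmd)
    if norm in ALLOWLIST:
        return "approve"                       # fast path: known-safe
    if any(norm.startswith(b) for b in BLOCKLIST):
        return "block"                         # fast path: known-destructive
    if norm not in cache:                      # ambiguous: ask the classifier once
        verdict = classify(norm)               # 'safe' | 'warn' | 'block'
        cache[norm] = "block" if verdict == "block" else "approve"  # warn = approve + log
    return cache[norm]
```

Because verdicts are keyed by the normalized pattern, the second occurrence of an ambiguous command reuses the cached verdict instead of making another classification call.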
Configuration: Set command_gate to all (default), interactive, or off. Adjust Codex classification timeout with command_gate_timeout (default 3000ms).
Graceful Degradation
What it does: Ensures Blueprint works fully without Codex installed. All Codex features are additive — they enhance the system but are never required.
When it triggers: Automatically when Codex is not detected on the system.
How it works: When Codex is not installed, the following behavior changes apply:
| Feature | Behavior Without Codex |
|---|---|
| Design Challenge | Skipped — the internal blueprint reviewer still runs |
| Tier Gate | Skipped — the build loop proceeds without review pauses |
| Command Gate | Falls back to static allowlist/blocklist only (no Codex classification) |
| Install Nudge | A one-time tip appears: Tip: Install Codex for adversarial code review |
Without Codex installed, Blueprint behaves exactly as it did before the Codex integration was added. Codex makes it harder to ship bad blueprints and bad code, but the core DABI lifecycle is fully functional without it.
Skills Reference
Blueprint ships with 13 deep-dive skills covering the full methodology. Each skill is a focused knowledge module that teaches a specific aspect of specification-driven development.
Blueprint Writing
How to write blueprints that AI agents can consume effectively. Covers implementation-agnostic blueprint design, testable acceptance criteria, and R-numbered requirement structure.
Use when: Starting a new project or adding a major feature. Any time you need to define what to build before building it.
Convergence Monitoring
Detecting when agent iterations are converging toward a stable solution or hitting a ceiling. Covers convergence signals, ceiling detection, and non-convergence remediation.
Use when: The build loop is running many iterations without completing, or you suspect the agent is stuck in a fix-break cycle.
Peer Review
Six modes for using a second AI agent or model to challenge the primary builder agent's work. Covers Diff Critique, Design Challenge, Threshold Review, and more.
Use when: You want cross-model review of code or specifications, or need to set up adversarial review patterns beyond the built-in Codex integration.
Validation-First Design
Every requirement must be automatically verifiable. Covers the 6-gate validation pipeline, phase gates, and how to write acceptance criteria that agents can test.
Use when: Writing acceptance criteria for blueprints, designing test strategies, or ensuring that every requirement has a clear pass/fail condition.
Context Architecture
Progressive disclosure architecture for organizing project context as a DAG (directed acyclic graph). Agents enter at the root and traverse only the subgraph they need.
Use when: Setting up a project's context/ directory, organizing blueprints and implementation tracking, or debugging context loading issues.
Revision
Tracing bugs and manual fixes back to blueprints and prompts, then fixing at the source so the iteration loop can reproduce the fix automatically.
Use when: You manually fixed a bug and want to update the blueprint so the same fix would be produced by the build loop in a clean run.
Brownfield Adoption
Step-by-step process for adopting Blueprint on an existing codebase. Covers the 6-step brownfield process, bootstrap prompt design, and spec validation against existing code.
Use when: Adding Blueprint to a project that already has code. Use /bp:draft --from-code to reverse-engineer specifications from existing implementations.
Speculative Pipeline
A pipeline execution strategy where downstream stages start before upstream stages finish, using staggered timing with configurable delays. The leader begins work while the reviewer catches up.
Use when: Optimizing build throughput, understanding how speculative review works, or designing custom overlapping phase pipelines.
Prompt Pipeline
How to design the numbered prompt pipeline that drives DABI phases. Covers greenfield 3-prompt patterns, rewrite 6-9 prompt patterns, and shared prompt architecture.
Use when: Customizing the prompts that drive each Blueprint phase, or understanding how the internal prompt chain produces blueprints and build plans.
Implementation Tracking
Living records of what was built, what is pending, what failed, and what dead ends were explored. The lab notebook of the build process.
Use when: Reviewing build progress, understanding why a task was marked BLOCKED, or resuming a build after interruption.
Documentation Inversion
Inverts the traditional documentation flow from code-to-wiki-for-humans (which rots) into code-to-CLAUDE.md-to-skills-for-agents (which stays current).
Use when: Writing documentation that AI agents will consume, or converting existing human-facing docs into agent-optimized formats.
Peer Review Loop
Combines the Ralph Loop with true cross-model peer review using Codex. Claude builds from specs; Codex reviews the output. Two models disagreeing is a signal. Two models agreeing is confidence.
Use when: Running high-stakes builds where you want the strongest possible review coverage, or understanding how the tier gate review cycle works internally.
Core Methodology
The full DABI lifecycle — the master skill that teaches the specification-driven development methodology and routes to all sub-skills. Covers the Specify Before Building principle and the scientific method applied to AI-assisted development.
Use when: Learning Blueprint from scratch, onboarding a new team member, or needing a comprehensive overview of the entire methodology.
Configuration
All Blueprint settings live in .blueprint/config. Most settings control the Codex integration layer.
Settings Reference
| Setting | Values | Default | Purpose |
|---|---|---|---|
| `codex_review` | `auto`, `off` | `auto` | Enable or disable Codex adversarial reviews (design challenge + tier gate) |
| `codex_model` | model string | (Codex default) | Which model Codex uses for review calls |
| `tier_gate_mode` | `severity`, `strict`, `permissive`, `off` | `severity` | How findings gate tier advancement (see Tier Gate) |
| `command_gate` | `all`, `interactive`, `off` | `all` | Which sessions get command safety gating (see Command Safety Gate) |
| `command_gate_timeout` | milliseconds | `3000` | Timeout for Codex safety classification of ambiguous commands |
| `speculative_review` | `on`, `off` | `on` | Run background review of the previous tier while building the current tier (see Speculative Review) |
| `speculative_review_timeout` | seconds | `300` | Maximum wait for speculative review results before falling back to synchronous |
File Structure
Blueprint organizes all project context under the context/ directory. Each subdirectory serves a specific role in the DABI lifecycle:
```
context/
├── blueprints/                  # Domain blueprints (persist across cycles)
│   ├── blueprint-overview.md
│   └── blueprint-{domain}.md
├── sites/                       # Build sites (one per plan)
│   ├── build-site-*.md
│   └── archive/
├── impl/                        # Implementation tracking
│   ├── impl-{domain}.md
│   ├── impl-review-findings.md  # Codex review findings ledger
│   ├── impl-speculative-log.md  # Speculative review timing data
│   ├── loop-log.md
│   └── archive/
└── refs/                        # Reference materials (PRDs, API docs)
    ├── research-brief-{topic}.md  # Synthesized research brief
    └── research-{topic}/          # Raw findings + findings board
```
| Directory | Purpose |
|---|---|
| `context/blueprints/` | Domain blueprints with R-numbered requirements and acceptance criteria. These are the source of truth for what to build. |
| `context/sites/` | Build sites generated by /bp:architect. Each site is a tiered task dependency graph consumed by /bp:build. |
| `context/impl/` | Implementation tracking documents — living records of what was built, what failed, review findings, and speculative review timing. |
| `context/refs/` | Reference materials including research briefs produced by /bp:research, PRDs, and API documentation. |
The scripts/ directory contains the Codex integration shell scripts:
| Script | Purpose |
|---|---|
| `codex-detect.sh` | Codex binary and plugin detection |
| `codex-config.sh` | Blueprint configuration (all settings) |
| `codex-review.sh` | Adversarial code review invocation |
| `codex-findings.sh` | Structured finding management |
| `codex-gate.sh` | Severity-based tier gating + fix cycle |
| `codex-design-challenge.sh` | Design challenge for blueprint drafts |
| `codex-speculative.sh` | Background speculative review pipeline |
| `command-gate.sh` | PreToolUse command safety gate |