From a268b4c1bc314ee3174b3631bdb21fe4599c9595 Mon Sep 17 00:00:00 2001
From: Jonah Schulte <jonah@jonahschulte.com>
Date: Wed, 28 Jan 2026 13:28:37 -0500
Subject: [PATCH] feat: upgrade story-full-pipeline to v4.0 with 6 major
 enhancements
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench research
(Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 RESUMES Builder agent with review findings
- Builder already has full codebase context (50-70% token savings)
- More efficient than spawning fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
- Prevents vague communication (CooperBench finding)

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Focus on concrete, measurable verification
- Replaced with patterns/verification.md + patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- No longer skip ALL review for micro stories
- Micro now gets 2 reviewers: Security + Architect
- Lightweight but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection

Pipeline: Phase 0 (Playbooks) → Phase 1 (Builder) → Phase 2 (Inspector +
Test Quality + Reviewers parallel) → Phase 2.5 (Coverage Gate) → Phase 3
(Resume Builder) → Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) →
Phase 6 (Reflection)

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)
---
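The evidence rule stated in agents/inspector.md (no PASS verdict if any check fails or any task lacks a file:line citation) can be sketched as a small check. This is a minimal Python sketch and not code from this patch: `derive_verdict` is a hypothetical helper name, and the field names are assumed to match the Inspector completion JSON shown in the diff below.

```python
from typing import Any


def derive_verdict(artifact: dict[str, Any]) -> str:
    """Return "PASS" only if every task has evidence and every check passed."""
    for task in artifact.get("task_verification", []):
        # A task without at least one file:line citation counts as a failure.
        if not task.get("implemented") or not task.get("evidence"):
            return "FAIL"
    for check in artifact.get("checks", {}).values():
        # Any failing quality check (type check, lint, tests, build) fails the run.
        if not check.get("passed"):
            return "FAIL"
    return "PASS"


# Illustrative artifact: one task with evidence, one task without.
example = {
    "task_verification": [
        {
            "task": "Create agreement view component",
            "implemented": True,
            "evidence": [
                {"file": "src/features/agreement/AgreementView.tsx", "lines": "15-67"}
            ],
        },
        {
            "task": "Add status badge",
            "implemented": False,
            "evidence": [],
            "reason": "No StatusBadge component found in src/features/agreement/",
        },
    ],
    "checks": {"type_check": {"passed": True}, "build": {"passed": True}},
}

print(derive_verdict(example))  # FAIL
```

Recomputing the verdict from the evidence, rather than trusting the `verdict` field, would let the orchestrator catch an artifact that claims PASS while listing an unimplemented task.
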
 .../story-full-pipeline/README.md             | 218 ++---
 .../agents/architect-integration-reviewer.md  |   1 -
 .../story-full-pipeline/agents/builder.md     |  77 +-
 .../story-full-pipeline/agents/fixer.md       |   1 -
 .../story-full-pipeline/agents/inspector.md   | 173 ++--
 .../story-full-pipeline/agents/reflection.md  |  93 +++
 .../story-full-pipeline/agents/reviewer.md    |   1 -
 .../agents/test-quality.md                    |  73 ++
 .../story-full-pipeline/workflow.md           | 752 ++++++++++++++++--
 .../story-full-pipeline/workflow.yaml         |  45 +-
 .../implementation-playbook-template.md       |  85 ++
 11 files changed, 1189 insertions(+), 330 deletions(-)
 create mode 100644 src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md
 create mode 100644 src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md
 create mode 100644 src/bmm/workflows/templates/implementation-playbook-template.md
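
The Phase 2.5 Coverage Gate (80% line coverage) could look roughly like the following. The patch does not say how coverage is measured, so this is a sketch under assumptions: the input is a summary dictionary shaped like `{"total": {"lines": {"pct": ...}}}` (the shape of Istanbul's json-summary report), and `coverage_gate` is a hypothetical name.

```python
def coverage_gate(summary: dict, threshold: float = 80.0) -> tuple[bool, str]:
    """Compare total line coverage against the configured threshold."""
    # Assumed layout: summary["total"]["lines"]["pct"] holds a percentage.
    pct = float(summary["total"]["lines"]["pct"])
    passed = pct >= threshold
    relation = "meets" if passed else "is below"
    message = f"line coverage {pct:.1f}% {relation} the {threshold:.0f}% threshold"
    return passed, message


passed, message = coverage_gate({"total": {"lines": {"pct": 76.4}}})
print(passed, message)
```

When the gate fails, the message can be handed to the resumed Builder in Phase 3 together with the review findings.
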

diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/README.md b/src/bmm/workflows/4-implementation/story-full-pipeline/README.md
index a436933f..089a34ee 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/README.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/README.md
@@ -1,124 +1,150 @@
-# Super-Dev Pipeline - GSDMAD Architecture
+# Story-Full-Pipeline v4.0
 
-**Multi-agent pipeline with independent validation and adversarial code review**
+Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes.
 
----
+## What's New in v4.0
 
-## Quick Start
+### 1. Resume Builder (v3.2+)
+**Token Efficiency: 50-70% savings**
 
-```bash
-# Run super-dev pipeline for a story
-/story-full-pipeline story_key=17-10
+- Phase 3 now RESUMES Builder agent with review findings
+- Builder already has full codebase context
+- More efficient than spawning fresh Fixer agent
+
+### 2. Inspector Code Citations (v4.0)
+**Evidence-Based Verification**
+
+- Inspector must map EVERY task to file:line citations
+- Example: "Create component" → "src/Component.tsx:45-67"
+- No more "trust me, it works" - requires proof
+- Returns structured JSON with code evidence per task
+
+### 3. Remove Hospital-Grade Framing (v4.0)
+**Focus on Concrete Verification**
+
+- Dropped psychological appeal language
+- Kept rigorous verification gates and bash checks
+- Replaced with patterns/verification.md + patterns/tdd.md
+
+### 4. Micro Stories Get Security Scan (v4.0)
+**Even Simple Stories Need Security**
+
+- No longer skip ALL review for micro stories
+- Micro stories now get 2 reviewers: Security + Architect
+- Lightweight but catches critical vulnerabilities
+
+### 5. Test Quality Agent + Coverage Gate (v4.0)
+**Validate Test Completeness**
+
+- New Test Quality Agent validates:
+  - Edge cases covered (null, empty, invalid)
+  - Error conditions tested
+  - Meaningful assertions (not just "doesn't crash")
+  - No flaky tests (random data, timing)
+- Automated Coverage Gate enforces 80% threshold
+- Builder must fix test gaps before proceeding
+
+### 6. Playbook Learning System (v4.0)
+**Continuous Improvement Through Reflection**
+
+- **Phase 0:** Query playbooks before implementation
+- Builder gets relevant patterns/gotchas upfront
+- **Phase 6:** Reflection agent extracts learnings
+- Auto-generates playbook updates for future agents
+- Bootstrap mode: auto-initializes playbooks if missing
+
+## Pipeline Flow
+
+```
+Phase 0: Playbook Query (orchestrator)
+         ↓
+Phase 1: Builder (initial implementation)
+         ↓
+Phase 2: Inspector + Test Quality + N Reviewers (parallel)
+         ↓
+Phase 2.5: Coverage Gate (automated)
+         ↓
+Phase 3: Resume Builder (fix issues with context)
+         ↓
+Phase 4: Inspector re-check (quick verification)
+         ↓
+Phase 5: Orchestrator reconciliation (evidence-based)
+         ↓
+Phase 6: Playbook Reflection (extract learnings)
 ```
 
----
+## Complexity Routing
 
-## Architecture
+| Complexity | Phase 2 Agents | Total | Reviewers |
+|------------|----------------|-------|-----------|
+| micro | Inspector + Test Quality + 2 | 4 agents | Security + Architect |
+| standard | Inspector + Test Quality + 3 | 5 agents | Security + Logic + Architect |
+| complex | Inspector + Test Quality + 4 | 6 agents | Security + Logic + Architect + Quality |
 
-### Multi-Agent Validation
-- **4 independent agents** working sequentially
-- Builder → Inspector → Reviewer → Fixer
-- Each agent has fresh context
-- No conflict of interest
+## Quality Gates
 
-### Honest Reporting
-- Inspector verifies Builder's work (doesn't trust claims)
-- Reviewer is adversarial (wants to find issues)
-- Main orchestrator does final verification
-- Can't fake completion
+- **Coverage Threshold:** 80% line coverage required
+- **Task Verification:** ALL tasks need file:line evidence
+- **Critical Issues:** MUST fix
+- **High Issues:** MUST fix
 
-### Wave-Based Execution
-- Independent stories run in parallel
-- Dependencies respected via waves
-- 57% faster than sequential execution
+## Token Efficiency
 
----
+- Phase 2 agents spawn in parallel (same cost, faster)
+- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
+- Phase 4 Inspector only (no full re-review)
 
-## Workflow Phases
+## Playbook Configuration
 
-**Phase 1: Builder (Steps 1-4)**
-- Load story, analyze gaps
-- Write tests (TDD)
-- Implement code
-- Report what was built (NO VALIDATION)
+```yaml
+playbooks:
+  enabled: true
+  directory: "docs/playbooks/implementation-playbooks"
+  bootstrap_mode: true  # Auto-initialize if missing
+  max_load: 3
+  auto_apply_updates: false  # Require manual review
+  discovery:
+    enabled: true
+    sources: ["git_history", "docs", "existing_code"]
+```
 
-**Phase 2: Inspector (Steps 5-6)**
-- Fresh context, no Builder knowledge
-- Verify files exist
-- Run tests independently
-- Run quality checks
-- PASS or FAIL verdict
+## How It Avoids CooperBench Coordination Failures
 
-**Phase 3: Reviewer (Step 7)**
-- Fresh context, adversarial stance
-- Find security vulnerabilities
-- Find performance problems
-- Find logic bugs
-- Report issues with severity
+The workflow avoids the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026) in five ways:
 
-**Phase 4: Fixer (Steps 8-9)**
-- Fix CRITICAL issues (all)
-- Fix HIGH issues (all)
-- Fix MEDIUM issues (time permitting)
-- Verify fixes independently
+1. **Sequential Implementation** - ONE Builder agent implements entire story (no parallel implementation conflicts)
+2. **Parallel Review** - Multiple agents review in parallel (safe read-only operations)
+3. **Context Reuse** - SAME agent fixes issues (no expectation failures about partner state)
+4. **Evidence-Based** - file:line citations prevent vague communication
+5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities)
 
-**Phase 5: Final Verification**
-- Main orchestrator verifies all phases
-- Updates story checkboxes
-- Creates commit
-- Marks story complete
-
----
-
-## Key Features
-
-**Separation of Concerns:**
-- Builder focuses only on implementation
-- Inspector focuses only on validation
-- Reviewer focuses only on finding issues
-- Fixer focuses only on resolving issues
-
-**Independent Validation:**
-- Each agent validates the previous agent's work
-- No agent validates its own work
-- Fresh context prevents confirmation bias
-
-**Quality Enforcement:**
-- Multiple quality gates throughout pipeline
-- Can't proceed without passing validation
-- 95% honesty rate (agents can't fake completion)
-
----
+The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination."
 
 ## Files
 
-See `workflow.md` for complete architecture details.
-
 **Agent Prompts:**
-- `agents/builder.md` - Implementation agent
-- `agents/inspector.md` - Validation agent
+- `agents/builder.md` - Implementation agent (with playbook awareness)
+- `agents/inspector.md` - Validation agent (requires code citations)
+- `agents/test-quality.md` - Test quality validation (v4.0)
 - `agents/reviewer.md` - Adversarial review agent
-- `agents/fixer.md` - Issue resolution agent
+- `agents/architect-integration-reviewer.md` - Architecture/integration review
+- `agents/fixer.md` - Issue resolution agent (deprecated; Phase 3 resumes the Builder instead)
+- `agents/reflection.md` - Playbook learning agent (v4.0)
 
 **Workflow Config:**
-- `workflow.yaml` - Main configuration
-- `workflow.md` - Complete documentation
+- `workflow.yaml` - Main configuration (v4.0)
+- `workflow.md` - Complete step-by-step documentation
 
-**Directory Structure:**
-```
-story-full-pipeline/
-├── README.md (this file)
-├── workflow.yaml (configuration)
-├── workflow.md (complete documentation)
-├── agents/
-│   ├── builder.md (implementation agent prompt)
-│   ├── inspector.md (validation agent prompt)
-│   ├── reviewer.md (review agent prompt)
-│   └── fixer.md (fix agent prompt)
-└── steps/
-    └── (step files for each phase)
+**Templates:**
+- `../templates/implementation-playbook-template.md` - Playbook structure
+
+## Usage
+
+```bash
+# Run story-full-pipeline
+/story-full-pipeline story_key=17-10
 ```
 
----
+## Backward Compatibility
 
-**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
+Falls back to single-agent mode if multi-agent execution fails.
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md
index 17e099dd..f1cd8442 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md
@@ -5,7 +5,6 @@
 **Trust Level:** HIGH (wants to find integration issues)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md
index bcbc8cf5..2131dad6 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md
@@ -5,7 +5,6 @@
 **Trust Level:** LOW (assume will cut corners)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/tdd.md
 @patterns/agent-completion.md
 </execution_context>
@@ -17,11 +16,12 @@
 You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests.
 
 **DO:**
+- **Review playbooks** for gotchas and patterns (if provided)
 - Load and understand the story requirements
 - Analyze what exists vs what's needed
 - Write tests first (TDD approach)
 - Implement production code to make tests pass
-- Follow project patterns and conventions
+- Follow project patterns and playbook guidance
 
 **DO NOT:**
 - Validate your own work (Inspector agent will do this)
@@ -35,7 +35,8 @@ You are the **BUILDER** agent. Your job is to implement the story requirements b
 ## Steps to Execute
 
 ### Step 1: Initialize
-Load story file and cache context:
+Load story file and playbooks (if provided):
+- **Review playbooks first** (if provided in context) - note gotchas and patterns
 - Read story file: `{{story_file}}`
 - Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.)
 - Determine greenfield vs brownfield
@@ -88,54 +89,36 @@ When complete, provide:
 
 ---
 
-## Hospital-Grade Standards
+## Completion Format (v4.0)
 
-⚕️ **Quality >> Speed**
+**Return structured JSON artifact:**
 
-- Take time to do it right
-- Don't skip error handling
-- Don't leave TODO comments
-- Don't use `any` types
-
----
-
-## When Complete, Return This Format
-
-```markdown
-## AGENT COMPLETE
-
-**Agent:** builder
-**Story:** {{story_key}}
-**Status:** SUCCESS | FAILED
-
-### Files Created
-- path/to/new/file1.ts
-- path/to/new/file2.ts
-
-### Files Modified
-- path/to/existing/file.ts
-
-### Tests Added
-- X test files
-- Y test cases total
-
-### Implementation Summary
-Brief description of what was built and key decisions made.
-
-### Known Gaps
-- Any functionality not implemented
-- Any edge cases not handled
-- NONE if all tasks complete
-
-### Ready For
-Inspector validation (next phase)
+```json
+{
+  "agent": "builder",
+  "story_key": "{{story_key}}",
+  "status": "SUCCESS",
+  "files_created": ["path/to/file.tsx", "path/to/file.test.tsx"],
+  "files_modified": ["path/to/existing.tsx"],
+  "tests_added": {
+    "total": 12,
+    "passing": 12
+  },
+  "tasks_addressed": [
+    "Create agreement view component",
+    "Add status badge",
+    "Implement occupant selection"
+  ],
+  "playbooks_reviewed": ["database-patterns.md", "api-security.md"]
+}
 ```
 
-**Why this format?** The orchestrator parses this output to:
-- Verify claimed files actually exist
-- Track what was built for reconciliation
-- Route to next phase appropriately
+**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json`
 
 ---
 
-**Remember:** You are the BUILDER. Build it well, but don't validate or review your own work. Other agents will do that with fresh eyes.
+**Remember:**
+
+- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories
+- Build it well with TDD, but don't validate or review your own work
+- Other agents will verify with fresh eyes and provide file:line evidence
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md
index 968572fd..165c9821 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md
@@ -5,7 +5,6 @@
 **Trust Level:** MEDIUM (incentive to minimize work)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md
index 968afb93..141ce651 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md
@@ -1,12 +1,11 @@
-# Inspector Agent - Validation Phase
+# Inspector Agent - Validation Phase with Code Citations
 
-**Role:** Independent verification of Builder's work
+**Role:** Independent verification of Builder's work **WITH EVIDENCE**
 **Steps:** 5-6 (post-validation, quality-checks)
 **Trust Level:** MEDIUM (no conflict of interest)
 
 <execution_context>
 @patterns/verification.md
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
@@ -14,48 +13,54 @@
 
 ## Your Mission
 
-You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed.
+You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**.
 
 **KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
 
+**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.**
+
 **DO:**
+- Map EACH task to specific code with file:line citations
 - Verify files actually exist
 - Run tests yourself (don't trust claims)
 - Run quality checks (type-check, lint, build)
-- Give honest PASS/FAIL verdict
+- Provide evidence for EVERY task
 
 **DO NOT:**
-- Take the Builder's word for anything
-- Skip verification steps
+- Skip any task verification
+- Give vague "looks good" without citations
 - Assume tests pass without running them
-- Give PASS verdict if ANY check fails
+- Give PASS verdict if ANY check fails or task lacks evidence
 
 ---
 
 ## Steps to Execute
 
-### Step 5: Post-Validation
+### Step 5: Task Verification with Code Citations
 
-**Verify Implementation Against Story:**
+**Map EVERY task to specific code locations:**
 
-1. **Check Files Exist:**
-   ```bash
-   # For each file mentioned in story tasks
-   ls -la {{file_path}}
-   # FAIL if file missing or empty
-   ```
+1. **Read story file** - understand ALL tasks
 
-2. **Verify File Contents:**
-   - Open each file
-   - Check it has actual code (not just TODO/stub)
-   - Verify it matches story requirements
+2. **For EACH task, provide:**
+   - **file:line** where it's implemented
+   - **Brief quote** of relevant code
+   - **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED
 
-3. **Check Tests Exist:**
-   ```bash
-   # Find test files
-   find . -name "*.test.ts" -o -name "__tests__"
-   # FAIL if no tests found for new code
-   ```
+**Example Evidence Format:**
+
+```
+Task: "Display occupant agreement status"
+Evidence: src/features/agreement/StatusBadge.tsx:45-67
+Code: "const StatusBadge = ({ status }) => ..."
+Verdict: IMPLEMENTED
+```
+
+3. **If task NOT implemented:**
+   - Explain why (file missing, code incomplete, etc.)
+   - Provide file:line where it should be
+
+**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED.
 
 ### Step 6: Quality Checks
 
@@ -96,36 +101,49 @@ You are the **INSPECTOR** agent. Your job is to verify that the Builder actually
 
 ---
 
-## Output Requirements
+## Completion Format (v4.0)
 
-**Provide Evidence-Based Verdict:**
+**Return structured JSON with code citations:**
 
-### If PASS:
-```markdown
-✅ VALIDATION PASSED
-
-Evidence:
-- Files verified: [list files checked]
-- Type check: PASS (0 errors)
-- Linter: PASS (0 warnings)
-- Build: PASS
-- Tests: 45/45 passing (95% coverage)
-- Git: 12 files modified, 3 new files
-
-Ready for code review.
+```json
+{
+  "agent": "inspector",
+  "story_key": "{{story_key}}",
+  "verdict": "FAIL",
+  "task_verification": [
+    {
+      "task": "Create agreement view component",
+      "implemented": true,
+      "evidence": [
+        {
+          "file": "src/features/agreement/AgreementView.tsx",
+          "lines": "15-67",
+          "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
+        },
+        {
+          "file": "src/features/agreement/AgreementView.test.tsx",
+          "lines": "8-45",
+          "code_snippet": "describe('AgreementView', () => {...})"
+        }
+      ]
+    },
+    {
+      "task": "Add status badge",
+      "implemented": false,
+      "evidence": [],
+      "reason": "No StatusBadge component found in src/features/agreement/"
+    }
+  ],
+  "checks": {
+    "type_check": {"passed": true, "errors": 0},
+    "lint": {"passed": true, "warnings": 0},
+    "tests": {"passed": true, "total": 12, "passing": 12},
+    "build": {"passed": true}
+  }
+}
 ```
 
-### If FAIL:
-```markdown
-❌ VALIDATION FAILED
-
-Failures:
-1. File missing: app/api/occupant/agreement/route.ts
-2. Type check: 3 errors in lib/api/auth.ts
-3. Tests: 2 failing (api/occupant tests)
-
-Cannot proceed to code review until these are fixed.
-```
+**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json`
 
 ---
 
@@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed.
 
 **Before giving PASS verdict, confirm:**
 
-- [ ] All story files exist and have content
+- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason
 - [ ] Type check returns 0 errors
-- [ ] Linter returns 0 errors/warnings
+- [ ] Linter returns 0 warnings
 - [ ] Build succeeds
 - [ ] Tests run and pass (not skipped)
-- [ ] Test coverage >= 90%
-- [ ] Git status is clean or has expected changes
+- [ ] All implemented tasks have code evidence
 
 **If ANY checkbox is unchecked → FAIL verdict**
 
 ---
 
-## Hospital-Grade Standards
-
-⚕️ **Be Thorough**
-
-- Don't skip checks
-- Run tests yourself (don't trust claims)
-- Verify every file exists
-- Give specific evidence
-
----
-
-## When Complete, Return This Format
-
-```markdown
-## AGENT COMPLETE
-
-**Agent:** inspector
-**Story:** {{story_key}}
-**Status:** PASS | FAIL
-
-### Evidence
-- **Type Check:** PASS (0 errors) | FAIL (X errors)
-- **Lint:** PASS (0 warnings) | FAIL (X warnings)
-- **Build:** PASS | FAIL
-- **Tests:** X passing, Y failing, Z% coverage
-
-### Files Verified
-- path/to/file1.ts ✓
-- path/to/file2.ts ✓
-- path/to/missing.ts ✗ (NOT FOUND)
-
-### Failures (if FAIL status)
-1. Specific failure with file:line reference
-2. Another specific failure
-
-### Ready For
-- If PASS: Reviewer (next phase)
-- If FAIL: Builder needs to fix before proceeding
-```
-
----
-
-**Remember:** You are the INSPECTOR. Your job is to find the truth, not rubber-stamp the Builder's work. If something is wrong, say so with evidence.
+**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations.
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md
new file mode 100644
index 00000000..ebb816d7
--- /dev/null
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md
@@ -0,0 +1,93 @@
+# Reflection Agent - Playbook Learning
+
+You are the **REFLECTION** agent for story {{story_key}}.
+
+## Context
+
+- **Story:** {{story_file}}
+- **Builder initial:** {{builder_artifact}}
+- **All review findings:** {{all_reviewer_artifacts}}
+- **Builder fixes:** {{builder_fixes_artifact}}
+- **Test quality issues:** {{test_quality_artifact}}
+
+## Objective
+
+Identify what future agents should know:
+
+1. **What issues were found?** (from reviewers)
+2. **What did Builder miss initially?** (gaps, edge cases, security)
+3. **What playbook knowledge would have prevented these?**
+4. **Which module/feature area does this apply to?**
+5. **Should we update existing playbook or create new?**
+
+### Key Questions
+
+- What gotchas should future builders know?
+- What code patterns should be standard?
+- What test requirements are essential?
+- What similar stories exist?
+
+## Success Criteria
+
+- [ ] Analyzed review findings
+- [ ] Identified preventable issues
+- [ ] Determined which playbook(s) to update
+- [ ] Return structured proposal
+
+## Completion Format
+
+Return structured JSON artifact:
+
+```json
+{
+  "agent": "reflection",
+  "story_key": "{{story_key}}",
+  "learnings": [
+    {
+      "issue": "SQL injection in query builder",
+      "root_cause": "Builder used string concatenation (didn't know pattern)",
+      "prevention": "Playbook should document: always use parameterized queries",
+      "applies_to": "database queries, API endpoints with user input"
+    },
+    {
+      "issue": "Missing edge case tests for empty arrays",
+      "root_cause": "Test Quality Agent found gap",
+      "prevention": "Playbook should require: test null/empty/invalid for all inputs",
+      "applies_to": "all data processing functions"
+    }
+  ],
+  "playbook_proposal": {
+    "action": "update_existing",
+    "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
+    "module": "api/database",
+    "updates": {
+      "common_gotchas": [
+        "Never concatenate user input into SQL - use parameterized queries",
+        "Test edge cases: null, undefined, [], '', invalid input"
+      ],
+      "code_patterns": [
+        "db.query(sql, [param1, param2]) ✓",
+        "sql + userInput ✗"
+      ],
+      "test_requirements": [
+        "Test SQL injection attempts: expect(() => query(\"' OR 1=1--\")).toThrow()",
+        "Test empty inputs: expect(fn([])).toEqual([]) or expect(() => fn([])).toThrow()"
+      ],
+      "related_stories": ["{{story_key}}"]
+    }
+  }
+}
+```
+
+Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
+
+## Playbook Structure
+
+When proposing playbook updates, structure them with these sections:
+
+1. **Common Gotchas** - What mistakes to avoid
+2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
+3. **Test Requirements** - What tests are essential
+4. **Related Stories** - Which stories used these patterns
+
+Keep it simple and actionable for future agents.
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md
index 2a711e05..f857bfaa 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md
@@ -6,7 +6,6 @@
 
 <execution_context>
 @patterns/security-checklist.md
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md
new file mode 100644
index 00000000..172ff9f6
--- /dev/null
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md
@@ -0,0 +1,73 @@
+# Test Quality Agent
+
+You are the **TEST QUALITY** agent for story {{story_key}}.
+
+## Context
+
+- **Story:** {{story_file}}
+- **Builder completion:** {{builder_completion_artifact}}
+
+## Objective
+
+Review test files for quality and completeness:
+
+1. Find all test files created/modified by Builder
+2. For each test file, verify:
+   - **Happy path**: Primary functionality tested ✓
+   - **Edge cases**: null, empty, invalid inputs ✓
+   - **Error conditions**: Failures handled properly ✓
+   - **Assertions**: Meaningful checks (not just "doesn't crash")
+   - **Test names**: Descriptive and clear
+   - **Deterministic**: No random data, no timing dependencies
+3. Check that tests actually validate the feature
+
+**Focus on:** What's missing? What edge cases weren't considered?
+
+## Success Criteria
+
+- [ ] All test files reviewed
+- [ ] Edge cases identified (covered or missing)
+- [ ] Error conditions verified
+- [ ] Assertions are meaningful
+- [ ] Tests are deterministic
+- [ ] Return quality assessment
+
+## Completion Format
+
+Return structured JSON artifact (`verdict` is either `PASS` or `NEEDS_IMPROVEMENT`):
+
+```json
+{
+  "agent": "test_quality",
+  "story_key": "{{story_key}}",
+  "verdict": "NEEDS_IMPROVEMENT",
+  "test_files_reviewed": ["path/to/test.tsx"],
+  "issues": [
+    {
+      "severity": "HIGH",
+      "file": "path/to/test.tsx:45",
+      "issue": "Missing edge case: empty input array",
+      "recommendation": "Add test: expect(() => fn([])).toThrow(...)"
+    },
+    {
+      "severity": "MEDIUM",
+      "file": "path/to/test.tsx:67",
+      "issue": "Test uses Math.random() - could be flaky",
+      "recommendation": "Use fixed test data"
+    }
+  ],
+  "coverage_analysis": {
+    "edge_cases_covered": false,
+    "error_conditions_tested": true,
+    "meaningful_assertions": true,
+    "tests_are_deterministic": false
+  },
+  "summary": {
+    "high_issues": 1,
+    "medium_issues": 1,
+    "low_issues": 0
+  }
+}
+```
+
+Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json`
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md
index 3f603f0e..49a1ad91 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md
@@ -1,74 +1,142 @@
-# Super-Dev-Pipeline v3.1 - Token-Efficient Multi-Agent Pipeline
+# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline
 
 <purpose>
 Implement a story using parallel verification agents with Builder context reuse.
-Each agent has single responsibility. Builder fixes issues in its own context (50-70% token savings).
-Orchestrator handles bookkeeping (story file updates, verification).
+Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates.
+Builder fixes issues in its own context (50-70% token savings).
 </purpose>
 
 <philosophy>
-**Token-Efficient Multi-Agent Pipeline**
+**Quality Through Discipline, Continuous Learning**
 
-- Builder implements (creative, context preserved)
-- Inspector + Reviewers validate in parallel (verification, fresh context)
-- Builder fixes issues (creative, reuses context - 50-70% token savings)
-- Inspector re-checks (verification, quick check)
-- Orchestrator reconciles story file (mechanical)
+- Playbook Query: Load relevant patterns before starting
+- Builder: Implements with playbook knowledge (context preserved)
+- Inspector + Test Quality + Reviewers: Validate in parallel with proof
+- Coverage Gate: Automated threshold enforcement
+- Builder: Fixes issues in same context (50-70% token savings)
+- Inspector: Quick recheck
+- Orchestrator: Reconciles mechanically
+- Reflection: Updates playbooks for future agents
 
-**Key Innovation:** Resume Builder instead of spawning fresh Fixer.
-Builder already knows the codebase - just needs to fix specific issues.
-
-Trust but verify. Fresh context for verification. Reuse context for fixes.
+Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system.
 </philosophy>
 
 <config>
 name: story-full-pipeline
-version: 3.2.0
+version: 4.0.0
 execution_mode: multi_agent
 
 phases:
+  phase_0: Playbook Query (orchestrator)
   phase_1: Builder (saves agent_id)
-  phase_2: [Inspector + N Reviewers] in parallel (N = 2/3/4 based on complexity)
+  phase_2: [Inspector + Test Quality + N Reviewers] in parallel
+  phase_2.5: Coverage Gate (automated)
   phase_3: Resume Builder with all findings (reuses context)
   phase_4: Inspector re-check (quick verification)
   phase_5: Orchestrator reconciliation
+  phase_6: Playbook Reflection
 
 reviewer_counts:
-  micro: 2 reviewers (security, architect/integration) v3.2.0+
-  standard: 3 reviewers (security, logic/performance, architect/integration) v3.2.0+
-  complex: 4 reviewers (security, logic, architect/integration, code quality) v3.2.0+
+  micro: 2 reviewers (security, architect/integration)
+  standard: 3 reviewers (security, logic/performance, architect/integration)
+  complex: 4 reviewers (security, logic, architect/integration, code quality)
+
+quality_gates:
+  coverage_threshold: 80  # % line coverage required
+  task_verification: "all_with_evidence"  # Inspector must cite file:line
+  critical_issues: "must_fix"
+  high_issues: "must_fix"
 
 token_efficiency:
   - Phase 2 agents spawn in parallel (same cost, faster)
-  - Phase 3 resumes Builder (50-70% token savings vs fresh Fixer)
+  - Phase 3 resumes Builder (50-70% token savings vs fresh agent)
   - Phase 4 Inspector only (no full re-review)
+
+playbooks:
+  enabled: true
+  directory: "docs/playbooks/implementation-playbooks"
+  max_load: 3
+  auto_apply_updates: false
 </config>
 
 <execution_context>
-@patterns/hospital-grade.md
+@patterns/verification.md
+@patterns/tdd.md
 @patterns/agent-completion.md
 </execution_context>
 
 <process>
 
 <step name="load_story" priority="first">
-Load and validate the story file.
+**Load and parse story file**
 
 \`\`\`bash
 STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
 [ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
 \`\`\`
 
-Use Read tool on the story file. Parse:
-- Complexity level (micro/standard/complex)
+Use Read tool. Extract:
 - Task count
 - Acceptance criteria count
+- Keywords for risk scoring
 
-Determine which agents to spawn based on complexity routing.
+**Determine complexity:**
+\`\`\`bash
+TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE")
+RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE")
+
+if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then
+  COMPLEXITY="micro"
+  REVIEWER_COUNT=2
+elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then
+  COMPLEXITY="complex"
+  REVIEWER_COUNT=4
+else
+  COMPLEXITY="standard"
+  REVIEWER_COUNT=3
+fi
+\`\`\`
+
+Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers
+</step>
+
+<step name="query_playbooks">
+**Phase 0: Playbook Query**
+
+\`\`\`
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+📚 PHASE 0: PLAYBOOK QUERY
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+\`\`\`
+
+**Extract story keywords:**
+\`\`\`bash
+STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ')
+echo "Story keywords: $STORY_KEYWORDS"
+\`\`\`
+
+**Search for relevant playbooks:**
+Use Grep tool:
+- Pattern: extracted keywords
+- Path: \`docs/playbooks/implementation-playbooks/\`
+- Output mode: files_with_matches
+- Limit: 3 files
+
+**Load matching playbooks:**
+For each playbook found:
+- Use Read tool
+- Extract sections: Common Gotchas, Code Patterns, Test Requirements
+
+If no playbooks exist:
+\`\`\`
+ℹ️  No playbooks found - this will be the first story to create them
+\`\`\`
+
+Store playbook content for Builder.
 </step>
 
 <step name="spawn_builder">
-**Phase 1: Builder Agent (Steps 1-4)**
+**Phase 1: Builder Agent**
 
 \`\`\`
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing.
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 \`\`\`
 
-Spawn Builder agent and save agent_id for later resume.
-
-**CRITICAL: Save Builder's agent_id for later resume**
+Spawn Builder agent and **SAVE agent_id for later resume**:
 
 \`\`\`
-BUILDER_AGENT_ID={{agent_id_from_task_result}}
-echo "Builder agent: $BUILDER_AGENT_ID"
+BUILDER_TASK = Task({
+  subagent_type: "general-purpose",
+  description: "Implement story {{story_key}}",
+  prompt: \`
+You are the BUILDER agent for story {{story_key}}.
+
+<execution_context>
+@patterns/tdd.md
+@patterns/agent-completion.md
+</execution_context>
+
+<context>
+Story: [inline story file content]
+
+{{IF playbooks loaded}}
+Relevant Playbooks (review before implementing):
+[inline playbook content]
+
+Pay special attention to:
+- Common Gotchas in these playbooks
+- Code Patterns to follow
+- Test Requirements to satisfy
+{{ENDIF}}
+</context>
+
+<objective>
+Implement the story requirements:
+1. Review story tasks and acceptance criteria
+2. **Review playbooks** for gotchas and patterns (if provided)
+3. Analyze what exists vs needed (gap analysis)
+4. **Write tests FIRST** (TDD - tests before implementation)
+5. Implement production code to pass tests
+</objective>
+
+<constraints>
+- DO NOT validate your own work
+- DO NOT review your code
+- DO NOT update story checkboxes
+- DO NOT commit changes yet
+</constraints>
+
+<success_criteria>
+- [ ] Reviewed playbooks for guidance
+- [ ] Tests written for all requirements
+- [ ] Production code implemented to satisfy the tests
+- [ ] Tests pass
+- [ ] Return structured completion artifact
+</success_criteria>
+
+<completion_format>
+Return structured JSON artifact:
+{
+  "agent": "builder",
+  "story_key": "{{story_key}}",
+  "status": "SUCCESS" | "FAILED",
+  "files_created": ["path/to/file.tsx", ...],
+  "files_modified": ["path/to/file.tsx", ...],
+  "tests_added": {
+    "total": 12,
+    "passing": 12
+  },
+  "tasks_addressed": ["task description from story", ...]
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json
+</completion_format>
+\`
+})
+
+BUILDER_AGENT_ID = {{extract agent_id from Task result}}
 \`\`\`
 
-Wait for completion. Parse structured output. Verify files exist.
+**CRITICAL: Store Builder agent ID:**
+\`\`\`bash
+echo "Builder agent ID: $BUILDER_AGENT_ID"
+echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
+\`\`\`
+
+**Wait for completion. Verify artifact exists:**
+\`\`\`bash
+BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
+[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
+\`\`\`
+
+**Verify files exist:**
+\`\`\`bash
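+# Check every path listed in files_created and files_modified (requires jq):
+jq -r '((.files_created // []) + (.files_modified // []))[]' "$BUILDER_COMPLETION" | while read -r file; do [ -f "$file" ] || echo "❌ MISSING: $file"; done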
+\`\`\`
 
 If files missing or status FAILED: halt pipeline.
 </step>
 
 <step name="spawn_verification_parallel">
-**Phase 2: Parallel Verification (Inspector + Reviewers)**
+**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
 
 \`\`\`
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 🔍 PHASE 2: PARALLEL VERIFICATION
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
+Total agents: {{2 + REVIEWER_COUNT}}
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 \`\`\`
 
-**CRITICAL: Spawn ALL verification agents in ONE message (parallel execution)**
+**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
+
+Send single message with multiple Task calls:
+1. Inspector Agent
+2. Test Quality Agent
+3. Security Reviewer
+4. Logic/Performance Reviewer (if standard/complex)
+5. Architect/Integration Reviewer
+6. Code Quality Reviewer (if complex)
+
+---
+
+## Inspector Agent Prompt:
 
-Determine reviewer count based on complexity:
 \`\`\`
-if complexity == "micro": REVIEWER_COUNT = 1
-if complexity == "standard": REVIEWER_COUNT = 2
-if complexity == "complex": REVIEWER_COUNT = 3
+Task({
+  subagent_type: "general-purpose",
+  description: "Validate story {{story_key}} implementation",
+  prompt: \`
+You are the INSPECTOR agent for story {{story_key}}.
+
+<execution_context>
+@patterns/verification.md
+@patterns/agent-completion.md
+</execution_context>
+
+<context>
+Story: [inline story file content]
+</context>
+
+<objective>
+Independently verify implementation WITH CODE CITATIONS:
+
+1. Read story file - understand ALL tasks
+2. Read each file Builder created/modified
+3. **Map EACH task to specific code with file:line citations**
+4. Run verification checks:
+   - Type-check (0 errors required)
+   - Lint (0 warnings required)
+   - Tests (all passing required)
+   - Build (success required)
+</objective>
+
+<critical_requirement>
+**EVERY task must have evidence.**
+
+For each task, provide:
+- file:line where it's implemented
+- Brief quote of relevant code
+- Verdict: IMPLEMENTED or NOT_IMPLEMENTED
+
+Example:
+Task: "Display occupant agreement status"
+Evidence: src/features/agreement/StatusBadge.tsx:45-67
+Code: "const StatusBadge = ({ status }) => ..."
+Verdict: IMPLEMENTED
+</critical_requirement>
+
+<constraints>
+- You have NO KNOWLEDGE of what Builder did
+- Run all checks yourself - don't trust claims
+- **Every task needs file:line citation**
+- If code doesn't exist: mark NOT_IMPLEMENTED with reason
+</constraints>
+
+<success_criteria>
+- [ ] ALL tasks mapped to code locations
+- [ ] Type check: 0 errors
+- [ ] Lint: 0 warnings
+- [ ] Tests: all passing
+- [ ] Build: success
+- [ ] Return structured evidence
+</success_criteria>
+
+<completion_format>
+{
+  "agent": "inspector",
+  "story_key": "{{story_key}}",
+  "verdict": "PASS" | "FAIL",
+  "task_verification": [
+    {
+      "task": "Create agreement view component",
+      "implemented": true,
+      "evidence": [
+        {
+          "file": "src/features/agreement/AgreementView.tsx",
+          "lines": "15-67",
+          "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
+        },
+        {
+          "file": "src/features/agreement/AgreementView.test.tsx",
+          "lines": "8-45",
+          "code_snippet": "describe('AgreementView', () => {...})"
+        }
+      ]
+    },
+    {
+      "task": "Add status badge",
+      "implemented": false,
+      "evidence": [],
+      "reason": "No StatusBadge component found in src/features/agreement/"
+    }
+  ],
+  "checks": {
+    "type_check": {"passed": true, "errors": 0},
+    "lint": {"passed": true, "warnings": 0},
+    "tests": {"passed": true, "total": 12, "passing": 12},
+    "build": {"passed": true}
+  }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json
+</completion_format>
+\`
+})
 \`\`\`
 
-Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings.
+---
 
-Aggregate all findings from Inspector + Reviewers.
+## Test Quality Agent Prompt:
+
+\`\`\`
+Task({
+  subagent_type: "general-purpose",
+  description: "Review test quality for {{story_key}}",
+  prompt: \`
+You are the TEST QUALITY agent for story {{story_key}}.
+
+<context>
+Story: [inline story file content]
+Builder completion: [inline builder artifact]
+</context>
+
+<objective>
+Review test files for quality and completeness:
+
+1. Find all test files created/modified by Builder
+2. For each test file, verify:
+   - **Happy path**: Primary functionality tested ✓
+   - **Edge cases**: null, empty, invalid inputs ✓
+   - **Error conditions**: Failures handled properly ✓
+   - **Assertions**: Meaningful checks (not just "doesn't crash")
+   - **Test names**: Descriptive and clear
+   - **Deterministic**: No random data, no timing dependencies
+3. Check that tests actually validate the feature
+
+**Focus on:** What's missing? What edge cases weren't considered?
+</objective>
+
+<success_criteria>
+- [ ] All test files reviewed
+- [ ] Edge cases identified (covered or missing)
+- [ ] Error conditions verified
+- [ ] Assertions are meaningful
+- [ ] Tests are deterministic
+- [ ] Return quality assessment
+</success_criteria>
+
+<completion_format>
+{
+  "agent": "test_quality",
+  "story_key": "{{story_key}}",
+  "verdict": "PASS" | "NEEDS_IMPROVEMENT",
+  "test_files_reviewed": ["path/to/test.tsx", ...],
+  "issues": [
+    {
+      "severity": "HIGH",
+      "file": "path/to/test.tsx:45",
+      "issue": "Missing edge case: empty input array",
+      "recommendation": "Add test: expect(() => fn([])).toThrow(...)"
+    },
+    {
+      "severity": "MEDIUM",
+      "file": "path/to/test.tsx:67",
+      "issue": "Test uses Math.random() - could be flaky",
+      "recommendation": "Use fixed test data"
+    }
+  ],
+  "coverage_analysis": {
+    "edge_cases_covered": true | false,
+    "error_conditions_tested": true | false,
+    "meaningful_assertions": true | false,
+    "tests_are_deterministic": true | false
+  },
+  "summary": {
+    "high_issues": 1,
+    "medium_issues": 1,
+    "low_issues": 0
+  }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json
+</completion_format>
+\`
+})
+\`\`\`
+
+---
+
+(Continue with Security, Logic, Architect, Quality reviewers as before...)
+
+**Wait for ALL agents to complete.**
+
+Collect completion artifacts:
+- \`inspector.json\`
+- \`test-quality.json\`
+- \`reviewer-security.json\`
+- \`reviewer-logic.json\` (if spawned)
+- \`reviewer-architect.json\`
+- \`reviewer-quality.json\` (if spawned)
+
+Parse all findings and aggregate by severity.
+</step>
+
+<step name="coverage_gate">
+**Phase 2.5: Coverage Gate (Automated)**
+
+\`\`\`
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+📊 PHASE 2.5: COVERAGE GATE
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+\`\`\`
+
+Run coverage check:
+\`\`\`bash
+# Run tests with coverage
+npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
+
+# Extract coverage percentage (adjust grep pattern for your test framework)
+COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1); COVERAGE="${COVERAGE:-0}"  # default to 0 when no figure is found
+
+echo "Coverage: ${COVERAGE}%"
+echo "Threshold: {{coverage_threshold}}%"
+
+# Compare coverage
+if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
+  echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
+  echo "Builder must add more tests before proceeding"
+  exit 1
+fi
+
+echo "✅ Coverage gate passed: ${COVERAGE}%"
+\`\`\`
+
+If the coverage check exits non-zero: record the shortfall in the issues list for Builder to fix in Phase 3 (see failure_handling).
 </step>
 
 <step name="resume_builder_with_findings">
@@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
 
 \`\`\`
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-🔧 PHASE 5: RECONCILIATION (Orchestrator)
+📊 PHASE 5: RECONCILIATION (Orchestrator)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 \`\`\`
 
 **YOU (orchestrator) do this directly. No agent spawn.**
 
-1. Get what was built (git log, git diff)
-2. Read story file
-3. Check off completed tasks (Edit tool)
-4. Fill Dev Agent Record with pipeline details
-5. Verify updates (grep task checkboxes)
-6. Update sprint-status.yaml to "done"
+**5.1: Load completion artifacts**
+\`\`\`bash
+BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
+INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
+\`\`\`
+
+Use Read tool on all artifacts.
+
+**5.2: Read story file**
+Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\`
+
+**5.3: Check off completed tasks using Inspector evidence**
+
+For each task in \`inspector.task_verification\`:
+- If \`implemented: true\` and has evidence:
+  - Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\`
+
+**5.4: Fill Dev Agent Record with evidence**
+
+Use Edit tool:
+\`\`\`markdown
+### Dev Agent Record
+**Implementation Date:** {{timestamp}}
+**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0)
+**Git Commit:** {{git_commit}}
+
+**Pipeline Phases:**
+- Phase 0: Playbook Query ({{playbooks_loaded}} loaded)
+- Phase 1: Builder (initial implementation)
+- Phase 2: Parallel Verification
+  - Inspector: {{verdict}} with code citations
+  - Test Quality: {{verdict}}
+  - {{REVIEWER_COUNT}} Reviewers: {{issues_found}}
+- Phase 2.5: Coverage Gate ({{coverage}}%)
+- Phase 3: Builder (resumed, fixed {{fixes_count}} issues)
+- Phase 4: Inspector re-check ({{verdict}})
+
+**Files Created:** {{count}}
+**Files Modified:** {{count}}
+**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}%)
+**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM
+
+**Task Evidence:** (Inspector code citations)
+{{for each task with evidence}}
+- [x] {{task}}
+  - {{evidence[0].file}}:{{evidence[0].lines}}
+{{endfor}}
+\`\`\`
+
+**5.5: Verify updates**
+\`\`\`bash
+CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
+[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; }
+echo "✅ Reconciled: $CHECKED tasks with evidence"
+\`\`\`
 </step>
 
 <step name="final_verification">
 **Final Quality Gate**
 
-Verify:
-1. Git commit exists
-2. Story tasks checked (count > 0)
-3. Dev Agent Record filled
-4. Sprint status updated
+\`\`\`bash
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "🔍 FINAL VERIFICATION"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
 
-If verification fails: fix using Edit, then re-verify.
+# 1. Git commit exists
+git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; }
+echo "✅ Git commit found"
+
+# 2. Story tasks checked with evidence
+CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
+[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; }
+echo "✅ $CHECKED tasks checked with code citations"
+
+# 3. Dev Agent Record filled
+grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; }
+echo "✅ Dev Agent Record filled"
+
+# 4. Coverage met threshold
+FINAL_COVERAGE=$(jq -r '.tests.coverage' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json)
+if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then
+  echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold"
+  exit 1
+fi
+echo "✅ Coverage: ${FINAL_COVERAGE}%"
+
+echo ""
+echo "✅ STORY COMPLETE"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+\`\`\`
+
+**Update sprint-status.yaml:**
+Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\`
+</step>
+
+<step name="playbook_reflection">
+**Phase 6: Playbook Reflection**
+
+\`\`\`
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+💡 PHASE 6: PLAYBOOK REFLECTION
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+\`\`\`
+
+Spawn Reflection Agent:
+
+\`\`\`
+Task({
+  subagent_type: "general-purpose",
+  description: "Extract learnings from {{story_key}}",
+  prompt: \`
+You are the REFLECTION agent for story {{story_key}}.
+
+<context>
+Story: [inline story file]
+Builder initial: [inline builder.json]
+All review findings: [inline all reviewer artifacts]
+Builder fixes: [inline builder-fixes.json]
+Test quality issues: [inline test-quality.json]
+</context>
+
+<objective>
+Identify what future agents should know:
+
+1. **What issues were found?** (from reviewers)
+2. **What did Builder miss initially?** (gaps, edge cases, security)
+3. **What playbook knowledge would have prevented these?**
+4. **Which module/feature area does this apply to?**
+5. **Should we update existing playbook or create new?**
+
+Questions:
+- What gotchas should future builders know?
+- What code patterns should be standard?
+- What test requirements are essential?
+- What similar stories exist?
+</objective>
+
+<success_criteria>
+- [ ] Analyzed review findings
+- [ ] Identified preventable issues
+- [ ] Determined which playbook(s) to update
+- [ ] Return structured proposal
+</success_criteria>
+
+<completion_format>
+{
+  "agent": "reflection",
+  "story_key": "{{story_key}}",
+  "learnings": [
+    {
+      "issue": "SQL injection in query builder",
+      "root_cause": "Builder used string concatenation (didn't know pattern)",
+      "prevention": "Playbook should document: always use parameterized queries",
+      "applies_to": "database queries, API endpoints with user input"
+    },
+    {
+      "issue": "Missing edge case tests for empty arrays",
+      "root_cause": "Test Quality Agent found gap",
+      "prevention": "Playbook should require: test null/empty/invalid for all inputs",
+      "applies_to": "all data processing functions"
+    }
+  ],
+  "playbook_proposal": {
+    "action": "update_existing" | "create_new",
+    "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
+    "module": "api/database",
+    "updates": {
+      "common_gotchas": [
+        "Never concatenate user input into SQL - use parameterized queries",
+        "Test edge cases: null, undefined, [], '', invalid input"
+      ],
+      "code_patterns": [
+        "db.query(sql, [param1, param2]) ✓",
+        "sql + userInput ✗"
+      ],
+      "test_requirements": [
+        "Test SQL injection attempts: expect(() => query(\"' OR 1=1--\")).toThrow()",
+        "Test empty inputs: expect(fn([])).toEqual([]) or expect(() => fn([])).toThrow()"
+      ],
+      "related_stories": ["{{story_key}}"]
+    }
+  }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
+</completion_format>
+\`
+})
+\`\`\`
+
+**Wait for completion.**
+
+**Review playbook proposal:**
+\`\`\`bash
+REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
+ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
+PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
+
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "📝 Playbook Update Proposal"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Action: $ACTION"
+echo "Playbook: $PLAYBOOK"
+echo ""
+jq -r '.learnings[] | "- \(.issue)\n  Prevention: \(.prevention)"' "$REFLECTION"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+\`\`\`
+
+If \`auto_apply_updates: true\` in config:
+- Read playbook (or create from template if new)
+- Use Edit tool to add learnings to sections
+- Commit playbook update
+
+If \`auto_apply_updates: false\` (default):
+- Display proposal for manual review
+- User can apply later with \`/update-playbooks {{story_key}}\`
 </step>
 
 </process>
 
 <failure_handling>
-**Builder fails:** Don't spawn verification. Report failure and halt.
-**Inspector fails (Phase 2):** Still run Reviewers in parallel, collect all findings together.
-**Inspector fails (Phase 4):** Resume Builder again with new issues (iterative fix loop).
-**Builder resume fails:** Report unfixed issues. Manual intervention needed.
-**Reconciliation fails:** Fix using Edit tool. Re-verify checkboxes.
+**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
+**Inspector fails (Phase 2):** Still collect other reviewer findings.
+**Test Quality fails:** Add issues to Builder fix list.
+**Coverage below threshold:** Add to Builder fix list.
+**Reviewers find CRITICAL:** Builder MUST fix when resumed.
+**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
+**Builder resume fails:** Report unfixed issues. Manual intervention.
+**Reconciliation fails:** Fix with Edit tool, re-verify.
 </failure_handling>
 
 <complexity_routing>
-| Complexity | Pipeline | Reviewers | Total Phase 2 Agents |
-|------------|----------|-----------|---------------------|
-| micro | Builder → [Inspector + 2 Reviewers] → Resume Builder → Inspector recheck | 2 (security, architect) | 3 agents |
-| standard | Builder → [Inspector + 3 Reviewers] → Resume Builder → Inspector recheck | 3 (security, logic, architect) | 4 agents |
-| complex | Builder → [Inspector + 4 Reviewers] → Resume Builder → Inspector recheck | 4 (security, logic, architect, quality) | 5 agents |
+| Complexity | Phase 2 Agents | Total | Reviewers |
+|------------|----------------|-------|----------|
+| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security Reviewer + Architect |
+| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect |
+| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality |
 
-**Key Improvements (v3.2.0):**
-- All verification agents spawn in parallel (single message, faster execution)
-- Builder resume in Phase 3 saves 50-70% tokens vs spawning fresh Fixer
-- **NEW:** Architect/Integration Reviewer catches runtime issues (404s, pattern violations, missing migrations)
-
-**Reviewer Specializations:**
-- **Security:** Auth, injection, secrets, cross-tenant access
-- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions
-- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+)
-- **Code Quality:** Maintainability, naming, duplication (complex only)
+**All verification agents spawn in parallel (single message)**
 </complexity_routing>
 
 <success_criteria>
-- [ ] Builder spawned and agent_id saved
-- [ ] All verification agents completed in parallel
-- [ ] Builder resumed with consolidated findings
-- [ ] Inspector recheck passed
-- [ ] Git commit exists for story
-- [ ] Story file has checked tasks (count > 0)
-- [ ] Dev Agent Record filled with all phases
-- [ ] Sprint status updated to "done"
+- [ ] Phase 0: Playbooks loaded (if available)
+- [ ] Phase 1: Builder spawned, agent_id saved
+- [ ] Phase 2: All verification agents completed in parallel
+- [ ] Phase 2.5: Coverage gate passed
+- [ ] Phase 3: Builder resumed with consolidated findings
+- [ ] Phase 4: Inspector recheck passed
+- [ ] Phase 5: Orchestrator reconciled with Inspector evidence
+- [ ] Phase 6: Playbook reflection completed
+- [ ] Git commit exists
+- [ ] Story tasks checked with code citations
+- [ ] Dev Agent Record filled
+- [ ] Coverage ≥ {{coverage_threshold}}%
+- [ ] Sprint status: done
 </success_criteria>
+
+<improvements_v4>
+1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings
+2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task
+3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates
+4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero
+5. ✅ Test Quality Agent (v4.0) + Coverage Gate (v4.0) - validates test quality and enforces threshold
+6. ✅ Playbook query (v4.0) before Builder + reflection (v4.0) after - continuous learning
+</improvements_v4>
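
The complexity routing that the `load_story` step applies, and that the `complexity_routing` table summarises, can be sketched as a small function so the thresholds are easy to check in isolation. `classify_story` is a hypothetical name used only for illustration. The thresholds are copied from the workflow: at most 3 tasks with no risk keywords is micro, 16 or more tasks or any risk keyword is complex, and everything else is standard.

```shell
# Hypothetical helper mirroring the load_story routing rules.
# Usage: classify_story TASK_COUNT RISK_KEYWORDS
# Prints "<complexity> <reviewer_count>".
classify_story() {
  local task_count="$1" risk_keywords="$2"
  if [ "$task_count" -le 3 ] && [ "$risk_keywords" -eq 0 ]; then
    echo "micro 2"
  elif [ "$task_count" -ge 16 ] || [ "$risk_keywords" -gt 0 ]; then
    echo "complex 4"
  else
    echo "standard 3"
  fi
}

# Phase 2 always adds Inspector + Test Quality to the reviewer count.
set -- $(classify_story 5 0)
echo "complexity=$1 reviewers=$2 phase2_agents=$(( 2 + $2 ))"
```

Note that a single risk keyword routes even a one-task story to the complex path, because the micro branch requires zero keyword hits.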
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml
index 47b73c77..3a075a5e 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml
@@ -1,7 +1,7 @@
 name: story-full-pipeline
-description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)"
-author: "BMAD Method + GSD"
-version: "3.2.0" # Added architect-integration-reviewer for runtime verification
+description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes"
+author: "BMAD Method"
+version: "4.0.0" # Added playbook learning, test quality, coverage gates, Inspector code citations
 
 # Execution mode
 execution_mode: "multi_agent" # multi_agent | single_agent (fallback)
@@ -37,13 +37,23 @@ agents:
     timeout: 3600 # 1 hour
 
   inspector:
-    description: "Validation agent - independent verification"
+    description: "Validation agent - independent verification with code citations"
     steps: [5, 6]
     subagent_type: "general-purpose"
     prompt_file: "{agents_path}/inspector.md"
     fresh_context: true # No knowledge of builder agent
     trust_level: "medium" # No conflict of interest
     timeout: 1800 # 30 minutes
+    require_code_citations: true # v4.0: Must provide file:line evidence for all tasks
+
+  test_quality:
+    description: "Test quality validation - verifies test coverage and quality"
+    steps: [5.5]
+    subagent_type: "general-purpose"
+    prompt_file: "{agents_path}/test-quality.md"
+    fresh_context: true
+    trust_level: "medium"
+    timeout: 1200 # 20 minutes
 
   reviewer:
     description: "Adversarial code review - finds problems"
@@ -73,15 +83,40 @@ agents:
     trust_level: "medium" # Incentive to minimize work
     timeout: 2400 # 40 minutes
 
+  reflection:
+    description: "Playbook learning - extracts patterns for future agents"
+    steps: [10]
+    subagent_type: "general-purpose"
+    prompt_file: "{agents_path}/reflection.md"
+    timeout: 900 # 15 minutes
+
 # Reconciliation: orchestrator does this directly (see workflow.md Phase 5)
 
+# Playbook configuration (v4.0)
+playbooks:
+  enabled: true # Set to false in project config to disable
+  directory: "docs/playbooks/implementation-playbooks"
+  bootstrap_mode: true # Auto-initialize if missing
+  max_load: 3
+  auto_apply_updates: false # Require manual review of playbook updates
+  discovery:
+    enabled: true # Scan git/docs to populate initial playbooks
+    sources: ["git_history", "docs", "existing_code"]
+
+# Quality gates (v4.0)
+quality_gates:
+  coverage_threshold: 80 # % line coverage required
+  task_verification: "all_with_evidence" # Inspector must provide file:line citations
+  critical_issues: "must_fix"
+  high_issues: "must_fix"
+
 # Complexity level (determines which steps to execute)
 complexity_level: "standard" # micro | standard | complex
 
 # Complexity routing
 complexity_routing:
   micro:
-    skip_agents: ["reviewer"] # Skip code review for micro stories
+    skip_agents: [] # Full pipeline (v4.0: micro gets security scan)
     description: "Lightweight path for low-risk stories"
     examples: ["UI tweaks", "text changes", "simple CRUD"]
 
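
The `coverage_threshold` gate configured above compares a possibly fractional coverage figure against an integer threshold. A minimal sketch of that comparison follows, using `awk` instead of `bc` and treating a missing figure as 0. `coverage_meets_threshold` is a hypothetical helper, not something the workflow defines.

```shell
# Hypothetical helper: succeed (exit 0) when coverage >= threshold.
# Usage: coverage_meets_threshold COVERAGE THRESHOLD
coverage_meets_threshold() {
  local coverage="${1:-0}" threshold="$2"
  # "+ 0" forces numeric comparison; exit status is 0 only when the test holds.
  awk -v c="$coverage" -v t="$threshold" 'BEGIN { exit !(c + 0 >= t + 0) }'
}

if coverage_meets_threshold "85.71" 80; then
  echo "gate passed"
else
  echo "gate failed"
fi
```

Defaulting an empty value to 0 makes a failed extraction fail the gate instead of producing a comparison error.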
diff --git a/src/bmm/workflows/templates/implementation-playbook-template.md b/src/bmm/workflows/templates/implementation-playbook-template.md
new file mode 100644
index 00000000..79208ebf
--- /dev/null
+++ b/src/bmm/workflows/templates/implementation-playbook-template.md
@@ -0,0 +1,85 @@
+# {{Module/Feature Area}} - Implementation Playbook
+
+> **Purpose:** Guide future agents implementing features in {{module_name}}
+> **Created:** {{date}}
+> **Last Updated:** {{date}}
+
+## Common Gotchas
+
+**What mistakes to avoid:**
+
+- Add specific gotchas here as they're discovered
+- Example: "Never concatenate user input into SQL queries"
+- Example: "Always validate file paths before operations"
+
+## Code Patterns
+
+**Standard approaches that work:**
+
+### Pattern: {{Pattern Name}}
+
+✓ **Good:**
+```
+// Example of correct pattern
+db.query(sql, [param1, param2])
+```
+
+✗ **Bad:**
+```
+// Example of incorrect pattern
+sql + userInput
+```
+
+### Pattern: {{Another Pattern}}
+
+✓ **Good:**
+```
+// Another example
+if (!data) return null;
+```
+
+✗ **Bad:**
+```
+// Don't do this
+data.map(...)  // crashes if data is null
+```
+
+## Test Requirements
+
+**Essential tests for this module:**
+
+- **Happy path:** Verify primary functionality
+- **Edge cases:** Test null, undefined, empty arrays, invalid inputs
+- **Error conditions:** Verify errors are handled properly
+- **Security:** Test for injection attacks, auth bypasses, etc.
+
+### Example Test Pattern
+
+```typescript
+describe('FeatureName', () => {
+  it('handles happy path', () => {
+    expect(fn(validInput)).toEqual(expected)
+  })
+
+  it('handles edge cases', () => {
+    expect(() => fn(null)).toThrow()
+    expect(fn([])).toEqual([])
+  })
+
+  it('validates security', () => {
+    expect(() => fn("' OR 1=1--")).toThrow()
+  })
+})
+```
+
+## Related Stories
+
+Stories that used these patterns:
+
+- {{story_key}} - {{brief description}}
+
+## Notes
+
+- Keep this simple and actionable
+- Add new learnings as they emerge
+- Focus on preventable mistakes
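
Phase 0 of the workflow extracts the Common Gotchas, Code Patterns, and Test Requirements sections from each playbook it loads. As a sketch, assuming playbooks keep the second-level headings used in this template, a file could be checked for those sections before it is handed to Builder. `missing_playbook_sections` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: print each required section heading a playbook lacks.
# Usage: missing_playbook_sections PLAYBOOK_FILE
missing_playbook_sections() {
  local playbook="$1" section
  for section in "Common Gotchas" "Code Patterns" "Test Requirements"; do
    # -x matches the whole line, -F treats the heading as a literal string.
    grep -qxF "## ${section}" "$playbook" || echo "$section"
  done
}

# Demo: a playbook that has not yet gained a Test Requirements section.
playbook=$(mktemp)
printf '%s\n' '# API - Implementation Playbook' '## Common Gotchas' '## Code Patterns' > "$playbook"
missing_playbook_sections "$playbook"
rm -f "$playbook"
```

An empty result means all three sections are present.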