feat: upgrade story-full-pipeline to v4.0 with 6 major enhancements

Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench research
(Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 RESUMES Builder agent with review findings
- Builder already has full codebase context (50-70% token savings)
- More efficient than spawning fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
- Prevents vague communication (CooperBench finding)

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Focus on concrete, measurable verification
- Replaced with patterns/verification.md + patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- No longer skip ALL review for micro stories
- Micro now gets 2 reviewers: Security + Architect
- Lightweight but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection

Pipeline: Phase 0 (Playbooks) → Phase 1 (Builder) → Phase 2 (Inspector +
Test Quality + Reviewers parallel) → Phase 2.5 (Coverage Gate) → Phase 3
(Resume Builder) → Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) →
Phase 6 (Reflection)

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)
Author: Jonah Schulte
Date: 2026-01-28 13:28:37 -05:00
Commit: a268b4c1bc (parent 0810646ed6)
11 changed files with 1189 additions and 330 deletions

README.md

@@ -1,124 +1,150 @@
-# Super-Dev Pipeline - GSDMAD Architecture
-**Multi-agent pipeline with independent validation and adversarial code review**
-## Quick Start
-```bash
-# Run super-dev pipeline for a story
-/story-full-pipeline story_key=17-10
-```
-## Architecture
-### Multi-Agent Validation
-- **4 independent agents** working sequentially
-- Builder → Inspector → Reviewer → Fixer
-- Each agent has fresh context
-- No conflict of interest
-### Honest Reporting
-- Inspector verifies Builder's work (doesn't trust claims)
-- Reviewer is adversarial (wants to find issues)
-- Main orchestrator does final verification
-- Can't fake completion
-### Wave-Based Execution
-- Independent stories run in parallel
-- Dependencies respected via waves
-- 57% faster than sequential execution
-## Workflow Phases
-**Phase 1: Builder (Steps 1-4)**
-- Load story, analyze gaps
-- Write tests (TDD)
-- Implement code
-- Report what was built (NO VALIDATION)
-**Phase 2: Inspector (Steps 5-6)**
-- Fresh context, no Builder knowledge
-- Verify files exist
-- Run tests independently
-- Run quality checks
-- PASS or FAIL verdict
-**Phase 3: Reviewer (Step 7)**
-- Fresh context, adversarial stance
-- Find security vulnerabilities
-- Find performance problems
-- Find logic bugs
-- Report issues with severity
-**Phase 4: Fixer (Steps 8-9)**
-- Fix CRITICAL issues (all)
-- Fix HIGH issues (all)
-- Fix MEDIUM issues (time permitting)
-- Verify fixes independently
-**Phase 5: Final Verification**
-- Main orchestrator verifies all phases
-- Updates story checkboxes
-- Creates commit
-- Marks story complete
-## Key Features
-**Separation of Concerns:**
-- Builder focuses only on implementation
-- Inspector focuses only on validation
-- Reviewer focuses only on finding issues
-- Fixer focuses only on resolving issues
-**Independent Validation:**
-- Each agent validates the previous agent's work
-- No agent validates its own work
-- Fresh context prevents confirmation bias
-**Quality Enforcement:**
-- Multiple quality gates throughout pipeline
-- Can't proceed without passing validation
-- 95% honesty rate (agents can't fake completion)
-**Directory Structure:**
-```
-story-full-pipeline/
-├── README.md (this file)
-├── workflow.yaml (configuration)
-├── workflow.md (complete documentation)
-├── agents/
-│   ├── builder.md (implementation agent prompt)
-│   ├── inspector.md (validation agent prompt)
-│   ├── reviewer.md (review agent prompt)
-│   └── fixer.md (fix agent prompt)
-└── steps/
-    └── (step files for each phase)
-```
-**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
# Story-Full-Pipeline v4.0
Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes.
## What's New in v4.0
### 1. Resume Builder (v3.2+)
**Token Efficiency: 50-70% savings**
- Phase 3 now RESUMES the Builder agent with review findings
- Builder already has full codebase context
- More efficient than spawning a fresh Fixer agent
### 2. Inspector Code Citations (v4.0)
**Evidence-Based Verification**
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
### 3. Remove Hospital-Grade Framing (v4.0)
**Focus on Concrete Verification**
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Replaced with patterns/verification.md + patterns/tdd.md
### 4. Micro Stories Get Security Scan (v4.0)
**Even Simple Stories Need Security**
- No longer skip ALL review for micro stories
- Micro stories still get 2 reviewers: Security + Architect
- Lightweight but catches critical vulnerabilities
### 5. Test Quality Agent + Coverage Gate (v4.0)
**Validate Test Completeness**
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding
### 6. Playbook Learning System (v4.0)
**Continuous Improvement Through Reflection**
- **Phase 0:** Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- **Phase 6:** Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
## Pipeline Flow
```
Phase 0: Playbook Query (orchestrator)
Phase 1: Builder (initial implementation)
Phase 2: Inspector + Test Quality + N Reviewers (parallel)
Phase 2.5: Coverage Gate (automated)
Phase 3: Resume Builder (fix issues with context)
Phase 4: Inspector re-check (quick verification)
Phase 5: Orchestrator reconciliation (evidence-based)
Phase 6: Playbook Reflection (extract learnings)
```
## Complexity Routing
| Complexity | Phase 2 Agents | Total | Reviewers |
|------------|----------------|-------|-----------|
| micro | Inspector + Test Quality + 2 | 4 agents | Security + Architect |
| standard | Inspector + Test Quality + 3 | 5 agents | Security + Logic + Architect |
| complex | Inspector + Test Quality + 4 | 6 agents | Security + Logic + Architect + Quality |
## Quality Gates
- **Coverage Threshold:** 80% line coverage required
- **Task Verification:** ALL tasks need file:line evidence
- **Critical Issues:** MUST fix
- **High Issues:** MUST fix
## Token Efficiency
- Phase 2 agents spawn in parallel (same cost, faster)
- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
- Phase 4 Inspector only (no full re-review)
## Playbook Configuration
```yaml
playbooks:
  enabled: true
  directory: "docs/playbooks/implementation-playbooks"
  bootstrap_mode: true        # Auto-initialize if missing
  max_load: 3
  auto_apply_updates: false   # Require manual review
  discovery:
    enabled: true
    sources: ["git_history", "docs", "existing_code"]
```
## How It Avoids CooperBench Coordination Failures
Unlike the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026):
1. **Sequential Implementation** - ONE Builder agent implements the entire story (no parallel implementation conflicts)
2. **Parallel Review** - Multiple agents review in parallel (safe read-only operations)
3. **Context Reuse** - the SAME agent fixes issues (no expectation failures about partner state)
4. **Evidence-Based** - file:line citations prevent vague communication
5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities)
The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination."
## Files
See `workflow.md` for complete architecture details.
**Agent Prompts:**
- `agents/builder.md` - Implementation agent (with playbook awareness)
- `agents/inspector.md` - Validation agent (requires code citations)
- `agents/test-quality.md` - Test quality validation (v4.0)
- `agents/reviewer.md` - Adversarial review agent
- `agents/architect-integration-reviewer.md` - Architecture/integration review
- `agents/fixer.md` - Issue resolution agent (deprecated; superseded by resuming the Builder)
- `agents/reflection.md` - Playbook learning agent (v4.0)
**Workflow Config:**
- `workflow.yaml` - Main configuration (v4.0)
- `workflow.md` - Complete step-by-step documentation
**Templates:**
- `../templates/implementation-playbook-template.md` - Playbook structure
## Usage
```bash
# Run story-full-pipeline
/story-full-pipeline story_key=17-10
```
## Backward Compatibility
Falls back to single-agent mode if multi-agent execution fails.

agents/architect-integration-reviewer.md

@@ -5,7 +5,6 @@
**Trust Level:** HIGH (wants to find integration issues)
<execution_context>
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

agents/builder.md

@@ -5,7 +5,6 @@
**Trust Level:** LOW (assume will cut corners)
<execution_context>
-@patterns/hospital-grade.md
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
@@ -17,11 +16,12 @@
You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests.
**DO:**
- **Review playbooks** for gotchas and patterns (if provided)
- Load and understand the story requirements
- Analyze what exists vs what's needed
- Write tests first (TDD approach)
- Implement production code to make tests pass
- Follow project patterns and playbook guidance
**DO NOT:**
- Validate your own work (Inspector agent will do this)
@@ -35,7 +35,8 @@ You are the **BUILDER** agent. Your job is to implement the story requirements b
## Steps to Execute
### Step 1: Initialize
Load story file and playbooks (if provided):
- **Review playbooks first** (if provided in context) - note gotchas and patterns
- Read story file: `{{story_file}}`
- Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.)
- Determine greenfield vs brownfield (see the sketch below)
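For illustration, a rough shell heuristic for that last check - assuming story tasks mention TypeScript file paths; the regex and extensions are illustrative, not part of the agent contract:
```bash
# Extract file paths referenced in the story and test which already exist.
# Any existing path suggests brownfield work (read it before editing).
grep -oE '[A-Za-z0-9_./-]+\.(ts|tsx)' "{{story_file}}" | sort -u | while read -r f; do
  [ -e "$f" ] && echo "brownfield: $f" || echo "greenfield: $f"
done
```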
@@ -88,54 +89,36 @@ When complete, provide:
---
-## Hospital-Grade Standards
-⚕️ **Quality >> Speed**
-- Take time to do it right
-- Don't skip error handling
-- Don't leave TODO comments
-- Don't use `any` types
-## When Complete, Return This Format
-```markdown
-## AGENT COMPLETE
-**Agent:** builder
-**Story:** {{story_key}}
-**Status:** SUCCESS | FAILED
-### Files Created
-- path/to/new/file1.ts
-- path/to/new/file2.ts
-### Files Modified
-- path/to/existing/file.ts
-### Tests Added
-- X test files
-- Y test cases total
-### Implementation Summary
-Brief description of what was built and key decisions made.
-### Known Gaps
-- Any functionality not implemented
-- Any edge cases not handled
-- NONE if all tasks complete
-### Ready For
-Inspector validation (next phase)
-```
-**Why this format?** The orchestrator parses this output to:
-- Verify claimed files actually exist
-- Track what was built for reconciliation
-- Route to next phase appropriately
## Completion Format (v4.0)
**Return structured JSON artifact:**
```json
{
  "agent": "builder",
  "story_key": "{{story_key}}",
  "status": "SUCCESS",
  "files_created": ["path/to/file.tsx", "path/to/file.test.tsx"],
  "files_modified": ["path/to/existing.tsx"],
  "tests_added": {
    "total": 12,
    "passing": 12
  },
  "tasks_addressed": [
    "Create agreement view component",
    "Add status badge",
    "Implement occupant selection"
  ],
  "playbooks_reviewed": ["database-patterns.md", "api-security.md"]
}
```
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json`
---
**Remember:**
- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories
- Build it well with TDD, but don't validate or review your own work
- Other agents will verify with fresh eyes and provide file:line evidence

agents/fixer.md

@@ -5,7 +5,6 @@
**Trust Level:** MEDIUM (incentive to minimize work)
<execution_context>
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

agents/inspector.md

@@ -1,12 +1,11 @@
# Inspector Agent - Validation Phase with Code Citations
**Role:** Independent verification of Builder's work **WITH EVIDENCE**
**Steps:** 5-6 (post-validation, quality-checks)
**Trust Level:** MEDIUM (no conflict of interest)
<execution_context>
@patterns/verification.md
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>
@@ -14,48 +13,54 @@
## Your Mission
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**.
**KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.**
**DO:**
- Map EACH task to specific code with file:line citations
- Verify files actually exist
- Run tests yourself (don't trust claims)
- Run quality checks (type-check, lint, build)
- Provide evidence for EVERY task
**DO NOT:**
- Skip any task verification
- Give vague "looks good" without citations
- Assume tests pass without running them
- Give PASS verdict if ANY check fails or task lacks evidence
---
## Steps to Execute
-### Step 5: Post-Validation
-**Verify Implementation Against Story:**
-1. **Check Files Exist:**
-```bash
-# For each file mentioned in story tasks
-ls -la {{file_path}}
-# FAIL if file missing or empty
-```
-2. **Verify File Contents:**
-- Open each file
-- Check it has actual code (not just TODO/stub)
-- Verify it matches story requirements
-3. **Check Tests Exist:**
-```bash
-# Find test files
-find . -name "*.test.ts" -o -name "__tests__"
-# FAIL if no tests found for new code
-```
### Step 5: Task Verification with Code Citations
**Map EVERY task to specific code locations:**
1. **Read story file** - understand ALL tasks
2. **For EACH task, provide:**
   - **file:line** where it's implemented
   - **Brief quote** of relevant code
   - **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED
**Example Evidence Format:**
```
Task: "Display occupant agreement status"
Evidence: src/features/agreement/StatusBadge.tsx:45-67
Code: "const StatusBadge = ({ status }) => ..."
Verdict: IMPLEMENTED
```
3. **If task NOT implemented:**
   - Explain why (file missing, code incomplete, etc.)
   - Provide file:line where it should be
**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED.
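One way to locate evidence quickly is a line-numbered search; a sketch using the StatusBadge example above (the path and symbol are illustrative):
```bash
# Search with line numbers so each hit can be cited as file:line.
grep -rn "StatusBadge" src/features/agreement/ | head -5
# e.g. src/features/agreement/StatusBadge.tsx:45:const StatusBadge = ({ status }) => ...
# → cite as src/features/agreement/StatusBadge.tsx:45
```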
### Step 6: Quality Checks
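The exact commands are project-specific; a typical sketch for a TypeScript/npm project (assumed tooling, not mandated by the pipeline):
```bash
npx tsc --noEmit                 # type check: require 0 errors
npx eslint . --max-warnings 0    # lint: require 0 warnings
npm test                         # tests: require all passing
npm run build                    # build: require success
```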
@@ -96,36 +101,49 @@ You are the **INSPECTOR** agent. Your job is to verify that the Builder actually
---
-## Output Requirements
-**Provide Evidence-Based Verdict:**
-### If PASS:
-```markdown
-✅ VALIDATION PASSED
-Evidence:
-- Files verified: [list files checked]
-- Type check: PASS (0 errors)
-- Linter: PASS (0 warnings)
-- Build: PASS
-- Tests: 45/45 passing (95% coverage)
-- Git: 12 files modified, 3 new files
-Ready for code review.
-```
-### If FAIL:
-```markdown
-❌ VALIDATION FAILED
-Failures:
-1. File missing: app/api/occupant/agreement/route.ts
-2. Type check: 3 errors in lib/api/auth.ts
-3. Tests: 2 failing (api/occupant tests)
-Cannot proceed to code review until these are fixed.
-```
## Completion Format (v4.0)
**Return structured JSON with code citations:**
```json
{
  "agent": "inspector",
  "story_key": "{{story_key}}",
  "verdict": "PASS",
  "task_verification": [
    {
      "task": "Create agreement view component",
      "implemented": true,
      "evidence": [
        {
          "file": "src/features/agreement/AgreementView.tsx",
          "lines": "15-67",
          "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
        },
        {
          "file": "src/features/agreement/AgreementView.test.tsx",
          "lines": "8-45",
          "code_snippet": "describe('AgreementView', () => {...})"
        }
      ]
    },
    {
      "task": "Add status badge",
      "implemented": false,
      "evidence": [],
      "reason": "No StatusBadge component found in src/features/agreement/"
    }
  ],
  "checks": {
    "type_check": {"passed": true, "errors": 0},
    "lint": {"passed": true, "warnings": 0},
    "tests": {"passed": true, "total": 12, "passing": 12},
    "build": {"passed": true}
  }
}
```
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json`
---
@@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed.
**Before giving PASS verdict, confirm:**
-- [ ] All story files exist and have content
-- [ ] Test coverage >= 90%
-- [ ] Git status is clean or has expected changes
- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason
- [ ] Type check returns 0 errors
- [ ] Linter returns 0 warnings
- [ ] Build succeeds
- [ ] Tests run and pass (not skipped)
- [ ] All implemented tasks have code evidence
**If ANY checkbox is unchecked → FAIL verdict**
---
-## Hospital-Grade Standards
-⚕️ **Be Thorough**
-- Don't skip checks
-- Run tests yourself (don't trust claims)
-- Verify every file exists
-- Give specific evidence
-## When Complete, Return This Format
-```markdown
-## AGENT COMPLETE
-**Agent:** inspector
-**Story:** {{story_key}}
-**Status:** PASS | FAIL
-### Evidence
-- **Type Check:** PASS (0 errors) | FAIL (X errors)
-- **Lint:** PASS (0 warnings) | FAIL (X warnings)
-- **Build:** PASS | FAIL
-- **Tests:** X passing, Y failing, Z% coverage
-### Files Verified
-- path/to/file1.ts ✓
-- path/to/file2.ts ✓
-- path/to/missing.ts ✗ (NOT FOUND)
-### Failures (if FAIL status)
-1. Specific failure with file:line reference
-2. Another specific failure
-### Ready For
-- If PASS: Reviewer (next phase)
-- If FAIL: Builder needs to fix before proceeding
-```
**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations.

agents/reflection.md (new file)

@@ -0,0 +1,93 @@
# Reflection Agent - Playbook Learning
You are the **REFLECTION** agent for story {{story_key}}.
## Context
- **Story:** {{story_file}}
- **Builder initial:** {{builder_artifact}}
- **All review findings:** {{all_reviewer_artifacts}}
- **Builder fixes:** {{builder_fixes_artifact}}
- **Test quality issues:** {{test_quality_artifact}}
## Objective
Identify what future agents should know:
1. **What issues were found?** (from reviewers)
2. **What did Builder miss initially?** (gaps, edge cases, security)
3. **What playbook knowledge would have prevented these?**
4. **Which module/feature area does this apply to?**
5. **Should we update existing playbook or create new?**
### Key Questions
- What gotchas should future builders know?
- What code patterns should be standard?
- What test requirements are essential?
- What similar stories exist?
## Success Criteria
- [ ] Analyzed review findings
- [ ] Identified preventable issues
- [ ] Determined which playbook(s) to update
- [ ] Return structured proposal
## Completion Format
Return structured JSON artifact:
```json
{
"agent": "reflection",
"story_key": "{{story_key}}",
"learnings": [
{
"issue": "SQL injection in query builder",
"root_cause": "Builder used string concatenation (didn't know pattern)",
"prevention": "Playbook should document: always use parameterized queries",
"applies_to": "database queries, API endpoints with user input"
},
{
"issue": "Missing edge case tests for empty arrays",
"root_cause": "Test Quality Agent found gap",
"prevention": "Playbook should require: test null/empty/invalid for all inputs",
"applies_to": "all data processing functions"
}
],
"playbook_proposal": {
"action": "update_existing",
"playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
"module": "api/database",
"updates": {
"common_gotchas": [
"Never concatenate user input into SQL - use parameterized queries",
"Test edge cases: null, undefined, [], '', invalid input"
],
"code_patterns": [
"db.query(sql, [param1, param2]) ✓",
"sql + userInput ✗"
],
"test_requirements": [
"Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
"Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
],
"related_stories": ["{{story_key}}"]
}
}
}
```
Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
## Playbook Structure
When proposing playbook updates, structure them with these sections:
1. **Common Gotchas** - What mistakes to avoid
2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
3. **Test Requirements** - What tests are essential
4. **Related Stories** - Which stories used these patterns
Keep it simple and actionable for future agents.

agents/reviewer.md

@@ -6,7 +6,6 @@
<execution_context>
@patterns/security-checklist.md
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

agents/test-quality.md (new file)

@@ -0,0 +1,73 @@
# Test Quality Agent
You are the **TEST QUALITY** agent for story {{story_key}}.
## Context
- **Story:** {{story_file}}
- **Builder completion:** {{builder_completion_artifact}}
## Objective
Review test files for quality and completeness:
1. Find all test files created/modified by Builder
2. For each test file, verify:
- **Happy path**: Primary functionality tested ✓
- **Edge cases**: null, empty, invalid inputs ✓
- **Error conditions**: Failures handled properly ✓
- **Assertions**: Meaningful checks (not just "doesn't crash")
- **Test names**: Descriptive and clear
- **Deterministic**: No random data, no timing dependencies
3. Check that tests actually validate the feature
**Focus on:** What's missing? What edge cases weren't considered?
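A quick heuristic for the determinism check - a grep sketch; the patterns and path are illustrative, not exhaustive:
```bash
# Flag common sources of flaky tests in the Builder's test files.
grep -rnE "Math\.random|Date\.now\(\)|setTimeout" --include="*.test.*" src/ \
  && echo "Review these lines for nondeterminism" \
  || echo "No obvious flakiness patterns found"
```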
## Success Criteria
- [ ] All test files reviewed
- [ ] Edge cases identified (covered or missing)
- [ ] Error conditions verified
- [ ] Assertions are meaningful
- [ ] Tests are deterministic
- [ ] Return quality assessment
## Completion Format
Return structured JSON artifact:
```json
{
"agent": "test_quality",
"story_key": "{{story_key}}",
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
"test_files_reviewed": ["path/to/test.tsx", ...],
"issues": [
{
"severity": "HIGH",
"file": "path/to/test.tsx:45",
"issue": "Missing edge case: empty input array",
"recommendation": "Add test: expect(fn([])).toThrow(...)"
},
{
"severity": "MEDIUM",
"file": "path/to/test.tsx:67",
"issue": "Test uses Math.random() - could be flaky",
"recommendation": "Use fixed test data"
}
],
"coverage_analysis": {
"edge_cases_covered": true,
"error_conditions_tested": true,
"meaningful_assertions": true,
"tests_are_deterministic": true
},
"summary": {
"high_issues": 0,
"medium_issues": 0,
"low_issues": 0
}
}
```
Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json`

workflow.md

@@ -1,74 +1,142 @@
# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline
<purpose>
Implement a story using parallel verification agents with Builder context reuse.
Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates.
Builder fixes issues in its own context (50-70% token savings).
</purpose>
<philosophy>
**Quality Through Discipline, Continuous Learning**
- Playbook Query: Load relevant patterns before starting
- Builder: Implements with playbook knowledge (context preserved)
- Inspector + Test Quality + Reviewers: Validate in parallel with proof
- Coverage Gate: Automated threshold enforcement
- Builder: Fixes issues in same context (50-70% token savings)
- Inspector: Quick recheck
- Orchestrator: Reconciles mechanically
- Reflection: Updates playbooks for future agents
Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system.
</philosophy>
<config>
name: story-full-pipeline
version: 4.0.0
execution_mode: multi_agent
phases:
  phase_0: Playbook Query (orchestrator)
  phase_1: Builder (saves agent_id)
  phase_2: [Inspector + Test Quality + N Reviewers] in parallel
  phase_2.5: Coverage Gate (automated)
  phase_3: Resume Builder with all findings (reuses context)
  phase_4: Inspector re-check (quick verification)
  phase_5: Orchestrator reconciliation
  phase_6: Playbook Reflection
reviewer_counts:
  micro: 2 reviewers (security, architect/integration)
  standard: 3 reviewers (security, logic/performance, architect/integration)
  complex: 4 reviewers (security, logic, architect/integration, code quality)
quality_gates:
  coverage_threshold: 80  # % line coverage required
  task_verification: "all_with_evidence"  # Inspector must cite file:line
  critical_issues: "must_fix"
  high_issues: "must_fix"
token_efficiency:
  - Phase 2 agents spawn in parallel (same cost, faster)
  - Phase 3 resumes Builder (50-70% token savings vs fresh agent)
  - Phase 4 Inspector only (no full re-review)
playbooks:
  enabled: true
  directory: "docs/playbooks/implementation-playbooks"
  max_load: 3
  auto_apply_updates: false
</config>
<execution_context>
-@patterns/hospital-grade.md
@patterns/verification.md
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
<process>
<step name="load_story" priority="first">
**Load and parse story file**
\`\`\`bash
STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
[ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
\`\`\`
Use Read tool. Extract:
- Task count
- Acceptance criteria count
- Keywords for risk scoring
**Determine complexity:**
\`\`\`bash
TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE")
RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE")
if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then
COMPLEXITY="micro"
REVIEWER_COUNT=2
elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then
COMPLEXITY="complex"
REVIEWER_COUNT=4
else
COMPLEXITY="standard"
REVIEWER_COUNT=3
fi
\`\`\`
Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers
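Example: a story with 5 unchecked tasks and no risk keywords routes to standard (Inspector + Test Quality + 3 reviewers = 5 agents), while a 2-task story that mentions auth routes to complex (6 agents).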
</step>
<step name="query_playbooks">
**Phase 0: Playbook Query**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📚 PHASE 0: PLAYBOOK QUERY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**Extract story keywords:**
\`\`\`bash
STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ')
echo "Story keywords: $STORY_KEYWORDS"
\`\`\`
**Search for relevant playbooks:**
Use Grep tool:
- Pattern: extracted keywords
- Path: \`docs/playbooks/implementation-playbooks/\`
- Output mode: files_with_matches
- Limit: 3 files
**Load matching playbooks:**
For each playbook found:
- Use Read tool
- Extract sections: Common Gotchas, Code Patterns, Test Requirements
If no playbooks exist:
\`\`\`
No playbooks found - this will be the first story to create them
\`\`\`
Store playbook content for Builder.
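For illustration, the same lookup expressed in shell - the orchestrator actually uses the Grep/Read tools, and the keyword handling here is deliberately naive (single pattern, no ranking):
\`\`\`bash
PLAYBOOK_DIR="docs/playbooks/implementation-playbooks"
mkdir -p "$PLAYBOOK_DIR"  # bootstrap_mode: create the directory if missing
# Case-insensitive match of story keywords against playbooks; load at most max_load (3).
grep -ril "$STORY_KEYWORDS" "$PLAYBOOK_DIR" | head -3
\`\`\`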
</step>
<step name="spawn_builder">
**Phase 1: Builder Agent**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Spawn Builder agent and **SAVE agent_id for resume later**:
\`\`\`
-BUILDER_AGENT_ID={{agent_id_from_task_result}}
-echo "Builder agent: $BUILDER_AGENT_ID"
BUILDER_TASK = Task({
subagent_type: "general-purpose",
description: "Implement story {{story_key}}",
prompt: \`
You are the BUILDER agent for story {{story_key}}.
<execution_context>
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
<context>
Story: [inline story file content]
{{IF playbooks loaded}}
Relevant Playbooks (review before implementing):
[inline playbook content]
Pay special attention to:
- Common Gotchas in these playbooks
- Code Patterns to follow
- Test Requirements to satisfy
{{ENDIF}}
</context>
<objective>
Implement the story requirements:
1. Review story tasks and acceptance criteria
2. **Review playbooks** for gotchas and patterns (if provided)
3. Analyze what exists vs needed (gap analysis)
4. **Write tests FIRST** (TDD - tests before implementation)
5. Implement production code to pass tests
</objective>
<constraints>
- DO NOT validate your own work
- DO NOT review your code
- DO NOT update story checkboxes
- DO NOT commit changes yet
</constraints>
<success_criteria>
- [ ] Reviewed playbooks for guidance
- [ ] Tests written for all requirements
- [ ] Production code implements tests
- [ ] Tests pass
- [ ] Return structured completion artifact
</success_criteria>
<completion_format>
Return structured JSON artifact:
{
"agent": "builder",
"story_key": "{{story_key}}",
"status": "SUCCESS" | "FAILED",
"files_created": ["path/to/file.tsx", ...],
"files_modified": ["path/to/file.tsx", ...],
"tests_added": {
"total": 12,
"passing": 12
},
"tasks_addressed": ["task description from story", ...]
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json
</completion_format>
\`
})
BUILDER_AGENT_ID = {{extract agent_id from Task result}}
\`\`\` \`\`\`
**CRITICAL: Store Builder agent ID:**
\`\`\`bash
echo "Builder agent ID: $BUILDER_AGENT_ID"
echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
\`\`\`
**Wait for completion. Verify artifact exists:**
\`\`\`bash
BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
\`\`\`
**Verify files exist:**
\`\`\`bash
# For each file in files_created and files_modified:
[ -f "$file" ] || echo "❌ MISSING: $file"
\`\`\`
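A concrete sketch of that loop, assuming \`jq\` is available and the artifact matches the completion format above:
\`\`\`bash
MISSING=$(jq -r '.files_created[], .files_modified[]' "$BUILDER_COMPLETION" \
  | while read -r f; do [ -f "$f" ] || echo "$f"; done)
[ -z "$MISSING" ] || { echo "❌ MISSING:"; echo "$MISSING"; exit 1; }
\`\`\`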
If files missing or status FAILED: halt pipeline.
</step>
<step name="spawn_verification_parallel">
**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 PHASE 2: PARALLEL VERIFICATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
Total agents: {{2 + REVIEWER_COUNT}}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
Send single message with multiple Task calls:
1. Inspector Agent
2. Test Quality Agent
3. Security Reviewer
4. Logic/Performance Reviewer (if standard/complex)
5. Architect/Integration Reviewer
6. Code Quality Reviewer (if complex)
---
## Inspector Agent Prompt:
-Determine reviewer count based on complexity:
-\`\`\`
-if complexity == "micro": REVIEWER_COUNT = 1
-if complexity == "standard": REVIEWER_COUNT = 2
-if complexity == "complex": REVIEWER_COUNT = 3
-\`\`\`
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Validate story {{story_key}} implementation",
prompt: \`
You are the INSPECTOR agent for story {{story_key}}.
<execution_context>
@patterns/verification.md
@patterns/agent-completion.md
</execution_context>
<context>
Story: [inline story file content]
</context>
<objective>
Independently verify implementation WITH CODE CITATIONS:
1. Read story file - understand ALL tasks
2. Read each file Builder created/modified
3. **Map EACH task to specific code with file:line citations**
4. Run verification checks:
- Type-check (0 errors required)
- Lint (0 warnings required)
- Tests (all passing required)
- Build (success required)
</objective>
<critical_requirement>
**EVERY task must have evidence.**
For each task, provide:
- file:line where it's implemented
- Brief quote of relevant code
- Verdict: IMPLEMENTED or NOT_IMPLEMENTED
Example:
Task: "Display occupant agreement status"
Evidence: src/features/agreement/StatusBadge.tsx:45-67
Code: "const StatusBadge = ({ status }) => ..."
Verdict: IMPLEMENTED
</critical_requirement>
<constraints>
- You have NO KNOWLEDGE of what Builder did
- Run all checks yourself - don't trust claims
- **Every task needs file:line citation**
- If code doesn't exist: mark NOT_IMPLEMENTED with reason
</constraints>
<success_criteria>
- [ ] ALL tasks mapped to code locations
- [ ] Type check: 0 errors
- [ ] Lint: 0 warnings
- [ ] Tests: all passing
- [ ] Build: success
- [ ] Return structured evidence
</success_criteria>
<completion_format>
{
"agent": "inspector",
"story_key": "{{story_key}}",
"verdict": "PASS" | "FAIL",
"task_verification": [
{
"task": "Create agreement view component",
"implemented": true,
"evidence": [
{
"file": "src/features/agreement/AgreementView.tsx",
"lines": "15-67",
"code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
},
{
"file": "src/features/agreement/AgreementView.test.tsx",
"lines": "8-45",
"code_snippet": "describe('AgreementView', () => {...})"
}
]
},
{
"task": "Add status badge",
"implemented": false,
"evidence": [],
"reason": "No StatusBadge component found in src/features/agreement/"
}
],
"checks": {
"type_check": {"passed": true, "errors": 0},
"lint": {"passed": true, "warnings": 0},
"tests": {"passed": true, "total": 12, "passing": 12},
"build": {"passed": true}
}
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json
</completion_format>
\`
})
\`\`\`
-Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings.
-Aggregate all findings from Inspector + Reviewers.
---
## Test Quality Agent Prompt:
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Review test quality for {{story_key}}",
prompt: \`
You are the TEST QUALITY agent for story {{story_key}}.
<context>
Story: [inline story file content]
Builder completion: [inline builder artifact]
</context>
<objective>
Review test files for quality and completeness:
1. Find all test files created/modified by Builder
2. For each test file, verify:
- **Happy path**: Primary functionality tested ✓
- **Edge cases**: null, empty, invalid inputs ✓
- **Error conditions**: Failures handled properly ✓
- **Assertions**: Meaningful checks (not just "doesn't crash")
- **Test names**: Descriptive and clear
- **Deterministic**: No random data, no timing dependencies
3. Check that tests actually validate the feature
**Focus on:** What's missing? What edge cases weren't considered?
</objective>
<success_criteria>
- [ ] All test files reviewed
- [ ] Edge cases identified (covered or missing)
- [ ] Error conditions verified
- [ ] Assertions are meaningful
- [ ] Tests are deterministic
- [ ] Return quality assessment
</success_criteria>
<completion_format>
{
"agent": "test_quality",
"story_key": "{{story_key}}",
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
"test_files_reviewed": ["path/to/test.tsx", ...],
"issues": [
{
"severity": "HIGH",
"file": "path/to/test.tsx:45",
"issue": "Missing edge case: empty input array",
"recommendation": "Add test: expect(fn([])).toThrow(...)"
},
{
"severity": "MEDIUM",
"file": "path/to/test.tsx:67",
"issue": "Test uses Math.random() - could be flaky",
"recommendation": "Use fixed test data"
}
],
"coverage_analysis": {
"edge_cases_covered": true | false,
"error_conditions_tested": true | false,
"meaningful_assertions": true | false,
"tests_are_deterministic": true | false
},
"summary": {
"high_issues": 1,
"medium_issues": 2,
"low_issues": 0
}
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json
</completion_format>
\`
})
\`\`\`
---
(Continue with Security, Logic, Architect, Quality reviewers as before...)
**Wait for ALL agents to complete.**
Collect completion artifacts:
- \`inspector.json\`
- \`test-quality.json\`
- \`reviewer-security.json\`
- \`reviewer-logic.json\` (if spawned)
- \`reviewer-architect.json\`
- \`reviewer-quality.json\` (if spawned)
Parse all findings and aggregate by severity.
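One way to do that aggregation, assuming each artifact lists issues with a severity field as in the formats above (a jq sketch, not pipeline-mandated):
\`\`\`bash
# Count findings per severity across all completion artifacts for this story.
cat docs/sprint-artifacts/completions/{{story_key}}-*.json \
  | jq -s '[.[] | .issues[]?] | group_by(.severity) | map({(.[0].severity): length}) | add'
# e.g. → {"CRITICAL": 1, "HIGH": 3, "MEDIUM": 2}
\`\`\`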
</step>
<step name="coverage_gate">
**Phase 2.5: Coverage Gate (Automated)**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 PHASE 2.5: COVERAGE GATE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Run coverage check:
\`\`\`bash
# Run tests with coverage
npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
# Extract coverage percentage (adjust grep pattern for your test framework)
COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1 || echo "0")
echo "Coverage: ${COVERAGE}%"
echo "Threshold: {{coverage_threshold}}%"
# Compare coverage
if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
echo "Builder must add more tests before proceeding"
exit 1
fi
echo "✅ Coverage gate passed: ${COVERAGE}%"
\`\`\`
If coverage fails: add to issues list for Builder to fix.
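If the project happens to use Jest, the same gate can be enforced natively instead of parsing text output (an alternative sketch; flag support depends on the test runner):
\`\`\`bash
npm test -- --coverage --coverageThreshold='{"global":{"lines":{{coverage_threshold}}}}' \
  || { echo "❌ Coverage below {{coverage_threshold}}% threshold"; exit 1; }
\`\`\`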
</step> </step>
<step name="resume_builder_with_findings"> <step name="resume_builder_with_findings">
@@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 PHASE 5: RECONCILIATION (Orchestrator)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**YOU (orchestrator) do this directly. No agent spawn.**
-1. Get what was built (git log, git diff)
-2. Read story file
-3. Check off completed tasks (Edit tool)
-4. Fill Dev Agent Record with pipeline details
-5. Verify updates (grep task checkboxes)
-6. Update sprint-status.yaml to "done"
**5.1: Load completion artifacts**
\`\`\`bash
BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
\`\`\`
Use Read tool on all artifacts.
**5.2: Read story file**
Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\`
**5.3: Check off completed tasks using Inspector evidence**
For each task in \`inspector.task_verification\`:
- If \`implemented: true\` and has evidence:
- Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\`
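For illustration only, the same edit expressed in shell - assuming task text contains no sed metacharacters; in practice the orchestrator's Edit tool handles this:
\`\`\`bash
jq -r '.task_verification[] | select(.implemented) | .task' "$INSPECTOR" | while read -r task; do
  # Naive exact-text match; breaks if the task contains /, &, or regex characters.
  sed -i "s/^- \[ \] ${task}\$/- [x] ${task}/" "docs/sprint-artifacts/{{story_key}}.md"
done
\`\`\`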
**5.4: Fill Dev Agent Record with evidence**
Use Edit tool:
\`\`\`markdown
### Dev Agent Record
**Implementation Date:** {{timestamp}}
**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0)
**Git Commit:** {{git_commit}}
**Pipeline Phases:**
- Phase 0: Playbook Query ({{playbooks_loaded}} loaded)
- Phase 1: Builder (initial implementation)
- Phase 2: Parallel Verification
- Inspector: {{verdict}} with code citations
- Test Quality: {{verdict}}
- {{REVIEWER_COUNT}} Reviewers: {{issues_found}}
- Phase 2.5: Coverage Gate ({{coverage}}%)
- Phase 3: Builder (resumed, fixed {{fixes_count}} issues)
- Phase 4: Inspector re-check ({{verdict}})
**Files Created:** {{count}}
**Files Modified:** {{count}}
**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}%)
**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM
**Task Evidence:** (Inspector code citations)
{{for each task with evidence}}
- [x] {{task}}
- {{evidence[0].file}}:{{evidence[0].lines}}
{{endfor}}
\`\`\`
**5.5: Verify updates**
\`\`\`bash
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; }
echo "✅ Reconciled: $CHECKED tasks with evidence"
\`\`\`
</step>
<step name="final_verification">
**Final Quality Gate**
-Verify:
-1. Git commit exists
-2. Story tasks checked (count > 0)
-3. Dev Agent Record filled
-4. Sprint status updated
-If verification fails: fix using Edit, then re-verify.
\`\`\`bash
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔍 FINAL VERIFICATION"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# 1. Git commit exists
git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; }
echo "✅ Git commit found"
# 2. Story tasks checked with evidence
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; }
echo "✅ $CHECKED tasks checked with code citations"
# 3. Dev Agent Record filled
grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; }
echo "✅ Dev Agent Record filled"
# 4. Coverage met threshold (assumes builder-fixes.json records final coverage under .tests.coverage)
FINAL_COVERAGE=$(jq -r '.tests.coverage' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json)
if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then
echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold"
exit 1
fi
echo "✅ Coverage: ${FINAL_COVERAGE}%"
echo ""
echo "✅ STORY COMPLETE"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
\`\`\`
**Update sprint-status.yaml:**
Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\`
</step>
<step name="playbook_reflection">
**Phase 6: Playbook Reflection**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 PHASE 6: PLAYBOOK REFLECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Spawn Reflection Agent:
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Extract learnings from {{story_key}}",
prompt: \`
You are the REFLECTION agent for story {{story_key}}.
<context>
Story: [inline story file]
Builder initial: [inline builder.json]
All review findings: [inline all reviewer artifacts]
Builder fixes: [inline builder-fixes.json]
Test quality issues: [inline test-quality.json]
</context>
<objective>
Identify what future agents should know:
1. **What issues were found?** (from reviewers)
2. **What did Builder miss initially?** (gaps, edge cases, security)
3. **What playbook knowledge would have prevented these?**
4. **Which module/feature area does this apply to?**
5. **Should we update existing playbook or create new?**
Questions:
- What gotchas should future builders know?
- What code patterns should be standard?
- What test requirements are essential?
- What similar stories exist?
</objective>
<success_criteria>
- [ ] Analyzed review findings
- [ ] Identified preventable issues
- [ ] Determined which playbook(s) to update
- [ ] Return structured proposal
</success_criteria>
<completion_format>
{
"agent": "reflection",
"story_key": "{{story_key}}",
"learnings": [
{
"issue": "SQL injection in query builder",
"root_cause": "Builder used string concatenation (didn't know pattern)",
"prevention": "Playbook should document: always use parameterized queries",
"applies_to": "database queries, API endpoints with user input"
},
{
"issue": "Missing edge case tests for empty arrays",
"root_cause": "Test Quality Agent found gap",
"prevention": "Playbook should require: test null/empty/invalid for all inputs",
"applies_to": "all data processing functions"
}
],
"playbook_proposal": {
"action": "update_existing" | "create_new",
"playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
"module": "api/database",
"updates": {
"common_gotchas": [
"Never concatenate user input into SQL - use parameterized queries",
"Test edge cases: null, undefined, [], '', invalid input"
],
"code_patterns": [
"db.query(sql, [param1, param2]) ✓",
"sql + userInput ✗"
],
"test_requirements": [
"Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
"Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
],
"related_stories": ["{{story_key}}"]
}
}
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
</completion_format>
\`
})
\`\`\`
**Wait for completion.**
**Review playbook proposal:**
\`\`\`bash
REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "📝 Playbook Update Proposal"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Action: $ACTION"
echo "Playbook: $PLAYBOOK"
echo ""
jq -r '.learnings[] | "- \(.issue)\n Prevention: \(.prevention)"' "$REFLECTION"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
\`\`\`
If \`auto_apply_updates: true\` in config:
- Read playbook (or create from template if new)
- Use Edit tool to add learnings to sections
- Commit playbook update
If \`auto_apply_updates: false\` (default):
- Display proposal for manual review
- User can apply later with \`/update-playbooks {{story_key}}\`
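A sketch of what a manual apply could look like, assuming the proposal schema above (\`/update-playbooks\` is the supported path):
\`\`\`bash
# Naive append of proposed gotchas to the target playbook (no de-duplication).
jq -r '.playbook_proposal.updates.common_gotchas[] | "- " + .' "$REFLECTION" >> "$PLAYBOOK"
\`\`\`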
</step> </step>
</process>
<failure_handling>
**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
**Inspector fails (Phase 2):** Still collect other reviewer findings.
**Test Quality fails:** Add issues to Builder fix list.
**Coverage below threshold:** Add to Builder fix list.
**Reviewers find CRITICAL:** Builder MUST fix when resumed.
**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
**Builder resume fails:** Report unfixed issues. Manual intervention.
**Reconciliation fails:** Fix with Edit tool, re-verify.
</failure_handling>
<complexity_routing>
| Complexity | Phase 2 Agents | Total | Reviewers |
|------------|----------------|-------|-----------|
| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security + Architect |
| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect |
| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality |
**All verification agents spawn in parallel (single message)**
**Reviewer Specializations:**
- **Security:** Auth, injection, secrets, cross-tenant access
- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions
- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+)
- **Code Quality:** Maintainability, naming, duplication (complex only)
</complexity_routing>
<success_criteria>
- [ ] Phase 0: Playbooks loaded (if available)
- [ ] Phase 1: Builder spawned, agent_id saved
- [ ] Phase 2: All verification agents completed in parallel
- [ ] Phase 2.5: Coverage gate passed
- [ ] Phase 3: Builder resumed with consolidated findings
- [ ] Phase 4: Inspector recheck passed
- [ ] Phase 5: Orchestrator reconciled with Inspector evidence
- [ ] Phase 6: Playbook reflection completed
- [ ] Git commit exists
- [ ] Story tasks checked with code citations
- [ ] Dev Agent Record filled
- [ ] Coverage ≥ {{coverage_threshold}}%
- [ ] Sprint status: done
</success_criteria>
<improvements_v4>
1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings
2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task
3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates
4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero
5. ✅ Test Quality Agent (v4.0) + Coverage Gate (v4.0) - validates test quality and enforces threshold
6. ✅ Playbook query (v4.0) before Builder + reflection (v4.0) after - continuous learning
</improvements_v4>

workflow.yaml

@@ -1,7 +1,7 @@
name: story-full-pipeline
-description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)"
-author: "BMAD Method + GSD"
-version: "3.2.0"  # Added architect-integration-reviewer for runtime verification
description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes"
author: "BMAD Method"
version: "4.0.0"  # Added playbook learning, test quality, coverage gates, Inspector code citations
# Execution mode
execution_mode: "multi_agent"  # multi_agent | single_agent (fallback)
@@ -37,13 +37,23 @@ agents:
timeout: 3600  # 1 hour
inspector:
description: "Validation agent - independent verification with code citations"
steps: [5, 6]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/inspector.md"
fresh_context: true  # No knowledge of builder agent
trust_level: "medium"  # No conflict of interest
timeout: 1800  # 30 minutes
require_code_citations: true # v4.0: Must provide file:line evidence for all tasks
test_quality:
description: "Test quality validation - verifies test coverage and quality"
steps: [5.5]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/test-quality.md"
fresh_context: true
trust_level: "medium"
timeout: 1200 # 20 minutes
reviewer:
description: "Adversarial code review - finds problems"
@@ -73,15 +83,40 @@ agents:
trust_level: "medium"  # Incentive to minimize work
timeout: 2400  # 40 minutes
reflection:
description: "Playbook learning - extracts patterns for future agents"
steps: [10]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/reflection.md"
timeout: 900 # 15 minutes
# Reconciliation: orchestrator does this directly (see workflow.md Phase 5)
# Playbook configuration (v4.0)
playbooks:
enabled: true # Set to false in project config to disable
directory: "docs/playbooks/implementation-playbooks"
bootstrap_mode: true # Auto-initialize if missing
max_load: 3
auto_apply_updates: false # Require manual review of playbook updates
discovery:
enabled: true # Scan git/docs to populate initial playbooks
sources: ["git_history", "docs", "existing_code"]
# Quality gates (v4.0)
quality_gates:
coverage_threshold: 80 # % line coverage required
task_verification: "all_with_evidence" # Inspector must provide file:line citations
critical_issues: "must_fix"
high_issues: "must_fix"
# Complexity level (determines which steps to execute)
complexity_level: "standard"  # micro | standard | complex
# Complexity routing
complexity_routing:
micro:
-skip_agents: ["reviewer"]  # Skip code review for micro stories
skip_agents: []  # Full pipeline (v4.0: micro gets security scan)
description: "Lightweight path for low-risk stories"
examples: ["UI tweaks", "text changes", "simple CRUD"]

../templates/implementation-playbook-template.md (new file)

@@ -0,0 +1,85 @@
# {{Module/Feature Area}} - Implementation Playbook
> **Purpose:** Guide future agents implementing features in {{module_name}}
> **Created:** {{date}}
> **Last Updated:** {{date}}
## Common Gotchas
**What mistakes to avoid:**
- Add specific gotchas here as they're discovered
- Example: "Never concatenate user input into SQL queries"
- Example: "Always validate file paths before operations"
## Code Patterns
**Standard approaches that work:**
### Pattern: {{Pattern Name}}
✓ **Good:**
```
// Example of correct pattern
db.query(sql, [param1, param2])
```
✗ **Bad:**
```
// Example of incorrect pattern
sql + userInput
```
### Pattern: {{Another Pattern}}
✓ **Good:**
```
// Another example
if (!data) return null;
```
✗ **Bad:**
```
// Don't do this
data.map(...) // crashes if data is null
```
## Test Requirements
**Essential tests for this module:**
- **Happy path:** Verify primary functionality
- **Edge cases:** Test null, undefined, empty arrays, invalid inputs
- **Error conditions:** Verify errors are handled properly
- **Security:** Test for injection attacks, auth bypasses, etc.
### Example Test Pattern
```typescript
describe('FeatureName', () => {
it('handles happy path', () => {
expect(fn(validInput)).toEqual(expected)
})
it('handles edge cases', () => {
expect(fn(null)).toThrow()
expect(fn([])).toEqual([])
})
it('validates security', () => {
expect(fn("' OR 1=1--")).toThrow()
})
})
```
## Related Stories
Stories that used these patterns:
- {{story_key}} - {{brief description}}
## Notes
- Keep this simple and actionable
- Add new learnings as they emerge
- Focus on preventable mistakes