diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/README.md b/src/bmm/workflows/4-implementation/story-full-pipeline/README.md
index a436933f..089a34ee 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/README.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/README.md
@@ -1,124 +1,150 @@
-# Super-Dev Pipeline - GSDMAD Architecture
+# Story-Full-Pipeline v4.0
-**Multi-agent pipeline with independent validation and adversarial code review**
+Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes.
----
+## What's New in v4.0
-## Quick Start
+### 1. Resume Builder (v3.2+)
+**Token Efficiency: 50-70% savings**
-```bash
-# Run super-dev pipeline for a story
-/story-full-pipeline story_key=17-10
+- Phase 3 now RESUMES Builder agent with review findings
+- Builder already has full codebase context
+- More efficient than spawning fresh Fixer agent
+
+### 2. Inspector Code Citations (v4.0)
+**Evidence-Based Verification**
+
+- Inspector must map EVERY task to file:line citations
+- Example: "Create component" → "src/Component.tsx:45-67"
+- No more "trust me, it works" - requires proof
+- Returns structured JSON with code evidence per task
+
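+A minimal enforcement sketch (assuming `jq`; the artifact path and JSON shape come from the Inspector agent spec in this workflow):
+
+```bash
+STORY_KEY=17-10   # example story key
+# Fail if any task the Inspector marked as implemented has no file:line citation
+jq -e '[.task_verification[] | select(.implemented and (.evidence | length == 0))] | length == 0' \
+  "docs/sprint-artifacts/completions/${STORY_KEY}-inspector.json" \
+  || echo "❌ Implemented task without code evidence"
+```
+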
+### 3. Remove Hospital-Grade Framing (v4.0)
+**Focus on Concrete Verification**
+
+- Dropped psychological appeal language
+- Kept rigorous verification gates and bash checks
+- Replaced patterns/hospital-grade.md with patterns/verification.md + patterns/tdd.md
+
+### 4. Micro Stories Get Security Scan (v4.0)
+**Even Simple Stories Need Security**
+
+- Micro stories no longer skip review entirely
+- They still get 2 reviewers: Security + Architect
+- A lightweight pass that still catches critical vulnerabilities
+
+### 5. Test Quality Agent + Coverage Gate (v4.0)
+**Validate Test Completeness**
+
+- New Test Quality Agent validates:
+ - Edge cases covered (null, empty, invalid)
+ - Error conditions tested
+ - Meaningful assertions (not just "doesn't crash")
+ - No flaky tests (random data, timing)
+- Automated Coverage Gate enforces 80% threshold
+- Builder must fix test gaps before proceeding
+
+### 6. Playbook Learning System (v4.0)
+**Continuous Improvement Through Reflection**
+
+- **Phase 0:** Query playbooks before implementation
+- Builder gets relevant patterns/gotchas upfront
+- **Phase 6:** Reflection agent extracts learnings
+- Auto-generates playbook updates for future agents
+- Bootstrap mode: auto-initializes playbooks if missing
+
+## Pipeline Flow
+
+```
+Phase 0: Playbook Query (orchestrator)
+ ↓
+Phase 1: Builder (initial implementation)
+ ↓
+Phase 2: Inspector + Test Quality + N Reviewers (parallel)
+ ↓
+Phase 2.5: Coverage Gate (automated)
+ ↓
+Phase 3: Resume Builder (fix issues with context)
+ ↓
+Phase 4: Inspector re-check (quick verification)
+ ↓
+Phase 5: Orchestrator reconciliation (evidence-based)
+ ↓
+Phase 6: Playbook Reflection (extract learnings)
```
----
+## Complexity Routing
-## Architecture
+| Complexity | Phase 2 Agents | Total | Reviewers |
+|------------|----------------|-------|-----------|
+| micro | Inspector + Test Quality + 2 | 4 agents | Security + Architect |
+| standard | Inspector + Test Quality + 3 | 5 agents | Security + Logic + Architect |
+| complex | Inspector + Test Quality + 4 | 6 agents | Security + Logic + Architect + Quality |
-### Multi-Agent Validation
-- **4 independent agents** working sequentially
-- Builder → Inspector → Reviewer → Fixer
-- Each agent has fresh context
-- No conflict of interest
+## Quality Gates
-### Honest Reporting
-- Inspector verifies Builder's work (doesn't trust claims)
-- Reviewer is adversarial (wants to find issues)
-- Main orchestrator does final verification
-- Can't fake completion
+- **Coverage Threshold:** 80% line coverage required
+- **Task Verification:** ALL tasks need file:line evidence
+- **Critical Issues:** MUST fix
+- **High Issues:** MUST fix
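+
+A condensed sketch of checking these gates against the completion artifacts (assuming `jq`; the field names follow the final verification in `workflow.md`):
+
+```bash
+STORY_KEY=17-10
+ARTIFACTS=docs/sprint-artifacts/completions
+# Coverage gate: the builder-fixes artifact records final line coverage
+COVERAGE=$(jq -r '.tests.coverage' "$ARTIFACTS/${STORY_KEY}-builder-fixes.json")
+(( $(echo "$COVERAGE >= 80" | bc -l) )) || { echo "❌ Coverage ${COVERAGE}% < 80%"; exit 1; }
+# Issue gate: count CRITICAL/HIGH findings that the Builder must fix
+OPEN=$(jq -s '[ .[] | .issues[]? | select(.severity == "CRITICAL" or .severity == "HIGH") ] | length' \
+  "$ARTIFACTS/${STORY_KEY}"-*.json)
+[ "$OPEN" -eq 0 ] || echo "⚠️ $OPEN CRITICAL/HIGH findings outstanding"
+```
+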
-### Wave-Based Execution
-- Independent stories run in parallel
-- Dependencies respected via waves
-- 57% faster than sequential execution
+## Token Efficiency
----
+- Phase 2 agents spawn in parallel (same cost, faster)
+- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
+- Phase 4 Inspector only (no full re-review)
-## Workflow Phases
+## Playbook Configuration
-**Phase 1: Builder (Steps 1-4)**
-- Load story, analyze gaps
-- Write tests (TDD)
-- Implement code
-- Report what was built (NO VALIDATION)
+```yaml
+playbooks:
+ enabled: true
+ directory: "docs/playbooks/implementation-playbooks"
+ bootstrap_mode: true # Auto-initialize if missing
+ max_load: 3
+ auto_apply_updates: false # Require manual review
+ discovery:
+ enabled: true
+ sources: ["git_history", "docs", "existing_code"]
+```
-**Phase 2: Inspector (Steps 5-6)**
-- Fresh context, no Builder knowledge
-- Verify files exist
-- Run tests independently
-- Run quality checks
-- PASS or FAIL verdict
+## How It Avoids CooperBench Coordination Failures
-**Phase 3: Reviewer (Step 7)**
-- Fresh context, adversarial stance
-- Find security vulnerabilities
-- Find performance problems
-- Find logic bugs
-- Report issues with severity
+Unlike the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026):
-**Phase 4: Fixer (Steps 8-9)**
-- Fix CRITICAL issues (all)
-- Fix HIGH issues (all)
-- Fix MEDIUM issues (time permitting)
-- Verify fixes independently
+1. **Sequential Implementation** - ONE Builder agent implements entire story (no parallel implementation conflicts)
+2. **Parallel Review** - Multiple agents review in parallel (safe read-only operations)
+3. **Context Reuse** - SAME agent fixes issues (no expectation failures about partner state)
+4. **Evidence-Based** - file:line citations prevent vague communication
+5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities)
-**Phase 5: Final Verification**
-- Main orchestrator verifies all phases
-- Updates story checkboxes
-- Creates commit
-- Marks story complete
-
----
-
-## Key Features
-
-**Separation of Concerns:**
-- Builder focuses only on implementation
-- Inspector focuses only on validation
-- Reviewer focuses only on finding issues
-- Fixer focuses only on resolving issues
-
-**Independent Validation:**
-- Each agent validates the previous agent's work
-- No agent validates its own work
-- Fresh context prevents confirmation bias
-
-**Quality Enforcement:**
-- Multiple quality gates throughout pipeline
-- Can't proceed without passing validation
-- 95% honesty rate (agents can't fake completion)
-
----
+The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination."
## Files
-See `workflow.md` for complete architecture details.
-
**Agent Prompts:**
-- `agents/builder.md` - Implementation agent
-- `agents/inspector.md` - Validation agent
+- `agents/builder.md` - Implementation agent (with playbook awareness)
+- `agents/inspector.md` - Validation agent (requires code citations)
+- `agents/test-quality.md` - Test quality validation (v4.0)
- `agents/reviewer.md` - Adversarial review agent
-- `agents/fixer.md` - Issue resolution agent
+- `agents/architect-integration-reviewer.md` - Architecture/integration review
+- `agents/fixer.md` - Issue resolution agent (deprecated; superseded by resuming the Builder in Phase 3)
+- `agents/reflection.md` - Playbook learning agent (v4.0)
**Workflow Config:**
-- `workflow.yaml` - Main configuration
-- `workflow.md` - Complete documentation
+- `workflow.yaml` - Main configuration (v4.0)
+- `workflow.md` - Complete step-by-step documentation
-**Directory Structure:**
-```
-story-full-pipeline/
-├── README.md (this file)
-├── workflow.yaml (configuration)
-├── workflow.md (complete documentation)
-├── agents/
-│ ├── builder.md (implementation agent prompt)
-│ ├── inspector.md (validation agent prompt)
-│ ├── reviewer.md (review agent prompt)
-│ └── fixer.md (fix agent prompt)
-└── steps/
- └── (step files for each phase)
+**Templates:**
+- `../templates/implementation-playbook-template.md` - Playbook structure
+
+## Usage
+
+```bash
+# Run story-full-pipeline
+/story-full-pipeline story_key=17-10
```
----
+## Backward Compatibility
-**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
+Falls back to single-agent mode if multi-agent execution fails.
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md
index 17e099dd..f1cd8442 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md
@@ -5,7 +5,6 @@
**Trust Level:** HIGH (wants to find integration issues)
-@patterns/hospital-grade.md
@patterns/agent-completion.md
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md
index bcbc8cf5..2131dad6 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md
@@ -5,7 +5,6 @@
**Trust Level:** LOW (assume will cut corners)
-@patterns/hospital-grade.md
@patterns/tdd.md
@patterns/agent-completion.md
@@ -17,11 +16,12 @@
You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests.
**DO:**
+- **Review playbooks** for gotchas and patterns (if provided)
- Load and understand the story requirements
- Analyze what exists vs what's needed
- Write tests first (TDD approach)
- Implement production code to make tests pass
-- Follow project patterns and conventions
+- Follow project patterns and playbook guidance
**DO NOT:**
- Validate your own work (Inspector agent will do this)
@@ -35,7 +35,8 @@ You are the **BUILDER** agent. Your job is to implement the story requirements b
## Steps to Execute
### Step 1: Initialize
-Load story file and cache context:
+Load story file and playbooks (if provided):
+- **Review playbooks first** (if provided in context) - note gotchas and patterns
- Read story file: `{{story_file}}`
- Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.)
- Determine greenfield vs brownfield
@@ -88,54 +89,36 @@ When complete, provide:
---
-## Hospital-Grade Standards
+## Completion Format (v4.0)
-⚕️ **Quality >> Speed**
+**Return structured JSON artifact:**
-- Take time to do it right
-- Don't skip error handling
-- Don't leave TODO comments
-- Don't use `any` types
-
----
-
-## When Complete, Return This Format
-
-```markdown
-## AGENT COMPLETE
-
-**Agent:** builder
-**Story:** {{story_key}}
-**Status:** SUCCESS | FAILED
-
-### Files Created
-- path/to/new/file1.ts
-- path/to/new/file2.ts
-
-### Files Modified
-- path/to/existing/file.ts
-
-### Tests Added
-- X test files
-- Y test cases total
-
-### Implementation Summary
-Brief description of what was built and key decisions made.
-
-### Known Gaps
-- Any functionality not implemented
-- Any edge cases not handled
-- NONE if all tasks complete
-
-### Ready For
-Inspector validation (next phase)
+```json
+{
+ "agent": "builder",
+ "story_key": "{{story_key}}",
+ "status": "SUCCESS",
+ "files_created": ["path/to/file.tsx", "path/to/file.test.tsx"],
+ "files_modified": ["path/to/existing.tsx"],
+ "tests_added": {
+ "total": 12,
+ "passing": 12
+ },
+ "tasks_addressed": [
+ "Create agreement view component",
+ "Add status badge",
+ "Implement occupant selection"
+ ],
+ "playbooks_reviewed": ["database-patterns.md", "api-security.md"]
+}
```
-**Why this format?** The orchestrator parses this output to:
-- Verify claimed files actually exist
-- Track what was built for reconciliation
-- Route to next phase appropriately
+**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json`
---
-**Remember:** You are the BUILDER. Build it well, but don't validate or review your own work. Other agents will do that with fresh eyes.
+**Remember:**
+
+- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories
+- Build it well with TDD, but don't validate or review your own work
+- Other agents will verify with fresh eyes and provide file:line evidence
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md
index 968572fd..165c9821 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md
@@ -5,7 +5,6 @@
**Trust Level:** MEDIUM (incentive to minimize work)
-@patterns/hospital-grade.md
@patterns/agent-completion.md
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md
index 968afb93..141ce651 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md
@@ -1,12 +1,11 @@
-# Inspector Agent - Validation Phase
+# Inspector Agent - Validation Phase with Code Citations
-**Role:** Independent verification of Builder's work
+**Role:** Independent verification of Builder's work **WITH EVIDENCE**
**Steps:** 5-6 (post-validation, quality-checks)
**Trust Level:** MEDIUM (no conflict of interest)
@patterns/verification.md
-@patterns/hospital-grade.md
@patterns/agent-completion.md
@@ -14,48 +13,54 @@
## Your Mission
-You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed.
+You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**.
**KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
+**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.**
+
**DO:**
+- Map EACH task to specific code with file:line citations
- Verify files actually exist
- Run tests yourself (don't trust claims)
- Run quality checks (type-check, lint, build)
-- Give honest PASS/FAIL verdict
+- Provide evidence for EVERY task
**DO NOT:**
-- Take the Builder's word for anything
-- Skip verification steps
+- Skip any task verification
+- Give vague "looks good" without citations
- Assume tests pass without running them
-- Give PASS verdict if ANY check fails
+- Give PASS verdict if ANY check fails or task lacks evidence
---
## Steps to Execute
-### Step 5: Post-Validation
+### Step 5: Task Verification with Code Citations
-**Verify Implementation Against Story:**
+**Map EVERY task to specific code locations:**
-1. **Check Files Exist:**
- ```bash
- # For each file mentioned in story tasks
- ls -la {{file_path}}
- # FAIL if file missing or empty
- ```
+1. **Read story file** - understand ALL tasks
-2. **Verify File Contents:**
- - Open each file
- - Check it has actual code (not just TODO/stub)
- - Verify it matches story requirements
+2. **For EACH task, provide:**
+ - **file:line** where it's implemented
+ - **Brief quote** of relevant code
+ - **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED
-3. **Check Tests Exist:**
- ```bash
- # Find test files
- find . -name "*.test.ts" -o -name "__tests__"
- # FAIL if no tests found for new code
- ```
+**Example Evidence Format:**
+
+```
+Task: "Display occupant agreement status"
+Evidence: src/features/agreement/StatusBadge.tsx:45-67
+Code: "const StatusBadge = ({ status }) => ..."
+Verdict: IMPLEMENTED
+```
+
+3. **If task NOT implemented:**
+ - Explain why (file missing, code incomplete, etc.)
+ - Provide file:line where it should be
+
+**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED.
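+
+One way to locate candidate evidence before writing a citation (a sketch assuming a grep-based search; `StatusBadge` and the path are the example from above):
+
+```bash
+# Find where the status badge is implemented and tested, with line numbers for citations
+grep -rn "StatusBadge" src/ | head -5
+# Confirm the cited range actually contains the code you quote
+sed -n '45,67p' src/features/agreement/StatusBadge.tsx
+```
+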
### Step 6: Quality Checks
@@ -96,36 +101,49 @@ You are the **INSPECTOR** agent. Your job is to verify that the Builder actually
---
-## Output Requirements
+## Completion Format (v4.0)
-**Provide Evidence-Based Verdict:**
+**Return structured JSON with code citations:**
-### If PASS:
-```markdown
-✅ VALIDATION PASSED
-
-Evidence:
-- Files verified: [list files checked]
-- Type check: PASS (0 errors)
-- Linter: PASS (0 warnings)
-- Build: PASS
-- Tests: 45/45 passing (95% coverage)
-- Git: 12 files modified, 3 new files
-
-Ready for code review.
+```json
+{
+ "agent": "inspector",
+ "story_key": "{{story_key}}",
+ "verdict": "PASS",
+ "task_verification": [
+ {
+ "task": "Create agreement view component",
+ "implemented": true,
+ "evidence": [
+ {
+ "file": "src/features/agreement/AgreementView.tsx",
+ "lines": "15-67",
+ "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
+ },
+ {
+ "file": "src/features/agreement/AgreementView.test.tsx",
+ "lines": "8-45",
+ "code_snippet": "describe('AgreementView', () => {...})"
+ }
+ ]
+ },
+ {
+ "task": "Add status badge",
+ "implemented": false,
+ "evidence": [],
+ "reason": "No StatusBadge component found in src/features/agreement/"
+ }
+ ],
+ "checks": {
+ "type_check": {"passed": true, "errors": 0},
+ "lint": {"passed": true, "warnings": 0},
+ "tests": {"passed": true, "total": 12, "passing": 12},
+ "build": {"passed": true}
+ }
+}
```
-### If FAIL:
-```markdown
-❌ VALIDATION FAILED
-
-Failures:
-1. File missing: app/api/occupant/agreement/route.ts
-2. Type check: 3 errors in lib/api/auth.ts
-3. Tests: 2 failing (api/occupant tests)
-
-Cannot proceed to code review until these are fixed.
-```
+**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json`
---
@@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed.
**Before giving PASS verdict, confirm:**
-- [ ] All story files exist and have content
+- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason
- [ ] Type check returns 0 errors
-- [ ] Linter returns 0 errors/warnings
+- [ ] Linter returns 0 warnings
- [ ] Build succeeds
- [ ] Tests run and pass (not skipped)
-- [ ] Test coverage >= 90%
-- [ ] Git status is clean or has expected changes
+- [ ] All implemented tasks have code evidence
**If ANY checkbox is unchecked → FAIL verdict**
---
-## Hospital-Grade Standards
-
-⚕️ **Be Thorough**
-
-- Don't skip checks
-- Run tests yourself (don't trust claims)
-- Verify every file exists
-- Give specific evidence
-
----
-
-## When Complete, Return This Format
-
-```markdown
-## AGENT COMPLETE
-
-**Agent:** inspector
-**Story:** {{story_key}}
-**Status:** PASS | FAIL
-
-### Evidence
-- **Type Check:** PASS (0 errors) | FAIL (X errors)
-- **Lint:** PASS (0 warnings) | FAIL (X warnings)
-- **Build:** PASS | FAIL
-- **Tests:** X passing, Y failing, Z% coverage
-
-### Files Verified
-- path/to/file1.ts ✓
-- path/to/file2.ts ✓
-- path/to/missing.ts ✗ (NOT FOUND)
-
-### Failures (if FAIL status)
-1. Specific failure with file:line reference
-2. Another specific failure
-
-### Ready For
-- If PASS: Reviewer (next phase)
-- If FAIL: Builder needs to fix before proceeding
-```
-
----
-
-**Remember:** You are the INSPECTOR. Your job is to find the truth, not rubber-stamp the Builder's work. If something is wrong, say so with evidence.
+**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations.
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md
new file mode 100644
index 00000000..ebb816d7
--- /dev/null
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md
@@ -0,0 +1,93 @@
+# Reflection Agent - Playbook Learning
+
+You are the **REFLECTION** agent for story {{story_key}}.
+
+## Context
+
+- **Story:** {{story_file}}
+- **Builder initial:** {{builder_artifact}}
+- **All review findings:** {{all_reviewer_artifacts}}
+- **Builder fixes:** {{builder_fixes_artifact}}
+- **Test quality issues:** {{test_quality_artifact}}
+
+## Objective
+
+Identify what future agents should know:
+
+1. **What issues were found?** (from reviewers)
+2. **What did Builder miss initially?** (gaps, edge cases, security)
+3. **What playbook knowledge would have prevented these?**
+4. **Which module/feature area does this apply to?**
+5. **Should we update existing playbook or create new?**
+
+### Key Questions
+
+- What gotchas should future builders know?
+- What code patterns should be standard?
+- What test requirements are essential?
+- What similar stories exist?
+
+## Success Criteria
+
+- [ ] Analyzed review findings
+- [ ] Identified preventable issues
+- [ ] Determined which playbook(s) to update
+- [ ] Return structured proposal
+
+## Completion Format
+
+Return structured JSON artifact:
+
+```json
+{
+ "agent": "reflection",
+ "story_key": "{{story_key}}",
+ "learnings": [
+ {
+ "issue": "SQL injection in query builder",
+ "root_cause": "Builder used string concatenation (didn't know pattern)",
+ "prevention": "Playbook should document: always use parameterized queries",
+ "applies_to": "database queries, API endpoints with user input"
+ },
+ {
+ "issue": "Missing edge case tests for empty arrays",
+ "root_cause": "Test Quality Agent found gap",
+ "prevention": "Playbook should require: test null/empty/invalid for all inputs",
+ "applies_to": "all data processing functions"
+ }
+ ],
+ "playbook_proposal": {
+ "action": "update_existing",
+ "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
+ "module": "api/database",
+ "updates": {
+ "common_gotchas": [
+ "Never concatenate user input into SQL - use parameterized queries",
+ "Test edge cases: null, undefined, [], '', invalid input"
+ ],
+ "code_patterns": [
+ "db.query(sql, [param1, param2]) ✓",
+ "sql + userInput ✗"
+ ],
+ "test_requirements": [
+      "Test SQL injection attempts: expect(() => query(\"' OR 1=1--\")).toThrow()",
+      "Test empty inputs: expect(fn([])).toEqual([]) or expect(() => fn([])).toThrow()"
+ ],
+ "related_stories": ["{{story_key}}"]
+ }
+ }
+}
+```
+
+Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
+
+## Playbook Structure
+
+When proposing playbook updates, structure them with these sections:
+
+1. **Common Gotchas** - What mistakes to avoid
+2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
+3. **Test Requirements** - What tests are essential
+4. **Related Stories** - Which stories used these patterns
+
+Keep it simple and actionable for future agents.
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md
index 2a711e05..f857bfaa 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md
@@ -6,7 +6,6 @@
@patterns/security-checklist.md
-@patterns/hospital-grade.md
@patterns/agent-completion.md
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md
new file mode 100644
index 00000000..172ff9f6
--- /dev/null
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md
@@ -0,0 +1,73 @@
+# Test Quality Agent
+
+You are the **TEST QUALITY** agent for story {{story_key}}.
+
+## Context
+
+- **Story:** {{story_file}}
+- **Builder completion:** {{builder_completion_artifact}}
+
+## Objective
+
+Review test files for quality and completeness:
+
+1. Find all test files created/modified by Builder
+2. For each test file, verify:
+ - **Happy path**: Primary functionality tested ✓
+ - **Edge cases**: null, empty, invalid inputs ✓
+ - **Error conditions**: Failures handled properly ✓
+ - **Assertions**: Meaningful checks (not just "doesn't crash")
+ - **Test names**: Descriptive and clear
+ - **Deterministic**: No random data, no timing dependencies
+3. Check that tests actually validate the feature
+
+**Focus on:** What's missing? What edge cases weren't considered?
+
+## Success Criteria
+
+- [ ] All test files reviewed
+- [ ] Edge cases identified (covered or missing)
+- [ ] Error conditions verified
+- [ ] Assertions are meaningful
+- [ ] Tests are deterministic
+- [ ] Return quality assessment
+
+## Completion Format
+
+Return structured JSON artifact:
+
+```json
+{
+ "agent": "test_quality",
+ "story_key": "{{story_key}}",
+ "verdict": "PASS" | "NEEDS_IMPROVEMENT",
+ "test_files_reviewed": ["path/to/test.tsx", ...],
+ "issues": [
+ {
+ "severity": "HIGH",
+ "file": "path/to/test.tsx:45",
+ "issue": "Missing edge case: empty input array",
+      "recommendation": "Add test: expect(() => fn([])).toThrow(...)"
+ },
+ {
+ "severity": "MEDIUM",
+ "file": "path/to/test.tsx:67",
+ "issue": "Test uses Math.random() - could be flaky",
+ "recommendation": "Use fixed test data"
+ }
+ ],
+  "coverage_analysis": {
+    "edge_cases_covered": true | false,
+    "error_conditions_tested": true | false,
+    "meaningful_assertions": true | false,
+    "tests_are_deterministic": true | false
+  },
+  "summary": {
+    "high_issues": 1,
+    "medium_issues": 1,
+    "low_issues": 0
+  }
+}
+```
+
+Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json`
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md
index 3f603f0e..49a1ad91 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md
@@ -1,74 +1,142 @@
-# Super-Dev-Pipeline v3.1 - Token-Efficient Multi-Agent Pipeline
+# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline
Implement a story using parallel verification agents with Builder context reuse.
-Each agent has single responsibility. Builder fixes issues in its own context (50-70% token savings).
-Orchestrator handles bookkeeping (story file updates, verification).
+Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates.
+Builder fixes issues in its own context (50-70% token savings).
-**Token-Efficient Multi-Agent Pipeline**
+**Quality Through Discipline, Continuous Learning**
-- Builder implements (creative, context preserved)
-- Inspector + Reviewers validate in parallel (verification, fresh context)
-- Builder fixes issues (creative, reuses context - 50-70% token savings)
-- Inspector re-checks (verification, quick check)
-- Orchestrator reconciles story file (mechanical)
+- Playbook Query: Load relevant patterns before starting
+- Builder: Implements with playbook knowledge (context preserved)
+- Inspector + Test Quality + Reviewers: Validate in parallel with proof
+- Coverage Gate: Automated threshold enforcement
+- Builder: Fixes issues in same context (50-70% token savings)
+- Inspector: Quick recheck
+- Orchestrator: Reconciles mechanically
+- Reflection: Updates playbooks for future agents
-**Key Innovation:** Resume Builder instead of spawning fresh Fixer.
-Builder already knows the codebase - just needs to fix specific issues.
-
-Trust but verify. Fresh context for verification. Reuse context for fixes.
+Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system.
name: story-full-pipeline
-version: 3.2.0
+version: 4.0.0
execution_mode: multi_agent
phases:
+ phase_0: Playbook Query (orchestrator)
phase_1: Builder (saves agent_id)
- phase_2: [Inspector + N Reviewers] in parallel (N = 2/3/4 based on complexity)
+ phase_2: [Inspector + Test Quality + N Reviewers] in parallel
+ phase_2.5: Coverage Gate (automated)
phase_3: Resume Builder with all findings (reuses context)
phase_4: Inspector re-check (quick verification)
phase_5: Orchestrator reconciliation
+ phase_6: Playbook Reflection
reviewer_counts:
- micro: 2 reviewers (security, architect/integration) v3.2.0+
- standard: 3 reviewers (security, logic/performance, architect/integration) v3.2.0+
- complex: 4 reviewers (security, logic, architect/integration, code quality) v3.2.0+
+ micro: 2 reviewers (security, architect/integration)
+ standard: 3 reviewers (security, logic/performance, architect/integration)
+ complex: 4 reviewers (security, logic, architect/integration, code quality)
+
+quality_gates:
+ coverage_threshold: 80 # % line coverage required
+ task_verification: "all_with_evidence" # Inspector must cite file:line
+ critical_issues: "must_fix"
+ high_issues: "must_fix"
token_efficiency:
- Phase 2 agents spawn in parallel (same cost, faster)
- - Phase 3 resumes Builder (50-70% token savings vs fresh Fixer)
+ - Phase 3 resumes Builder (50-70% token savings vs fresh agent)
- Phase 4 Inspector only (no full re-review)
+
+playbooks:
+ enabled: true
+ directory: "docs/playbooks/implementation-playbooks"
+ max_load: 3
+ auto_apply_updates: false
-@patterns/hospital-grade.md
+@patterns/verification.md
+@patterns/tdd.md
@patterns/agent-completion.md
-Load and validate the story file.
+**Load and parse story file**
\`\`\`bash
STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
[ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
\`\`\`
-Use Read tool on the story file. Parse:
-- Complexity level (micro/standard/complex)
+Use Read tool. Extract:
- Task count
- Acceptance criteria count
+- Keywords for risk scoring
-Determine which agents to spawn based on complexity routing.
+**Determine complexity:**
+\`\`\`bash
+TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE")
+RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE")
+
+if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then
+ COMPLEXITY="micro"
+ REVIEWER_COUNT=2
+elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then
+ COMPLEXITY="complex"
+ REVIEWER_COUNT=4
+else
+ COMPLEXITY="standard"
+ REVIEWER_COUNT=3
+fi
+\`\`\`
+
+Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers
+
+
+
+**Phase 0: Playbook Query**
+
+\`\`\`
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+📚 PHASE 0: PLAYBOOK QUERY
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+\`\`\`
+
+**Extract story keywords:**
+\`\`\`bash
+STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ')
+echo "Story keywords: $STORY_KEYWORDS"
+\`\`\`
+
+**Search for relevant playbooks:**
+Use Grep tool:
+- Pattern: extracted keywords
+- Path: \`docs/playbooks/implementation-playbooks/\`
+- Output mode: files_with_matches
+- Limit: 3 files
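+
+A shell-equivalent sketch of that Grep call (assumes the keywords extracted above and the configured playbook directory):
+
+\`\`\`bash
+# Build an alternation pattern from the story keywords, e.g. "occupant|agreement|status"
+PATTERN=$(echo "$STORY_KEYWORDS" | tr -s '[:space:]' '|' | sed 's/^|//; s/|$//')
+# List up to max_load (3) playbooks that mention any keyword
+PLAYBOOKS=$(grep -rliE "$PATTERN" docs/playbooks/implementation-playbooks/ 2>/dev/null | head -3)
+\`\`\`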
+
+**Load matching playbooks:**
+For each playbook found:
+- Use Read tool
+- Extract sections: Common Gotchas, Code Patterns, Test Requirements
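+
+A rough extraction sketch (using \`$PLAYBOOKS\` from the sketch above; the section headings come from the playbook template):
+
+\`\`\`bash
+for pb in $PLAYBOOKS; do
+  echo "=== $pb ==="
+  for section in "Common Gotchas" "Code Patterns" "Test Requirements"; do
+    # Print from the section heading up to (but not including) the next "## " heading
+    sed -n "/^## ${section}$/,/^## /p" "$pb" | sed '$d'
+  done
+done
+\`\`\`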
+
+If no playbooks exist:
+\`\`\`
+ℹ️ No playbooks found - this will be the first story to create them
+\`\`\`
+
+Store playbook content for Builder.
-**Phase 1: Builder Agent (Steps 1-4)**
+**Phase 1: Builder Agent**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
-Spawn Builder agent and save agent_id for later resume.
-
-**CRITICAL: Save Builder's agent_id for later resume**
+Spawn Builder agent and **SAVE agent_id for resume later**:
\`\`\`
-BUILDER_AGENT_ID={{agent_id_from_task_result}}
-echo "Builder agent: $BUILDER_AGENT_ID"
+BUILDER_TASK = Task({
+ subagent_type: "general-purpose",
+ description: "Implement story {{story_key}}",
+ prompt: \`
+You are the BUILDER agent for story {{story_key}}.
+
+
+@patterns/tdd.md
+@patterns/agent-completion.md
+
+
+
+Story: [inline story file content]
+
+{{IF playbooks loaded}}
+Relevant Playbooks (review before implementing):
+[inline playbook content]
+
+Pay special attention to:
+- Common Gotchas in these playbooks
+- Code Patterns to follow
+- Test Requirements to satisfy
+{{ENDIF}}
+
+
+
+Implement the story requirements:
+1. Review story tasks and acceptance criteria
+2. **Review playbooks** for gotchas and patterns (if provided)
+3. Analyze what exists vs needed (gap analysis)
+4. **Write tests FIRST** (TDD - tests before implementation)
+5. Implement production code to pass tests
+
+
+
+- DO NOT validate your own work
+- DO NOT review your code
+- DO NOT update story checkboxes
+- DO NOT commit changes yet
+
+
+
+- [ ] Reviewed playbooks for guidance
+- [ ] Tests written for all requirements
+- [ ] Production code implements tests
+- [ ] Tests pass
+- [ ] Return structured completion artifact
+
+
+
+Return structured JSON artifact:
+{
+ "agent": "builder",
+ "story_key": "{{story_key}}",
+ "status": "SUCCESS" | "FAILED",
+ "files_created": ["path/to/file.tsx", ...],
+ "files_modified": ["path/to/file.tsx", ...],
+ "tests_added": {
+ "total": 12,
+ "passing": 12
+ },
+ "tasks_addressed": ["task description from story", ...]
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json
+
+\`
+})
+
+BUILDER_AGENT_ID = {{extract agent_id from Task result}}
\`\`\`
-Wait for completion. Parse structured output. Verify files exist.
+**CRITICAL: Store Builder agent ID:**
+\`\`\`bash
+echo "Builder agent ID: $BUILDER_AGENT_ID"
+echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
+\`\`\`
+
+**Wait for completion. Verify artifact exists:**
+\`\`\`bash
+BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
+[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
+\`\`\`
+
+**Verify files exist:**
+\`\`\`bash
+# Check every path listed in files_created and files_modified
+jq -r '.files_created[], .files_modified[]' "$BUILDER_COMPLETION" | while read -r file; do
+  [ -f "$file" ] || echo "❌ MISSING: $file"
+done
+\`\`\`
If files missing or status FAILED: halt pipeline.
-**Phase 2: Parallel Verification (Inspector + Reviewers)**
+**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 PHASE 2: PARALLEL VERIFICATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
+Total agents: {{2 + REVIEWER_COUNT}}
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
-**CRITICAL: Spawn ALL verification agents in ONE message (parallel execution)**
+**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
+
+Send single message with multiple Task calls:
+1. Inspector Agent
+2. Test Quality Agent
+3. Security Reviewer
+4. Logic/Performance Reviewer (if standard/complex)
+5. Architect/Integration Reviewer
+6. Code Quality Reviewer (if complex)
+
+---
+
+## Inspector Agent Prompt:
-Determine reviewer count based on complexity:
\`\`\`
-if complexity == "micro": REVIEWER_COUNT = 1
-if complexity == "standard": REVIEWER_COUNT = 2
-if complexity == "complex": REVIEWER_COUNT = 3
+Task({
+ subagent_type: "general-purpose",
+ description: "Validate story {{story_key}} implementation",
+ prompt: \`
+You are the INSPECTOR agent for story {{story_key}}.
+
+
+@patterns/verification.md
+@patterns/agent-completion.md
+
+
+
+Story: [inline story file content]
+
+
+
+Independently verify implementation WITH CODE CITATIONS:
+
+1. Read story file - understand ALL tasks
+2. Read each file Builder created/modified
+3. **Map EACH task to specific code with file:line citations**
+4. Run verification checks:
+ - Type-check (0 errors required)
+ - Lint (0 warnings required)
+ - Tests (all passing required)
+ - Build (success required)
+
+
+
+**EVERY task must have evidence.**
+
+For each task, provide:
+- file:line where it's implemented
+- Brief quote of relevant code
+- Verdict: IMPLEMENTED or NOT_IMPLEMENTED
+
+Example:
+Task: "Display occupant agreement status"
+Evidence: src/features/agreement/StatusBadge.tsx:45-67
+Code: "const StatusBadge = ({ status }) => ..."
+Verdict: IMPLEMENTED
+
+
+
+- You have NO KNOWLEDGE of what Builder did
+- Run all checks yourself - don't trust claims
+- **Every task needs file:line citation**
+- If code doesn't exist: mark NOT IMPLEMENTED with reason
+
+
+
+- [ ] ALL tasks mapped to code locations
+- [ ] Type check: 0 errors
+- [ ] Lint: 0 warnings
+- [ ] Tests: all passing
+- [ ] Build: success
+- [ ] Return structured evidence
+
+
+
+{
+ "agent": "inspector",
+ "story_key": "{{story_key}}",
+ "verdict": "PASS" | "FAIL",
+ "task_verification": [
+ {
+ "task": "Create agreement view component",
+ "implemented": true,
+ "evidence": [
+ {
+ "file": "src/features/agreement/AgreementView.tsx",
+ "lines": "15-67",
+ "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
+ },
+ {
+ "file": "src/features/agreement/AgreementView.test.tsx",
+ "lines": "8-45",
+ "code_snippet": "describe('AgreementView', () => {...})"
+ }
+ ]
+ },
+ {
+ "task": "Add status badge",
+ "implemented": false,
+ "evidence": [],
+ "reason": "No StatusBadge component found in src/features/agreement/"
+ }
+ ],
+ "checks": {
+ "type_check": {"passed": true, "errors": 0},
+ "lint": {"passed": true, "warnings": 0},
+ "tests": {"passed": true, "total": 12, "passing": 12},
+ "build": {"passed": true}
+ }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json
+
+\`
+})
\`\`\`
-Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings.
+---
-Aggregate all findings from Inspector + Reviewers.
+## Test Quality Agent Prompt:
+
+\`\`\`
+Task({
+ subagent_type: "general-purpose",
+ description: "Review test quality for {{story_key}}",
+ prompt: \`
+You are the TEST QUALITY agent for story {{story_key}}.
+
+
+Story: [inline story file content]
+Builder completion: [inline builder artifact]
+
+
+
+Review test files for quality and completeness:
+
+1. Find all test files created/modified by Builder
+2. For each test file, verify:
+ - **Happy path**: Primary functionality tested ✓
+ - **Edge cases**: null, empty, invalid inputs ✓
+ - **Error conditions**: Failures handled properly ✓
+ - **Assertions**: Meaningful checks (not just "doesn't crash")
+ - **Test names**: Descriptive and clear
+ - **Deterministic**: No random data, no timing dependencies
+3. Check that tests actually validate the feature
+
+**Focus on:** What's missing? What edge cases weren't considered?
+
+
+
+- [ ] All test files reviewed
+- [ ] Edge cases identified (covered or missing)
+- [ ] Error conditions verified
+- [ ] Assertions are meaningful
+- [ ] Tests are deterministic
+- [ ] Return quality assessment
+
+
+
+{
+ "agent": "test_quality",
+ "story_key": "{{story_key}}",
+ "verdict": "PASS" | "NEEDS_IMPROVEMENT",
+ "test_files_reviewed": ["path/to/test.tsx", ...],
+ "issues": [
+ {
+ "severity": "HIGH",
+ "file": "path/to/test.tsx:45",
+ "issue": "Missing edge case: empty input array",
+      "recommendation": "Add test: expect(() => fn([])).toThrow(...)"
+ },
+ {
+ "severity": "MEDIUM",
+ "file": "path/to/test.tsx:67",
+ "issue": "Test uses Math.random() - could be flaky",
+ "recommendation": "Use fixed test data"
+ }
+ ],
+ "coverage_analysis": {
+ "edge_cases_covered": true | false,
+ "error_conditions_tested": true | false,
+ "meaningful_assertions": true | false,
+ "tests_are_deterministic": true | false
+ },
+ "summary": {
+ "high_issues": 1,
+ "medium_issues": 2,
+    "medium_issues": 1,
+ }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json
+
+\`
+})
+\`\`\`
+
+---
+
+(Continue with Security, Logic, Architect, Quality reviewers as before...)
+
+**Wait for ALL agents to complete.**
+
+Collect completion artifacts:
+- \`inspector.json\`
+- \`test-quality.json\`
+- \`reviewer-security.json\`
+- \`reviewer-logic.json\` (if spawned)
+- \`reviewer-architect.json\`
+- \`reviewer-quality.json\` (if spawned)
+
+Parse all findings and aggregate by severity.
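+
+A minimal aggregation sketch (assumes \`jq\` and that each artifact exposes an \`issues[]\` array with a \`severity\` field, as the Test Quality artifact above does):
+
+\`\`\`bash
+# Count findings per severity across all Phase 2 completion artifacts
+jq -s '[ .[] | .issues[]? ] | group_by(.severity) | map({severity: .[0].severity, count: length})' \
+  docs/sprint-artifacts/completions/{{story_key}}-*.json
+\`\`\`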
+
+
+
+**Phase 2.5: Coverage Gate (Automated)**
+
+\`\`\`
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+📊 PHASE 2.5: COVERAGE GATE
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+\`\`\`
+
+Run coverage check:
+\`\`\`bash
+# Run tests with coverage
+npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
+
+# Extract coverage percentage (adjust grep pattern for your test framework)
+COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1 || echo "0")
+
+echo "Coverage: ${COVERAGE}%"
+echo "Threshold: {{coverage_threshold}}%"
+
+# Compare coverage
+if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
+ echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
+ echo "Builder must add more tests before proceeding"
+ exit 1
+fi
+
+echo "✅ Coverage gate passed: ${COVERAGE}%"
+\`\`\`
+
+If coverage is below the threshold: add the gap to the issues list for the Builder to fix in Phase 3, alongside the reviewer findings.
@@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-🔧 PHASE 5: RECONCILIATION (Orchestrator)
+📊 PHASE 5: RECONCILIATION (Orchestrator)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**YOU (orchestrator) do this directly. No agent spawn.**
-1. Get what was built (git log, git diff)
-2. Read story file
-3. Check off completed tasks (Edit tool)
-4. Fill Dev Agent Record with pipeline details
-5. Verify updates (grep task checkboxes)
-6. Update sprint-status.yaml to "done"
+**5.1: Load completion artifacts**
+\`\`\`bash
+BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
+INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
+\`\`\`
+
+Use Read tool on all artifacts.
+
+**5.2: Read story file**
+Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\`
+
+**5.3: Check off completed tasks using Inspector evidence**
+
+For each task in \`inspector.task_verification\`:
+- If \`implemented: true\` and has evidence:
+ - Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\`
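+
+A sketch of pulling the verified tasks and their primary citations out of the Inspector artifact (assumes \`jq\`; the actual checkbox updates still go through the Edit tool):
+
+\`\`\`bash
+# One line per implemented task, with its first file:line citation
+jq -r '.task_verification[] | select(.implemented) |
+  "- [x] \(.task)  (\(.evidence[0].file):\(.evidence[0].lines))"' "$INSPECTOR"
+\`\`\`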
+
+**5.4: Fill Dev Agent Record with evidence**
+
+Use Edit tool:
+\`\`\`markdown
+### Dev Agent Record
+**Implementation Date:** {{timestamp}}
+**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0)
+**Git Commit:** {{git_commit}}
+
+**Pipeline Phases:**
+- Phase 0: Playbook Query ({{playbooks_loaded}} loaded)
+- Phase 1: Builder (initial implementation)
+- Phase 2: Parallel Verification
+ - Inspector: {{verdict}} with code citations
+ - Test Quality: {{verdict}}
+ - {{REVIEWER_COUNT}} Reviewers: {{issues_found}}
+- Phase 2.5: Coverage Gate ({{coverage}}%)
+- Phase 3: Builder (resumed, fixed {{fixes_count}} issues)
+- Phase 4: Inspector re-check ({{verdict}})
+
+**Files Created:** {{count}}
+**Files Modified:** {{count}}
+**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}%)
+**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM
+
+**Task Evidence:** (Inspector code citations)
+{{for each task with evidence}}
+- [x] {{task}}
+ - {{evidence[0].file}}:{{evidence[0].lines}}
+{{endfor}}
+\`\`\`
+
+**5.5: Verify updates**
+\`\`\`bash
+CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
+[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; }
+echo "✅ Reconciled: $CHECKED tasks with evidence"
+\`\`\`
**Final Quality Gate**
-Verify:
-1. Git commit exists
-2. Story tasks checked (count > 0)
-3. Dev Agent Record filled
-4. Sprint status updated
+\`\`\`bash
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "🔍 FINAL VERIFICATION"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
-If verification fails: fix using Edit, then re-verify.
+# 1. Git commit exists
+git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; }
+echo "✅ Git commit found"
+
+# 2. Story tasks checked with evidence
+CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
+[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; }
+echo "✅ $CHECKED tasks checked with code citations"
+
+# 3. Dev Agent Record filled
+grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; }
+echo "✅ Dev Agent Record filled"
+
+# 4. Coverage met threshold
+FINAL_COVERAGE=$(jq -r '.tests.coverage' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json)
+if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then
+ echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold"
+ exit 1
+fi
+echo "✅ Coverage: ${FINAL_COVERAGE}%"
+
+echo ""
+echo "✅ STORY COMPLETE"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+\`\`\`
+
+**Update sprint-status.yaml:**
+Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\`
+
+
+
+**Phase 6: Playbook Reflection**
+
+\`\`\`
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+💡 PHASE 6: PLAYBOOK REFLECTION
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+\`\`\`
+
+Spawn Reflection Agent:
+
+\`\`\`
+Task({
+ subagent_type: "general-purpose",
+ description: "Extract learnings from {{story_key}}",
+ prompt: \`
+You are the REFLECTION agent for story {{story_key}}.
+
+
+Story: [inline story file]
+Builder initial: [inline builder.json]
+All review findings: [inline all reviewer artifacts]
+Builder fixes: [inline builder-fixes.json]
+Test quality issues: [inline test-quality.json]
+
+
+
+Identify what future agents should know:
+
+1. **What issues were found?** (from reviewers)
+2. **What did Builder miss initially?** (gaps, edge cases, security)
+3. **What playbook knowledge would have prevented these?**
+4. **Which module/feature area does this apply to?**
+5. **Should we update existing playbook or create new?**
+
+Questions:
+- What gotchas should future builders know?
+- What code patterns should be standard?
+- What test requirements are essential?
+- What similar stories exist?
+
+
+
+- [ ] Analyzed review findings
+- [ ] Identified preventable issues
+- [ ] Determined which playbook(s) to update
+- [ ] Return structured proposal
+
+
+
+{
+ "agent": "reflection",
+ "story_key": "{{story_key}}",
+ "learnings": [
+ {
+ "issue": "SQL injection in query builder",
+ "root_cause": "Builder used string concatenation (didn't know pattern)",
+ "prevention": "Playbook should document: always use parameterized queries",
+ "applies_to": "database queries, API endpoints with user input"
+ },
+ {
+ "issue": "Missing edge case tests for empty arrays",
+ "root_cause": "Test Quality Agent found gap",
+ "prevention": "Playbook should require: test null/empty/invalid for all inputs",
+ "applies_to": "all data processing functions"
+ }
+ ],
+ "playbook_proposal": {
+ "action": "update_existing" | "create_new",
+ "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
+ "module": "api/database",
+ "updates": {
+ "common_gotchas": [
+ "Never concatenate user input into SQL - use parameterized queries",
+ "Test edge cases: null, undefined, [], '', invalid input"
+ ],
+ "code_patterns": [
+ "db.query(sql, [param1, param2]) ✓",
+ "sql + userInput ✗"
+ ],
+ "test_requirements": [
+      "Test SQL injection attempts: expect(() => query(\"' OR 1=1--\")).toThrow()",
+      "Test empty inputs: expect(fn([])).toEqual([]) or expect(() => fn([])).toThrow()"
+ ],
+ "related_stories": ["{{story_key}}"]
+ }
+ }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
+
+\`
+})
+\`\`\`
+
+**Wait for completion.**
+
+**Review playbook proposal:**
+\`\`\`bash
+REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
+ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
+PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
+
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "📝 Playbook Update Proposal"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Action: $ACTION"
+echo "Playbook: $PLAYBOOK"
+echo ""
+jq -r '.learnings[] | "- \(.issue)\n Prevention: \(.prevention)"' "$REFLECTION"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+\`\`\`
+
+If \`auto_apply_updates: true\` in config:
+- Read playbook (or create from template if new)
+- Use Edit tool to add learnings to sections
+- Commit playbook update
+
+If \`auto_apply_updates: false\` (default):
+- Display proposal for manual review
+- User can apply later with \`/update-playbooks {{story_key}}\`
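+
+When \`auto_apply_updates: true\`, a crude append-only sketch (assumes \`jq\`; the Edit-tool flow above, which inserts into the proper playbook sections, is the intended mechanism):
+
+\`\`\`bash
+# Append proposed gotchas to the target playbook for later curation
+{
+  echo ""
+  jq -r '.playbook_proposal.updates.common_gotchas[] | "- " + .' "$REFLECTION"
+} >> "$PLAYBOOK"
+git add "$PLAYBOOK" && git commit -m "docs: playbook update from {{story_key}} reflection"
+\`\`\`
+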
-**Builder fails:** Don't spawn verification. Report failure and halt.
-**Inspector fails (Phase 2):** Still run Reviewers in parallel, collect all findings together.
-**Inspector fails (Phase 4):** Resume Builder again with new issues (iterative fix loop).
-**Builder resume fails:** Report unfixed issues. Manual intervention needed.
-**Reconciliation fails:** Fix using Edit tool. Re-verify checkboxes.
+**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
+**Inspector fails (Phase 2):** Still collect other reviewer findings.
+**Test Quality fails:** Add issues to Builder fix list.
+**Coverage below threshold:** Add to Builder fix list.
+**Reviewers find CRITICAL:** Builder MUST fix when resumed.
+**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
+**Builder resume fails:** Report unfixed issues. Manual intervention.
+**Reconciliation fails:** Fix with Edit tool, re-verify.
-| Complexity | Pipeline | Reviewers | Total Phase 2 Agents |
-|------------|----------|-----------|---------------------|
-| micro | Builder → [Inspector + 2 Reviewers] → Resume Builder → Inspector recheck | 2 (security, architect) | 3 agents |
-| standard | Builder → [Inspector + 3 Reviewers] → Resume Builder → Inspector recheck | 3 (security, logic, architect) | 4 agents |
-| complex | Builder → [Inspector + 4 Reviewers] → Resume Builder → Inspector recheck | 4 (security, logic, architect, quality) | 5 agents |
+| Complexity | Phase 2 Agents | Total | Reviewers |
+|------------|----------------|-------|-----------|
+| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security + Architect |
+| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect |
+| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality |
-**Key Improvements (v3.2.0):**
-- All verification agents spawn in parallel (single message, faster execution)
-- Builder resume in Phase 3 saves 50-70% tokens vs spawning fresh Fixer
-- **NEW:** Architect/Integration Reviewer catches runtime issues (404s, pattern violations, missing migrations)
-
-**Reviewer Specializations:**
-- **Security:** Auth, injection, secrets, cross-tenant access
-- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions
-- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+)
-- **Code Quality:** Maintainability, naming, duplication (complex only)
+**All verification agents spawn in parallel (single message)**
-- [ ] Builder spawned and agent_id saved
-- [ ] All verification agents completed in parallel
-- [ ] Builder resumed with consolidated findings
-- [ ] Inspector recheck passed
-- [ ] Git commit exists for story
-- [ ] Story file has checked tasks (count > 0)
-- [ ] Dev Agent Record filled with all phases
-- [ ] Sprint status updated to "done"
+- [ ] Phase 0: Playbooks loaded (if available)
+- [ ] Phase 1: Builder spawned, agent_id saved
+- [ ] Phase 2: All verification agents completed in parallel
+- [ ] Phase 2.5: Coverage gate passed
+- [ ] Phase 3: Builder resumed with consolidated findings
+- [ ] Phase 4: Inspector recheck passed
+- [ ] Phase 5: Orchestrator reconciled with Inspector evidence
+- [ ] Phase 6: Playbook reflection completed
+- [ ] Git commit exists
+- [ ] Story tasks checked with code citations
+- [ ] Dev Agent Record filled
+- [ ] Coverage ≥ {{coverage_threshold}}%
+- [ ] Sprint status: done
+
+
+1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings
+2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task
+3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates
+4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero
+5. ✅ Test Quality Agent + Coverage Gate (v4.0) - validates test quality and enforces the coverage threshold
+6. ✅ Playbook query before Builder + reflection after (v4.0) - continuous learning
+
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml
index 47b73c77..3a075a5e 100644
--- a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml
@@ -1,7 +1,7 @@
name: story-full-pipeline
-description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)"
-author: "BMAD Method + GSD"
-version: "3.2.0" # Added architect-integration-reviewer for runtime verification
+description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes"
+author: "BMAD Method"
+version: "4.0.0" # Added playbook learning, test quality, coverage gates, Inspector code citations
# Execution mode
execution_mode: "multi_agent" # multi_agent | single_agent (fallback)
@@ -37,13 +37,23 @@ agents:
timeout: 3600 # 1 hour
inspector:
- description: "Validation agent - independent verification"
+ description: "Validation agent - independent verification with code citations"
steps: [5, 6]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/inspector.md"
fresh_context: true # No knowledge of builder agent
trust_level: "medium" # No conflict of interest
timeout: 1800 # 30 minutes
+ require_code_citations: true # v4.0: Must provide file:line evidence for all tasks
+
+ test_quality:
+ description: "Test quality validation - verifies test coverage and quality"
+ steps: [5.5]
+ subagent_type: "general-purpose"
+ prompt_file: "{agents_path}/test-quality.md"
+ fresh_context: true
+ trust_level: "medium"
+ timeout: 1200 # 20 minutes
reviewer:
description: "Adversarial code review - finds problems"
@@ -73,15 +83,40 @@ agents:
trust_level: "medium" # Incentive to minimize work
timeout: 2400 # 40 minutes
+ reflection:
+ description: "Playbook learning - extracts patterns for future agents"
+ steps: [10]
+ subagent_type: "general-purpose"
+ prompt_file: "{agents_path}/reflection.md"
+ timeout: 900 # 15 minutes
+
# Reconciliation: orchestrator does this directly (see workflow.md Phase 5)
+# Playbook configuration (v4.0)
+playbooks:
+ enabled: true # Set to false in project config to disable
+ directory: "docs/playbooks/implementation-playbooks"
+ bootstrap_mode: true # Auto-initialize if missing
+ max_load: 3
+ auto_apply_updates: false # Require manual review of playbook updates
+ discovery:
+ enabled: true # Scan git/docs to populate initial playbooks
+ sources: ["git_history", "docs", "existing_code"]
+
+# Quality gates (v4.0)
+quality_gates:
+ coverage_threshold: 80 # % line coverage required
+ task_verification: "all_with_evidence" # Inspector must provide file:line citations
+ critical_issues: "must_fix"
+ high_issues: "must_fix"
+
# Complexity level (determines which steps to execute)
complexity_level: "standard" # micro | standard | complex
# Complexity routing
complexity_routing:
micro:
- skip_agents: ["reviewer"] # Skip code review for micro stories
+ skip_agents: [] # Full pipeline (v4.0: micro gets security scan)
description: "Lightweight path for low-risk stories"
examples: ["UI tweaks", "text changes", "simple CRUD"]
diff --git a/src/bmm/workflows/templates/implementation-playbook-template.md b/src/bmm/workflows/templates/implementation-playbook-template.md
new file mode 100644
index 00000000..79208ebf
--- /dev/null
+++ b/src/bmm/workflows/templates/implementation-playbook-template.md
@@ -0,0 +1,85 @@
+# {{Module/Feature Area}} - Implementation Playbook
+
+> **Purpose:** Guide future agents implementing features in {{module_name}}
+> **Created:** {{date}}
+> **Last Updated:** {{date}}
+
+## Common Gotchas
+
+**What mistakes to avoid:**
+
+- Add specific gotchas here as they're discovered
+- Example: "Never concatenate user input into SQL queries"
+- Example: "Always validate file paths before operations"
+
+## Code Patterns
+
+**Standard approaches that work:**
+
+### Pattern: {{Pattern Name}}
+
+✓ **Good:**
+```
+// Example of correct pattern
+db.query(sql, [param1, param2])
+```
+
+✗ **Bad:**
+```
+// Example of incorrect pattern
+sql + userInput
+```
+
+### Pattern: {{Another Pattern}}
+
+✓ **Good:**
+```
+// Another example
+if (!data) return null;
+```
+
+✗ **Bad:**
+```
+// Don't do this
+data.map(...) // crashes if data is null
+```
+
+## Test Requirements
+
+**Essential tests for this module:**
+
+- **Happy path:** Verify primary functionality
+- **Edge cases:** Test null, undefined, empty arrays, invalid inputs
+- **Error conditions:** Verify errors are handled properly
+- **Security:** Test for injection attacks, auth bypasses, etc.
+
+### Example Test Pattern
+
+```typescript
+describe('FeatureName', () => {
+ it('handles happy path', () => {
+ expect(fn(validInput)).toEqual(expected)
+ })
+
+ it('handles edge cases', () => {
+    expect(() => fn(null)).toThrow()
+ expect(fn([])).toEqual([])
+ })
+
+ it('validates security', () => {
+    expect(() => fn("' OR 1=1--")).toThrow()
+ })
+})
+```
+
+## Related Stories
+
+Stories that used these patterns:
+
+- {{story_key}} - {{brief description}}
+
+## Notes
+
+- Keep this simple and actionable
+- Add new learnings as they emerge
+- Focus on preventable mistakes