feat: upgrade story-full-pipeline to v4.0 with 6 major enhancements

Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench research
(Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 RESUMES Builder agent with review findings
- Builder already has full codebase context (50-70% token savings)
- More efficient than spawning fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
- Prevents vague communication (CooperBench finding)

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Focus on concrete, measurable verification
- Replaced hospital-grade pattern references with patterns/verification.md + patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- No longer skip ALL review for micro stories
- Micro now gets 2 reviewers: Security + Architect
- Lightweight but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection

Pipeline: Phase 0 (Playbooks) → Phase 1 (Builder) → Phase 2 (Inspector +
Test Quality + Reviewers parallel) → Phase 2.5 (Coverage Gate) → Phase 3
(Resume Builder) → Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) →
Phase 6 (Reflection)

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)
Author: Jonah Schulte
Date: 2026-01-28 13:28:37 -05:00
parent 0810646ed6
commit a268b4c1bc
11 changed files with 1189 additions and 330 deletions

View File

@@ -1,124 +1,150 @@
# Super-Dev Pipeline - GSDMAD Architecture
# Story-Full-Pipeline v4.0
**Multi-agent pipeline with independent validation and adversarial code review**
Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes.
---
## What's New in v4.0
## Quick Start
### 1. Resume Builder (v3.2+)
**Token Efficiency: 50-70% savings**
```bash
# Run super-dev pipeline for a story
/story-full-pipeline story_key=17-10
```
- Phase 3 now RESUMES Builder agent with review findings
- Builder already has full codebase context
- More efficient than spawning fresh Fixer agent
### 2. Inspector Code Citations (v4.0)
**Evidence-Based Verification**
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
### 3. Remove Hospital-Grade Framing (v4.0)
**Focus on Concrete Verification**
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Replaced with patterns/verification.md + patterns/tdd.md
### 4. Micro Stories Get Security Scan (v4.0)
**Even Simple Stories Need Security**
- No longer skip ALL review for micro stories
- Still get 2 reviewers: Security + Architect
- Lightweight but catches critical vulnerabilities
### 5. Test Quality Agent + Coverage Gate (v4.0)
**Validate Test Completeness**
- New Test Quality Agent validates:
- Edge cases covered (null, empty, invalid)
- Error conditions tested
- Meaningful assertions (not just "doesn't crash")
- No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding
### 6. Playbook Learning System (v4.0)
**Continuous Improvement Through Reflection**
- **Phase 0:** Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- **Phase 6:** Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
## Pipeline Flow
```
Phase 0: Playbook Query (orchestrator)
Phase 1: Builder (initial implementation)
Phase 2: Inspector + Test Quality + N Reviewers (parallel)
Phase 2.5: Coverage Gate (automated)
Phase 3: Resume Builder (fix issues with context)
Phase 4: Inspector re-check (quick verification)
Phase 5: Orchestrator reconciliation (evidence-based)
Phase 6: Playbook Reflection (extract learnings)
```
---
## Complexity Routing
## Architecture
| Complexity | Phase 2 Agents | Total | Reviewers |
|------------|----------------|-------|-----------|
| micro | Inspector + Test Quality + 2 reviewers | 4 agents | Security + Architect |
| standard | Inspector + Test Quality + 3 reviewers | 5 agents | Security + Logic + Architect |
| complex | Inspector + Test Quality + 4 reviewers | 6 agents | Security + Logic + Architect + Quality |
### Multi-Agent Validation
- **4 independent agents** working sequentially
- Builder → Inspector → Reviewer → Fixer
- Each agent has fresh context
- No conflict of interest
## Quality Gates
### Honest Reporting
- Inspector verifies Builder's work (doesn't trust claims)
- Reviewer is adversarial (wants to find issues)
- Main orchestrator does final verification
- Can't fake completion
- **Coverage Threshold:** 80% line coverage required
- **Task Verification:** ALL tasks need file:line evidence
- **Critical Issues:** MUST fix
- **High Issues:** MUST fix
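These gates are driven by the `quality_gates` block in `workflow.yaml`; a minimal sketch of the shipped defaults:
```yaml
quality_gates:
  coverage_threshold: 80                  # % line coverage required
  task_verification: "all_with_evidence"  # Inspector must cite file:line
  critical_issues: "must_fix"
  high_issues: "must_fix"
```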
### Wave-Based Execution
- Independent stories run in parallel
- Dependencies respected via waves
- 57% faster than sequential execution
## Token Efficiency
---
- Phase 2 agents spawn in parallel (same cost, faster)
- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
- Phase 4 Inspector only (no full re-review)
## Workflow Phases
## Playbook Configuration
**Phase 1: Builder (Steps 1-4)**
- Load story, analyze gaps
- Write tests (TDD)
- Implement code
- Report what was built (NO VALIDATION)
```yaml
playbooks:
  enabled: true
  directory: "docs/playbooks/implementation-playbooks"
  bootstrap_mode: true # Auto-initialize if missing
  max_load: 3
  auto_apply_updates: false # Require manual review
  discovery:
    enabled: true
    sources: ["git_history", "docs", "existing_code"]
```
**Phase 2: Inspector (Steps 5-6)**
- Fresh context, no Builder knowledge
- Verify files exist
- Run tests independently
- Run quality checks
- PASS or FAIL verdict
## How It Avoids CooperBench Coordination Failures
**Phase 3: Reviewer (Step 7)**
- Fresh context, adversarial stance
- Find security vulnerabilities
- Find performance problems
- Find logic bugs
- Report issues with severity
Unlike the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026):
**Phase 4: Fixer (Steps 8-9)**
- Fix CRITICAL issues (all)
- Fix HIGH issues (all)
- Fix MEDIUM issues (time permitting)
- Verify fixes independently
1. **Sequential Implementation** - ONE Builder agent implements entire story (no parallel implementation conflicts)
2. **Parallel Review** - Multiple agents review in parallel (safe read-only operations)
3. **Context Reuse** - SAME agent fixes issues (no expectation failures about partner state)
4. **Evidence-Based** - file:line citations prevent vague communication
5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities)
**Phase 5: Final Verification**
- Main orchestrator verifies all phases
- Updates story checkboxes
- Creates commit
- Marks story complete
---
## Key Features
**Separation of Concerns:**
- Builder focuses only on implementation
- Inspector focuses only on validation
- Reviewer focuses only on finding issues
- Fixer focuses only on resolving issues
**Independent Validation:**
- Each agent validates the previous agent's work
- No agent validates its own work
- Fresh context prevents confirmation bias
**Quality Enforcement:**
- Multiple quality gates throughout pipeline
- Can't proceed without passing validation
- 95% honesty rate (agents can't fake completion)
---
The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination."
## Files
See `workflow.md` for complete architecture details.
**Agent Prompts:**
- `agents/builder.md` - Implementation agent
- `agents/inspector.md` - Validation agent
- `agents/builder.md` - Implementation agent (with playbook awareness)
- `agents/inspector.md` - Validation agent (requires code citations)
- `agents/test-quality.md` - Test quality validation (v4.0)
- `agents/reviewer.md` - Adversarial review agent
- `agents/fixer.md` - Issue resolution agent
- `agents/architect-integration-reviewer.md` - Architecture/integration review
- `agents/fixer.md` - Issue resolution agent (deprecated; Phase 3 now resumes the Builder instead)
- `agents/reflection.md` - Playbook learning agent (v4.0)
**Workflow Config:**
- `workflow.yaml` - Main configuration
- `workflow.md` - Complete documentation
- `workflow.yaml` - Main configuration (v4.0)
- `workflow.md` - Complete step-by-step documentation
**Directory Structure:**
```
story-full-pipeline/
├── README.md (this file)
├── workflow.yaml (configuration)
├── workflow.md (complete documentation)
├── agents/
│   ├── builder.md (implementation agent prompt)
│   ├── inspector.md (validation agent prompt)
│   ├── reviewer.md (review agent prompt)
│   └── fixer.md (fix agent prompt)
└── steps/
    └── (step files for each phase)
```
**Templates:**
- `../templates/implementation-playbook-template.md` - Playbook structure
## Usage
```bash
# Run story-full-pipeline
/story-full-pipeline story_key=17-10
```
---
## Backward Compatibility
**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
Falls back to single-agent mode if multi-agent execution fails.

View File

@@ -5,7 +5,6 @@
**Trust Level:** HIGH (wants to find integration issues)
<execution_context>
@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

View File

@@ -5,7 +5,6 @@
**Trust Level:** LOW (assume will cut corners)
<execution_context>
@patterns/hospital-grade.md
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
@@ -17,11 +16,12 @@
You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests.
**DO:**
- **Review playbooks** for gotchas and patterns (if provided)
- Load and understand the story requirements
- Analyze what exists vs what's needed
- Write tests first (TDD approach)
- Implement production code to make tests pass
- Follow project patterns and conventions
- Follow project patterns and playbook guidance
**DO NOT:**
- Validate your own work (Inspector agent will do this)
@@ -35,7 +35,8 @@ You are the **BUILDER** agent. Your job is to implement the story requirements b
## Steps to Execute
### Step 1: Initialize
Load story file and cache context:
Load story file and playbooks (if provided):
- **Review playbooks first** (if provided in context) - note gotchas and patterns
- Read story file: `{{story_file}}`
- Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.)
- Determine greenfield vs brownfield
@@ -88,54 +89,36 @@ When complete, provide:
---
## Hospital-Grade Standards
## Completion Format (v4.0)
⚕️ **Quality >> Speed**
**Return structured JSON artifact:**
- Take time to do it right
- Don't skip error handling
- Don't leave TODO comments
- Don't use `any` types
---
## When Complete, Return This Format
```markdown
## AGENT COMPLETE
**Agent:** builder
**Story:** {{story_key}}
**Status:** SUCCESS | FAILED
### Files Created
- path/to/new/file1.ts
- path/to/new/file2.ts
### Files Modified
- path/to/existing/file.ts
### Tests Added
- X test files
- Y test cases total
### Implementation Summary
Brief description of what was built and key decisions made.
### Known Gaps
- Any functionality not implemented
- Any edge cases not handled
- NONE if all tasks complete
### Ready For
Inspector validation (next phase)
```
```json
{
  "agent": "builder",
  "story_key": "{{story_key}}",
  "status": "SUCCESS",
  "files_created": ["path/to/file.tsx", "path/to/file.test.tsx"],
  "files_modified": ["path/to/existing.tsx"],
  "tests_added": {
    "total": 12,
    "passing": 12
  },
  "tasks_addressed": [
    "Create agreement view component",
    "Add status badge",
    "Implement occupant selection"
  ],
  "playbooks_reviewed": ["database-patterns.md", "api-security.md"]
}
```
**Why this format?** The orchestrator parses this output to:
- Verify claimed files actually exist
- Track what was built for reconciliation
- Route to next phase appropriately
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json`
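As a sketch of why the save location matters, the orchestrator-side gate might look like this (assumes `jq` is available; `17-10` is the example story key used elsewhere in this repo):

```bash
ARTIFACT="docs/sprint-artifacts/completions/17-10-builder.json"
STATUS=$(jq -r '.status' "$ARTIFACT")
[ "$STATUS" = "SUCCESS" ] || { echo "Builder reported FAILED - halting pipeline"; exit 1; }
```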
---
**Remember:** You are the BUILDER. Build it well, but don't validate or review your own work. Other agents will do that with fresh eyes.
**Remember:**
- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories
- Build it well with TDD, but don't validate or review your own work
- Other agents will verify with fresh eyes and provide file:line evidence

View File

@@ -5,7 +5,6 @@
**Trust Level:** MEDIUM (incentive to minimize work)
<execution_context>
@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

View File

@@ -1,12 +1,11 @@
# Inspector Agent - Validation Phase
# Inspector Agent - Validation Phase with Code Citations
**Role:** Independent verification of Builder's work
**Role:** Independent verification of Builder's work **WITH EVIDENCE**
**Steps:** 5-6 (post-validation, quality-checks)
**Trust Level:** MEDIUM (no conflict of interest)
<execution_context>
@patterns/verification.md
@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>
@@ -14,48 +13,54 @@
## Your Mission
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed.
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**.
**KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.**
**DO:**
- Map EACH task to specific code with file:line citations
- Verify files actually exist
- Run tests yourself (don't trust claims)
- Run quality checks (type-check, lint, build)
- Give honest PASS/FAIL verdict
- Provide evidence for EVERY task
**DO NOT:**
- Take the Builder's word for anything
- Skip verification steps
- Skip any task verification
- Give vague "looks good" without citations
- Assume tests pass without running them
- Give PASS verdict if ANY check fails
- Give PASS verdict if ANY check fails or task lacks evidence
---
## Steps to Execute
### Step 5: Post-Validation
### Step 5: Task Verification with Code Citations
**Verify Implementation Against Story:**
**Map EVERY task to specific code locations:**
1. **Check Files Exist:**
```bash
# For each file mentioned in story tasks
ls -la {{file_path}}
# FAIL if file missing or empty
```
1. **Read story file** - understand ALL tasks
2. **Verify File Contents:**
- Open each file
- Check it has actual code (not just TODO/stub)
- Verify it matches story requirements
2. **For EACH task, provide:**
- **file:line** where it's implemented
- **Brief quote** of relevant code
- **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED
3. **Check Tests Exist:**
```bash
# Find test files
find . -name "*.test.ts" -o -name "__tests__"
# FAIL if no tests found for new code
```
**Example Evidence Format:**
```
Task: "Display occupant agreement status"
Evidence: src/features/agreement/StatusBadge.tsx:45-67
Code: "const StatusBadge = ({ status }) => ..."
Verdict: IMPLEMENTED
```
3. **If task NOT implemented:**
- Explain why (file missing, code incomplete, etc.)
- Provide file:line where it should be
**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED.
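A plain text search is usually enough to recover citations in this format (illustrative, using the component from the example above; any search tool works):

```bash
# grep -rn prints file:line:match - the exact shape a citation needs
grep -rn "StatusBadge" src/features/agreement/ | head -5
```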
### Step 6: Quality Checks
@@ -96,36 +101,49 @@ You are the INSPECTOR agent. Your job is to verify that the Builder actually
---
## Output Requirements
## Completion Format (v4.0)
**Provide Evidence-Based Verdict:**
**Return structured JSON with code citations:**
### If PASS:
```markdown
✅ VALIDATION PASSED
Evidence:
- Files verified: [list files checked]
- Type check: PASS (0 errors)
- Linter: PASS (0 warnings)
- Build: PASS
- Tests: 45/45 passing (95% coverage)
- Git: 12 files modified, 3 new files
Ready for code review.
```
```json
{
  "agent": "inspector",
  "story_key": "{{story_key}}",
  "verdict": "PASS",
  "task_verification": [
    {
      "task": "Create agreement view component",
      "implemented": true,
      "evidence": [
        {
          "file": "src/features/agreement/AgreementView.tsx",
          "lines": "15-67",
          "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
        },
        {
          "file": "src/features/agreement/AgreementView.test.tsx",
          "lines": "8-45",
          "code_snippet": "describe('AgreementView', () => {...})"
        }
      ]
    },
    {
      "task": "Add status badge",
      "implemented": false,
      "evidence": [],
      "reason": "No StatusBadge component found in src/features/agreement/"
    }
  ],
  "checks": {
    "type_check": {"passed": true, "errors": 0},
    "lint": {"passed": true, "warnings": 0},
    "tests": {"passed": true, "total": 12, "passing": 12},
    "build": {"passed": true}
  }
}
```
### If FAIL:
```markdown
❌ VALIDATION FAILED
Failures:
1. File missing: app/api/occupant/agreement/route.ts
2. Type check: 3 errors in lib/api/auth.ts
3. Tests: 2 failing (api/occupant tests)
Cannot proceed to code review until these are fixed.
```
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json`
---
@@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed.
**Before giving PASS verdict, confirm:**
- [ ] All story files exist and have content
- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason
- [ ] Type check returns 0 errors
- [ ] Linter returns 0 errors/warnings
- [ ] Linter returns 0 warnings
- [ ] Build succeeds
- [ ] Tests run and pass (not skipped)
- [ ] Test coverage >= 90%
- [ ] Git status is clean or has expected changes
- [ ] All implemented tasks have code evidence
**If ANY checkbox is unchecked → FAIL verdict**
---
## Hospital-Grade Standards
⚕️ **Be Thorough**
- Don't skip checks
- Run tests yourself (don't trust claims)
- Verify every file exists
- Give specific evidence
---
## When Complete, Return This Format
```markdown
## AGENT COMPLETE
**Agent:** inspector
**Story:** {{story_key}}
**Status:** PASS | FAIL
### Evidence
- **Type Check:** PASS (0 errors) | FAIL (X errors)
- **Lint:** PASS (0 warnings) | FAIL (X warnings)
- **Build:** PASS | FAIL
- **Tests:** X passing, Y failing, Z% coverage
### Files Verified
- path/to/file1.ts ✓
- path/to/file2.ts ✓
- path/to/missing.ts ✗ (NOT FOUND)
### Failures (if FAIL status)
1. Specific failure with file:line reference
2. Another specific failure
### Ready For
- If PASS: Reviewer (next phase)
- If FAIL: Builder needs to fix before proceeding
```
---
**Remember:** You are the INSPECTOR. Your job is to find the truth, not rubber-stamp the Builder's work. If something is wrong, say so with evidence.
**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations.

View File

@@ -0,0 +1,93 @@
# Reflection Agent - Playbook Learning
You are the **REFLECTION** agent for story {{story_key}}.
## Context
- **Story:** {{story_file}}
- **Builder initial:** {{builder_artifact}}
- **All review findings:** {{all_reviewer_artifacts}}
- **Builder fixes:** {{builder_fixes_artifact}}
- **Test quality issues:** {{test_quality_artifact}}
## Objective
Identify what future agents should know:
1. **What issues were found?** (from reviewers)
2. **What did Builder miss initially?** (gaps, edge cases, security)
3. **What playbook knowledge would have prevented these?**
4. **Which module/feature area does this apply to?**
5. **Should we update existing playbook or create new?**
### Key Questions
- What gotchas should future builders know?
- What code patterns should be standard?
- What test requirements are essential?
- What similar stories exist?
## Success Criteria
- [ ] Analyzed review findings
- [ ] Identified preventable issues
- [ ] Determined which playbook(s) to update
- [ ] Return structured proposal
## Completion Format
Return structured JSON artifact:
```json
{
  "agent": "reflection",
  "story_key": "{{story_key}}",
  "learnings": [
    {
      "issue": "SQL injection in query builder",
      "root_cause": "Builder used string concatenation (didn't know pattern)",
      "prevention": "Playbook should document: always use parameterized queries",
      "applies_to": "database queries, API endpoints with user input"
    },
    {
      "issue": "Missing edge case tests for empty arrays",
      "root_cause": "Test Quality Agent found gap",
      "prevention": "Playbook should require: test null/empty/invalid for all inputs",
      "applies_to": "all data processing functions"
    }
  ],
  "playbook_proposal": {
    "action": "update_existing",
    "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
    "module": "api/database",
    "updates": {
      "common_gotchas": [
        "Never concatenate user input into SQL - use parameterized queries",
        "Test edge cases: null, undefined, [], '', invalid input"
      ],
      "code_patterns": [
        "db.query(sql, [param1, param2]) ✓",
        "sql + userInput ✗"
      ],
      "test_requirements": [
        "Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
        "Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
      ],
      "related_stories": ["{{story_key}}"]
    }
  }
}
```
Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
## Playbook Structure
When proposing playbook updates, structure them with these sections:
1. **Common Gotchas** - What mistakes to avoid
2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
3. **Test Requirements** - What tests are essential
4. **Related Stories** - Which stories used these patterns
Keep it simple and actionable for future agents.

View File

@@ -6,7 +6,6 @@
<execution_context>
@patterns/security-checklist.md
@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

View File

@@ -0,0 +1,73 @@
# Test Quality Agent
You are the **TEST QUALITY** agent for story {{story_key}}.
## Context
- **Story:** {{story_file}}
- **Builder completion:** {{builder_completion_artifact}}
## Objective
Review test files for quality and completeness:
1. Find all test files created/modified by Builder
2. For each test file, verify:
- **Happy path**: Primary functionality tested ✓
- **Edge cases**: null, empty, invalid inputs ✓
- **Error conditions**: Failures handled properly ✓
- **Assertions**: Meaningful checks (not just "doesn't crash")
- **Test names**: Descriptive and clear
- **Deterministic**: No random data, no timing dependencies
3. Check that tests actually validate the feature
**Focus on:** What's missing? What edge cases weren't considered?
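For example, here is the kind of flaky test this agent should flag, next to a deterministic rewrite (illustrative Jest sketch; `formatTotal` is a hypothetical function under test):

```typescript
// Flaky: random input, weak assertion - only proves the call "doesn't crash"
it('formats totals', () => {
  const n = Math.random() * 100;
  expect(formatTotal(n)).toBeDefined();
});

// Deterministic: fixed input, meaningful assertion on the actual output
it('formats totals to two decimals', () => {
  expect(formatTotal(19.5)).toBe('$19.50');
});
```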
## Success Criteria
- [ ] All test files reviewed
- [ ] Edge cases identified (covered or missing)
- [ ] Error conditions verified
- [ ] Assertions are meaningful
- [ ] Tests are deterministic
- [ ] Return quality assessment
## Completion Format
Return structured JSON artifact:
```json
{
  "agent": "test_quality",
  "story_key": "{{story_key}}",
  "verdict": "PASS" | "NEEDS_IMPROVEMENT",
  "test_files_reviewed": ["path/to/test.tsx", ...],
  "issues": [
    {
      "severity": "HIGH",
      "file": "path/to/test.tsx:45",
      "issue": "Missing edge case: empty input array",
      "recommendation": "Add test: expect(fn([])).toThrow(...)"
    },
    {
      "severity": "MEDIUM",
      "file": "path/to/test.tsx:67",
      "issue": "Test uses Math.random() - could be flaky",
      "recommendation": "Use fixed test data"
    }
  ],
  "coverage_analysis": {
    "edge_cases_covered": true,
    "error_conditions_tested": true,
    "meaningful_assertions": true,
    "tests_are_deterministic": true
  },
  "summary": {
    "high_issues": 0,
    "medium_issues": 0,
    "low_issues": 0
  }
}
```
Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json`

View File

@@ -1,74 +1,142 @@
# Super-Dev-Pipeline v3.1 - Token-Efficient Multi-Agent Pipeline
# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline
<purpose>
Implement a story using parallel verification agents with Builder context reuse.
Each agent has single responsibility. Builder fixes issues in its own context (50-70% token savings).
Orchestrator handles bookkeeping (story file updates, verification).
Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates.
Builder fixes issues in its own context (50-70% token savings).
</purpose>
<philosophy>
**Token-Efficient Multi-Agent Pipeline**
**Quality Through Discipline, Continuous Learning**
- Builder implements (creative, context preserved)
- Inspector + Reviewers validate in parallel (verification, fresh context)
- Builder fixes issues (creative, reuses context - 50-70% token savings)
- Inspector re-checks (verification, quick check)
- Orchestrator reconciles story file (mechanical)
- Playbook Query: Load relevant patterns before starting
- Builder: Implements with playbook knowledge (context preserved)
- Inspector + Test Quality + Reviewers: Validate in parallel with proof
- Coverage Gate: Automated threshold enforcement
- Builder: Fixes issues in same context (50-70% token savings)
- Inspector: Quick recheck
- Orchestrator: Reconciles mechanically
- Reflection: Updates playbooks for future agents
**Key Innovation:** Resume Builder instead of spawning fresh Fixer.
Builder already knows the codebase - just needs to fix specific issues.
Trust but verify. Fresh context for verification. Reuse context for fixes.
Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system.
</philosophy>
<config>
name: story-full-pipeline
version: 3.2.0
version: 4.0.0
execution_mode: multi_agent
phases:
  phase_0: Playbook Query (orchestrator)
  phase_1: Builder (saves agent_id)
  phase_2: [Inspector + N Reviewers] in parallel (N = 2/3/4 based on complexity)
  phase_2: [Inspector + Test Quality + N Reviewers] in parallel
  phase_2.5: Coverage Gate (automated)
  phase_3: Resume Builder with all findings (reuses context)
  phase_4: Inspector re-check (quick verification)
  phase_5: Orchestrator reconciliation
  phase_6: Playbook Reflection
reviewer_counts:
  micro: 2 reviewers (security, architect/integration) v3.2.0+
  standard: 3 reviewers (security, logic/performance, architect/integration) v3.2.0+
  complex: 4 reviewers (security, logic, architect/integration, code quality) v3.2.0+
  micro: 2 reviewers (security, architect/integration)
  standard: 3 reviewers (security, logic/performance, architect/integration)
  complex: 4 reviewers (security, logic, architect/integration, code quality)
quality_gates:
  coverage_threshold: 80 # % line coverage required
  task_verification: "all_with_evidence" # Inspector must cite file:line
  critical_issues: "must_fix"
  high_issues: "must_fix"
token_efficiency:
  - Phase 2 agents spawn in parallel (same cost, faster)
  - Phase 3 resumes Builder (50-70% token savings vs fresh Fixer)
  - Phase 3 resumes Builder (50-70% token savings vs fresh agent)
  - Phase 4 Inspector only (no full re-review)
playbooks:
  enabled: true
  directory: "docs/playbooks/implementation-playbooks"
  max_load: 3
  auto_apply_updates: false
</config>
<execution_context>
@patterns/hospital-grade.md
@patterns/verification.md
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
<process>
<step name="load_story" priority="first">
Load and validate the story file.
**Load and parse story file**
\`\`\`bash
STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
[ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
\`\`\`
Use Read tool on the story file. Parse:
- Complexity level (micro/standard/complex)
Use Read tool. Extract:
- Task count
- Acceptance criteria count
- Keywords for risk scoring
Determine which agents to spawn based on complexity routing.
**Determine complexity:**
\`\`\`bash
TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE")
RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE")
if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then
COMPLEXITY="micro"
REVIEWER_COUNT=2
elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then
COMPLEXITY="complex"
REVIEWER_COUNT=4
else
COMPLEXITY="standard"
REVIEWER_COUNT=3
fi
\`\`\`
Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers
</step>
<step name="query_playbooks">
**Phase 0: Playbook Query**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📚 PHASE 0: PLAYBOOK QUERY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**Extract story keywords:**
\`\`\`bash
STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ')
echo "Story keywords: $STORY_KEYWORDS"
\`\`\`
**Search for relevant playbooks:**
Use Grep tool:
- Pattern: extracted keywords
- Path: \`docs/playbooks/implementation-playbooks/\`
- Output mode: files_with_matches
- Limit: 3 files
**Load matching playbooks:**
For each playbook found:
- Use Read tool
- Extract sections: Common Gotchas, Code Patterns, Test Requirements
If no playbooks exist:
\`\`\`
No playbooks found - this will be the first story to create them
\`\`\`
Store playbook content for Builder.
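A minimal bash equivalent of the Grep-tool search above, assuming `STORY_KEYWORDS` is the space-separated list extracted earlier:
\`\`\`bash
# Turn the keyword list into an alternation pattern, then shortlist 3 playbooks
PATTERN=$(echo "$STORY_KEYWORDS" | xargs | tr ' ' '|')
grep -rliE "$PATTERN" docs/playbooks/implementation-playbooks/ 2>/dev/null | head -3
\`\`\`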
</step>
<step name="spawn_builder">
**Phase 1: Builder Agent (Steps 1-4)**
**Phase 1: Builder Agent**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Spawn Builder agent and save agent_id for later resume.
**CRITICAL: Save Builder's agent_id for later resume**
Spawn Builder agent and **SAVE agent_id for resume later**:
\`\`\`
BUILDER_AGENT_ID={{agent_id_from_task_result}}
echo "Builder agent: $BUILDER_AGENT_ID"
BUILDER_TASK = Task({
subagent_type: "general-purpose",
description: "Implement story {{story_key}}",
prompt: \`
You are the BUILDER agent for story {{story_key}}.
<execution_context>
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
<context>
Story: [inline story file content]
{{IF playbooks loaded}}
Relevant Playbooks (review before implementing):
[inline playbook content]
Pay special attention to:
- Common Gotchas in these playbooks
- Code Patterns to follow
- Test Requirements to satisfy
{{ENDIF}}
</context>
<objective>
Implement the story requirements:
1. Review story tasks and acceptance criteria
2. **Review playbooks** for gotchas and patterns (if provided)
3. Analyze what exists vs needed (gap analysis)
4. **Write tests FIRST** (TDD - tests before implementation)
5. Implement production code to pass tests
</objective>
<constraints>
- DO NOT validate your own work
- DO NOT review your code
- DO NOT update story checkboxes
- DO NOT commit changes yet
</constraints>
<success_criteria>
- [ ] Reviewed playbooks for guidance
- [ ] Tests written for all requirements
- [ ] Production code implements tests
- [ ] Tests pass
- [ ] Return structured completion artifact
</success_criteria>
<completion_format>
Return structured JSON artifact:
{
  "agent": "builder",
  "story_key": "{{story_key}}",
  "status": "SUCCESS" | "FAILED",
  "files_created": ["path/to/file.tsx", ...],
  "files_modified": ["path/to/file.tsx", ...],
  "tests_added": {
    "total": 12,
    "passing": 12
  },
  "tasks_addressed": ["task description from story", ...]
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json
</completion_format>
\`
})
BUILDER_AGENT_ID = {{extract agent_id from Task result}}
\`\`\`
Wait for completion. Parse structured output. Verify files exist.
**CRITICAL: Store Builder agent ID:**
\`\`\`bash
echo "Builder agent ID: $BUILDER_AGENT_ID"
echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
\`\`\`
**Wait for completion. Verify artifact exists:**
\`\`\`bash
BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
\`\`\`
**Verify files exist:**
\`\`\`bash
# Check every file the Builder claims in files_created and files_modified
jq -r '.files_created[], .files_modified[]' "$BUILDER_COMPLETION" | while read -r file; do
  [ -f "$file" ] || echo "❌ MISSING: $file"
done
\`\`\`
If files missing or status FAILED: halt pipeline.
</step>
<step name="spawn_verification_parallel">
**Phase 2: Parallel Verification (Inspector + Reviewers)**
**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 PHASE 2: PARALLEL VERIFICATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
Total agents: {{2 + REVIEWER_COUNT}}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**CRITICAL: Spawn ALL verification agents in ONE message (parallel execution)**
**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
Send single message with multiple Task calls:
1. Inspector Agent
2. Test Quality Agent
3. Security Reviewer
4. Logic/Performance Reviewer (if standard/complex)
5. Architect/Integration Reviewer
6. Code Quality Reviewer (if complex)
---
## Inspector Agent Prompt:
Determine reviewer count based on complexity:
\`\`\`
if complexity == "micro": REVIEWER_COUNT = 1
if complexity == "standard": REVIEWER_COUNT = 2
if complexity == "complex": REVIEWER_COUNT = 3
Task({
subagent_type: "general-purpose",
description: "Validate story {{story_key}} implementation",
prompt: \`
You are the INSPECTOR agent for story {{story_key}}.
<execution_context>
@patterns/verification.md
@patterns/agent-completion.md
</execution_context>
<context>
Story: [inline story file content]
</context>
<objective>
Independently verify implementation WITH CODE CITATIONS:
1. Read story file - understand ALL tasks
2. Read each file Builder created/modified
3. **Map EACH task to specific code with file:line citations**
4. Run verification checks:
- Type-check (0 errors required)
- Lint (0 warnings required)
- Tests (all passing required)
- Build (success required)
</objective>
<critical_requirement>
**EVERY task must have evidence.**
For each task, provide:
- file:line where it's implemented
- Brief quote of relevant code
- Verdict: IMPLEMENTED or NOT_IMPLEMENTED
Example:
Task: "Display occupant agreement status"
Evidence: src/features/agreement/StatusBadge.tsx:45-67
Code: "const StatusBadge = ({ status }) => ..."
Verdict: IMPLEMENTED
</critical_requirement>
<constraints>
- You have NO KNOWLEDGE of what Builder did
- Run all checks yourself - don't trust claims
- **Every task needs file:line citation**
- If code doesn't exist: mark NOT IMPLEMENTED with reason
</constraints>
<success_criteria>
- [ ] ALL tasks mapped to code locations
- [ ] Type check: 0 errors
- [ ] Lint: 0 warnings
- [ ] Tests: all passing
- [ ] Build: success
- [ ] Return structured evidence
</success_criteria>
<completion_format>
{
  "agent": "inspector",
  "story_key": "{{story_key}}",
  "verdict": "PASS" | "FAIL",
  "task_verification": [
    {
      "task": "Create agreement view component",
      "implemented": true,
      "evidence": [
        {
          "file": "src/features/agreement/AgreementView.tsx",
          "lines": "15-67",
          "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
        },
        {
          "file": "src/features/agreement/AgreementView.test.tsx",
          "lines": "8-45",
          "code_snippet": "describe('AgreementView', () => {...})"
        }
      ]
    },
    {
      "task": "Add status badge",
      "implemented": false,
      "evidence": [],
      "reason": "No StatusBadge component found in src/features/agreement/"
    }
  ],
  "checks": {
    "type_check": {"passed": true, "errors": 0},
    "lint": {"passed": true, "warnings": 0},
    "tests": {"passed": true, "total": 12, "passing": 12},
    "build": {"passed": true}
  }
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json
</completion_format>
\`
})
\`\`\`
Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings.
---
Aggregate all findings from Inspector + Reviewers.
## Test Quality Agent Prompt:
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Review test quality for {{story_key}}",
prompt: \`
You are the TEST QUALITY agent for story {{story_key}}.
<context>
Story: [inline story file content]
Builder completion: [inline builder artifact]
</context>
<objective>
Review test files for quality and completeness:
1. Find all test files created/modified by Builder
2. For each test file, verify:
- **Happy path**: Primary functionality tested ✓
- **Edge cases**: null, empty, invalid inputs ✓
- **Error conditions**: Failures handled properly ✓
- **Assertions**: Meaningful checks (not just "doesn't crash")
- **Test names**: Descriptive and clear
- **Deterministic**: No random data, no timing dependencies
3. Check that tests actually validate the feature
**Focus on:** What's missing? What edge cases weren't considered?
</objective>
<success_criteria>
- [ ] All test files reviewed
- [ ] Edge cases identified (covered or missing)
- [ ] Error conditions verified
- [ ] Assertions are meaningful
- [ ] Tests are deterministic
- [ ] Return quality assessment
</success_criteria>
<completion_format>
{
  "agent": "test_quality",
  "story_key": "{{story_key}}",
  "verdict": "PASS" | "NEEDS_IMPROVEMENT",
  "test_files_reviewed": ["path/to/test.tsx", ...],
  "issues": [
    {
      "severity": "HIGH",
      "file": "path/to/test.tsx:45",
      "issue": "Missing edge case: empty input array",
      "recommendation": "Add test: expect(fn([])).toThrow(...)"
    },
    {
      "severity": "MEDIUM",
      "file": "path/to/test.tsx:67",
      "issue": "Test uses Math.random() - could be flaky",
      "recommendation": "Use fixed test data"
    }
  ],
  "coverage_analysis": {
    "edge_cases_covered": true | false,
    "error_conditions_tested": true | false,
    "meaningful_assertions": true | false,
    "tests_are_deterministic": true | false
  },
  "summary": {
    "high_issues": 1,
    "medium_issues": 2,
    "low_issues": 0
  }
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json
</completion_format>
\`
})
\`\`\`
---
(Continue with Security, Logic, Architect, Quality reviewers as before...)
**Wait for ALL agents to complete.**
Collect completion artifacts:
- \`inspector.json\`
- \`test-quality.json\`
- \`reviewer-security.json\`
- \`reviewer-logic.json\` (if spawned)
- \`reviewer-architect.json\`
- \`reviewer-quality.json\` (if spawned)
Parse all findings and aggregate by severity.
</step>
<step name="coverage_gate">
**Phase 2.5: Coverage Gate (Automated)**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 PHASE 2.5: COVERAGE GATE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Run coverage check:
\`\`\`bash
# Run tests with coverage
npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
# Extract coverage percentage (adjust grep pattern for your test framework)
COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1 || echo "0")
echo "Coverage: ${COVERAGE}%"
echo "Threshold: {{coverage_threshold}}%"
# Compare coverage
if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
echo "Builder must add more tests before proceeding"
exit 1
fi
echo "✅ Coverage gate passed: ${COVERAGE}%"
\`\`\`
If coverage fails: add to issues list for Builder to fix.
</step>
<step name="resume_builder_with_findings">
@@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔧 PHASE 5: RECONCILIATION (Orchestrator)
📊 PHASE 5: RECONCILIATION (Orchestrator)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**YOU (orchestrator) do this directly. No agent spawn.**
1. Get what was built (git log, git diff)
2. Read story file
3. Check off completed tasks (Edit tool)
4. Fill Dev Agent Record with pipeline details
5. Verify updates (grep task checkboxes)
6. Update sprint-status.yaml to "done"
**5.1: Load completion artifacts**
\`\`\`bash
BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
\`\`\`
Use Read tool on all artifacts.
**5.2: Read story file**
Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\`
**5.3: Check off completed tasks using Inspector evidence**
For each task in \`inspector.task_verification\`:
- If \`implemented: true\` and has evidence:
- Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\`
**5.4: Fill Dev Agent Record with evidence**
Use Edit tool:
\`\`\`markdown
### Dev Agent Record
**Implementation Date:** {{timestamp}}
**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0)
**Git Commit:** {{git_commit}}
**Pipeline Phases:**
- Phase 0: Playbook Query ({{playbooks_loaded}} loaded)
- Phase 1: Builder (initial implementation)
- Phase 2: Parallel Verification
- Inspector: {{verdict}} with code citations
- Test Quality: {{verdict}}
- {{REVIEWER_COUNT}} Reviewers: {{issues_found}}
- Phase 2.5: Coverage Gate ({{coverage}}%)
- Phase 3: Builder (resumed, fixed {{fixes_count}} issues)
- Phase 4: Inspector re-check ({{verdict}})
**Files Created:** {{count}}
**Files Modified:** {{count}}
**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}%)
**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM
**Task Evidence:** (Inspector code citations)
{{for each task with evidence}}
- [x] {{task}}
- {{evidence[0].file}}:{{evidence[0].lines}}
{{endfor}}
\`\`\`
**5.5: Verify updates**
\`\`\`bash
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; }
echo "✅ Reconciled: $CHECKED tasks with evidence"
\`\`\`
</step>
<step name="final_verification">
**Final Quality Gate**
Verify:
1. Git commit exists
2. Story tasks checked (count > 0)
3. Dev Agent Record filled
4. Sprint status updated
\`\`\`bash
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔍 FINAL VERIFICATION"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# If verification fails: fix using Edit, then re-verify.
# 1. Git commit exists
git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; }
echo "✅ Git commit found"
# 2. Story tasks checked with evidence
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; }
echo "✅ $CHECKED tasks checked with code citations"
# 3. Dev Agent Record filled
grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; }
echo "✅ Dev Agent Record filled"
# 4. Coverage met threshold
FINAL_COVERAGE=$(jq -r '.tests.coverage' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json)
if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then
echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold"
exit 1
fi
echo "✅ Coverage: ${FINAL_COVERAGE}%"
echo ""
echo "✅ STORY COMPLETE"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
\`\`\`
**Update sprint-status.yaml:**
Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\`
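For illustration only, a shell equivalent of that edit (this assumes the status file lives at \`docs/sprint-artifacts/sprint-status.yaml\`; the pipeline itself uses the Edit tool):
\`\`\`bash
sed -i 's/{{story_key}}: ready-for-dev/{{story_key}}: done/' docs/sprint-artifacts/sprint-status.yaml
\`\`\`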
</step>
<step name="playbook_reflection">
**Phase 6: Playbook Reflection**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 PHASE 6: PLAYBOOK REFLECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Spawn Reflection Agent:
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Extract learnings from {{story_key}}",
prompt: \`
You are the REFLECTION agent for story {{story_key}}.
<context>
Story: [inline story file]
Builder initial: [inline builder.json]
All review findings: [inline all reviewer artifacts]
Builder fixes: [inline builder-fixes.json]
Test quality issues: [inline test-quality.json]
</context>
<objective>
Identify what future agents should know:
1. **What issues were found?** (from reviewers)
2. **What did Builder miss initially?** (gaps, edge cases, security)
3. **What playbook knowledge would have prevented these?**
4. **Which module/feature area does this apply to?**
5. **Should we update existing playbook or create new?**
Questions:
- What gotchas should future builders know?
- What code patterns should be standard?
- What test requirements are essential?
- What similar stories exist?
</objective>
<success_criteria>
- [ ] Analyzed review findings
- [ ] Identified preventable issues
- [ ] Determined which playbook(s) to update
- [ ] Return structured proposal
</success_criteria>
<completion_format>
{
  "agent": "reflection",
  "story_key": "{{story_key}}",
  "learnings": [
    {
      "issue": "SQL injection in query builder",
      "root_cause": "Builder used string concatenation (didn't know pattern)",
      "prevention": "Playbook should document: always use parameterized queries",
      "applies_to": "database queries, API endpoints with user input"
    },
    {
      "issue": "Missing edge case tests for empty arrays",
      "root_cause": "Test Quality Agent found gap",
      "prevention": "Playbook should require: test null/empty/invalid for all inputs",
      "applies_to": "all data processing functions"
    }
  ],
  "playbook_proposal": {
    "action": "update_existing" | "create_new",
    "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
    "module": "api/database",
    "updates": {
      "common_gotchas": [
        "Never concatenate user input into SQL - use parameterized queries",
        "Test edge cases: null, undefined, [], '', invalid input"
      ],
      "code_patterns": [
        "db.query(sql, [param1, param2]) ✓",
        "sql + userInput ✗"
      ],
      "test_requirements": [
        "Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
        "Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
      ],
      "related_stories": ["{{story_key}}"]
    }
  }
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
</completion_format>
\`
})
\`\`\`
**Wait for completion.**
**Review playbook proposal:**
\`\`\`bash
REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "📝 Playbook Update Proposal"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Action: $ACTION"
echo "Playbook: $PLAYBOOK"
echo ""
jq -r '.learnings[] | "- \(.issue)\n Prevention: \(.prevention)"' "$REFLECTION"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
\`\`\`
If \`auto_apply_updates: true\` in config:
- Read playbook (or create from template if new)
- Use Edit tool to add learnings to sections
- Commit playbook update
If \`auto_apply_updates: false\` (default):
- Display proposal for manual review
- User can apply later with \`/update-playbooks {{story_key}}\`
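A manual apply could start from a sketch like this (naive end-of-file append, assuming \`jq\`; a real apply would use the Edit tool to target each playbook section):
\`\`\`bash
PLAYBOOK_FILE=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
jq -r '.playbook_proposal.updates.common_gotchas[]' "$REFLECTION" |
  while read -r gotcha; do printf -- '- %s\n' "$gotcha" >> "$PLAYBOOK_FILE"; done
\`\`\`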
</step>
</process>
<failure_handling>
**Builder fails:** Don't spawn verification. Report failure and halt.
**Inspector fails (Phase 2):** Still run Reviewers in parallel, collect all findings together.
**Inspector fails (Phase 4):** Resume Builder again with new issues (iterative fix loop).
**Builder resume fails:** Report unfixed issues. Manual intervention needed.
**Reconciliation fails:** Fix using Edit tool. Re-verify checkboxes.
**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
**Inspector fails (Phase 2):** Still collect other reviewer findings.
**Test Quality fails:** Add issues to Builder fix list.
**Coverage below threshold:** Add to Builder fix list.
**Reviewers find CRITICAL:** Builder MUST fix when resumed.
**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
**Builder resume fails:** Report unfixed issues. Manual intervention.
**Reconciliation fails:** Fix with Edit tool, re-verify.
</failure_handling>
<complexity_routing>
| Complexity | Pipeline | Reviewers | Total Phase 2 Agents |
|------------|----------|-----------|---------------------|
| micro | Builder → [Inspector + 2 Reviewers] → Resume Builder → Inspector recheck | 2 (security, architect) | 3 agents |
| standard | Builder → [Inspector + 3 Reviewers] → Resume Builder → Inspector recheck | 3 (security, logic, architect) | 4 agents |
| complex | Builder → [Inspector + 4 Reviewers] → Resume Builder → Inspector recheck | 4 (security, logic, architect, quality) | 5 agents |
| Complexity | Phase 2 Agents | Total | Reviewers |
|------------|----------------|-------|-----------|
| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security + Architect |
| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect |
| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality |
**Key Improvements (v3.2.0):**
- All verification agents spawn in parallel (single message, faster execution)
- Builder resume in Phase 3 saves 50-70% tokens vs spawning fresh Fixer
- **NEW:** Architect/Integration Reviewer catches runtime issues (404s, pattern violations, missing migrations)
**Reviewer Specializations:**
- **Security:** Auth, injection, secrets, cross-tenant access
- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions
- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+)
- **Code Quality:** Maintainability, naming, duplication (complex only)
**All verification agents spawn in parallel (single message)**
</complexity_routing>
<success_criteria>
- [ ] Builder spawned and agent_id saved
- [ ] All verification agents completed in parallel
- [ ] Builder resumed with consolidated findings
- [ ] Inspector recheck passed
- [ ] Git commit exists for story
- [ ] Story file has checked tasks (count > 0)
- [ ] Dev Agent Record filled with all phases
- [ ] Sprint status updated to "done"
- [ ] Phase 0: Playbooks loaded (if available)
- [ ] Phase 1: Builder spawned, agent_id saved
- [ ] Phase 2: All verification agents completed in parallel
- [ ] Phase 2.5: Coverage gate passed
- [ ] Phase 3: Builder resumed with consolidated findings
- [ ] Phase 4: Inspector recheck passed
- [ ] Phase 5: Orchestrator reconciled with Inspector evidence
- [ ] Phase 6: Playbook reflection completed
- [ ] Git commit exists
- [ ] Story tasks checked with code citations
- [ ] Dev Agent Record filled
- [ ] Coverage ≥ {{coverage_threshold}}%
- [ ] Sprint status: done
</success_criteria>
<improvements_v4>
1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings
2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task
3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates
4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero
5. ✅ Test Quality Agent (v4.0) + Coverage Gate (v4.0) - validates test quality and enforces threshold
6. ✅ Playbook query (v4.0) before Builder + reflection (v4.0) after - continuous learning
</improvements_v4>

View File

@@ -1,7 +1,7 @@
name: story-full-pipeline
description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)"
author: "BMAD Method + GSD"
version: "3.2.0" # Added architect-integration-reviewer for runtime verification
description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes"
author: "BMAD Method"
version: "4.0.0" # Added playbook learning, test quality, coverage gates, Inspector code citations
# Execution mode
execution_mode: "multi_agent" # multi_agent | single_agent (fallback)
@@ -37,13 +37,23 @@ agents:
    timeout: 3600 # 1 hour

  inspector:
    description: "Validation agent - independent verification"
    description: "Validation agent - independent verification with code citations"
    steps: [5, 6]
    subagent_type: "general-purpose"
    prompt_file: "{agents_path}/inspector.md"
    fresh_context: true # No knowledge of builder agent
    trust_level: "medium" # No conflict of interest
    timeout: 1800 # 30 minutes
    require_code_citations: true # v4.0: Must provide file:line evidence for all tasks

  test_quality:
    description: "Test quality validation - verifies test coverage and quality"
    steps: [5.5]
    subagent_type: "general-purpose"
    prompt_file: "{agents_path}/test-quality.md"
    fresh_context: true
    trust_level: "medium"
    timeout: 1200 # 20 minutes

  reviewer:
    description: "Adversarial code review - finds problems"
@@ -73,15 +83,40 @@ agents:
trust_level: "medium" # Incentive to minimize work
timeout: 2400 # 40 minutes
reflection:
description: "Playbook learning - extracts patterns for future agents"
steps: [10]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/reflection.md"
timeout: 900 # 15 minutes
# Reconciliation: orchestrator does this directly (see workflow.md Phase 5)
# Playbook configuration (v4.0)
playbooks:
enabled: true # Set to false in project config to disable
directory: "docs/playbooks/implementation-playbooks"
bootstrap_mode: true # Auto-initialize if missing
max_load: 3
auto_apply_updates: false # Require manual review of playbook updates
discovery:
enabled: true # Scan git/docs to populate initial playbooks
sources: ["git_history", "docs", "existing_code"]
# Quality gates (v4.0)
quality_gates:
coverage_threshold: 80 # % line coverage required
task_verification: "all_with_evidence" # Inspector must provide file:line citations
critical_issues: "must_fix"
high_issues: "must_fix"
# Complexity level (determines which steps to execute)
complexity_level: "standard" # micro | standard | complex
# Complexity routing
complexity_routing:
  micro:
    skip_agents: ["reviewer"] # Skip code review for micro stories
    skip_agents: [] # Full pipeline (v4.0: micro gets security scan)
    description: "Lightweight path for low-risk stories"
    examples: ["UI tweaks", "text changes", "simple CRUD"]

View File

@@ -0,0 +1,85 @@
# {{Module/Feature Area}} - Implementation Playbook
> **Purpose:** Guide future agents implementing features in {{module_name}}
> **Created:** {{date}}
> **Last Updated:** {{date}}
## Common Gotchas
**What mistakes to avoid:**
- Add specific gotchas here as they're discovered
- Example: "Never concatenate user input into SQL queries"
- Example: "Always validate file paths before operations"
## Code Patterns
**Standard approaches that work:**
### Pattern: {{Pattern Name}}
✓ **Good:**
```
// Example of correct pattern
db.query(sql, [param1, param2])
```
✗ **Bad:**
```
// Example of incorrect pattern
sql + userInput
```
### Pattern: {{Another Pattern}}
✓ **Good:**
```
// Another example
if (!data) return null;
```
✗ **Bad:**
```
// Don't do this
data.map(...) // crashes if data is null
```
## Test Requirements
**Essential tests for this module:**
- **Happy path:** Verify primary functionality
- **Edge cases:** Test null, undefined, empty arrays, invalid inputs
- **Error conditions:** Verify errors are handled properly
- **Security:** Test for injection attacks, auth bypasses, etc.
### Example Test Pattern
```typescript
describe('FeatureName', () => {
  it('handles happy path', () => {
    expect(fn(validInput)).toEqual(expected)
  })

  it('handles edge cases', () => {
    expect(() => fn(null)).toThrow()
    expect(fn([])).toEqual([])
  })

  it('validates security', () => {
    expect(() => fn("' OR 1=1--")).toThrow()
  })
})
```
## Related Stories
Stories that used these patterns:
- {{story_key}} - {{brief description}}
## Notes
- Keep this simple and actionable
- Add new learnings as they emerge
- Focus on preventable mistakes