feat: upgrade story-full-pipeline to v4.0 with 6 major enhancements

Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench research
(Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 RESUMES Builder agent with review findings
- Builder already has full codebase context (50-70% token savings)
- More efficient than spawning fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
- Prevents vague communication (CooperBench finding)

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Focus on concrete, measurable verification
- Replaced with patterns/verification.md + patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- No longer skip ALL review for micro stories
- Micro now gets 2 reviewers: Security + Architect
- Lightweight but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection

Pipeline: Phase 0 (Playbooks) → Phase 1 (Builder) → Phase 2 (Inspector +
Test Quality + Reviewers parallel) → Phase 2.5 (Coverage Gate) → Phase 3
(Resume Builder) → Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) →
Phase 6 (Reflection)

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)
Author: Jonah Schulte
Date: 2026-01-28 13:28:37 -05:00
Commit: a268b4c1bc (parent 0810646ed6)
11 changed files with 1189 additions and 330 deletions

README.md

@@ -1,124 +1,150 @@
-# Super-Dev Pipeline - GSDMAD Architecture
-**Multi-agent pipeline with independent validation and adversarial code review**
-## Quick Start
-```bash
-# Run super-dev pipeline for a story
-/story-full-pipeline story_key=17-10
-```
-## Architecture
-### Multi-Agent Validation
-- **4 independent agents** working sequentially
-- Builder → Inspector → Reviewer → Fixer
-- Each agent has fresh context
-- No conflict of interest
-### Honest Reporting
-- Inspector verifies Builder's work (doesn't trust claims)
-- Reviewer is adversarial (wants to find issues)
-- Main orchestrator does final verification
-- Can't fake completion
-### Wave-Based Execution
-- Independent stories run in parallel
-- Dependencies respected via waves
-- 57% faster than sequential execution
-## Workflow Phases
-**Phase 1: Builder (Steps 1-4)**
-- Load story, analyze gaps
-- Write tests (TDD)
-- Implement code
-- Report what was built (NO VALIDATION)
-**Phase 2: Inspector (Steps 5-6)**
-- Fresh context, no Builder knowledge
-- Verify files exist
-- Run tests independently
-- Run quality checks
-- PASS or FAIL verdict
-**Phase 3: Reviewer (Step 7)**
-- Fresh context, adversarial stance
-- Find security vulnerabilities
-- Find performance problems
-- Find logic bugs
-- Report issues with severity
-**Phase 4: Fixer (Steps 8-9)**
-- Fix CRITICAL issues (all)
-- Fix HIGH issues (all)
-- Fix MEDIUM issues (time permitting)
-- Verify fixes independently
-**Phase 5: Final Verification**
-- Main orchestrator verifies all phases
-- Updates story checkboxes
-- Creates commit
-- Marks story complete
-## Key Features
-**Separation of Concerns:**
-- Builder focuses only on implementation
-- Inspector focuses only on validation
-- Reviewer focuses only on finding issues
-- Fixer focuses only on resolving issues
-**Independent Validation:**
-- Each agent validates the previous agent's work
-- No agent validates its own work
-- Fresh context prevents confirmation bias
-**Quality Enforcement:**
-- Multiple quality gates throughout pipeline
-- Can't proceed without passing validation
-- 95% honesty rate (agents can't fake completion)
-**Directory Structure:**
-```
-story-full-pipeline/
-├── README.md (this file)
-├── workflow.yaml (configuration)
-├── workflow.md (complete documentation)
-├── agents/
-│   ├── builder.md (implementation agent prompt)
-│   ├── inspector.md (validation agent prompt)
-│   ├── reviewer.md (review agent prompt)
-│   └── fixer.md (fix agent prompt)
-└── steps/
-    └── (step files for each phase)
-```
-**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
# Story-Full-Pipeline v4.0
Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes.
## What's New in v4.0
### 1. Resume Builder (v3.2+)
**Token Efficiency: 50-70% savings**
- Phase 3 now RESUMES the Builder agent with review findings
- Builder already has full codebase context
- More efficient than spawning a fresh Fixer agent
### 2. Inspector Code Citations (v4.0)
**Evidence-Based Verification**
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
### 3. Remove Hospital-Grade Framing (v4.0)
**Focus on Concrete Verification**
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Replaced with patterns/verification.md + patterns/tdd.md
### 4. Micro Stories Get Security Scan (v4.0)
**Even Simple Stories Need Security**
- No longer skip ALL review for micro stories
- Micro stories still get 2 reviewers: Security + Architect
- Lightweight but catches critical vulnerabilities
### 5. Test Quality Agent + Coverage Gate (v4.0)
**Validate Test Completeness**
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding
### 6. Playbook Learning System (v4.0)
**Continuous Improvement Through Reflection**
- **Phase 0:** Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- **Phase 6:** Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
## Pipeline Flow
```
Phase 0: Playbook Query (orchestrator)
Phase 1: Builder (initial implementation)
Phase 2: Inspector + Test Quality + N Reviewers (parallel)
Phase 2.5: Coverage Gate (automated)
Phase 3: Resume Builder (fix issues with context)
Phase 4: Inspector re-check (quick verification)
Phase 5: Orchestrator reconciliation (evidence-based)
Phase 6: Playbook Reflection (extract learnings)
```
## Complexity Routing
| Complexity | Phase 2 Agents | Total | Reviewers |
|------------|----------------|-------|-----------|
| micro | Inspector + Test Quality + 2 | 4 agents | Security + Architect |
| standard | Inspector + Test Quality + 3 | 5 agents | Security + Logic + Architect |
| complex | Inspector + Test Quality + 4 | 6 agents | Security + Logic + Architect + Quality |
## Quality Gates
- **Coverage Threshold:** 80% line coverage required
- **Task Verification:** ALL tasks need file:line evidence
- **Critical Issues:** MUST fix
- **High Issues:** MUST fix
## Token Efficiency
- Phase 2 agents spawn in parallel (same cost, faster)
- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
- Phase 4 Inspector only (no full re-review)
## Playbook Configuration
```yaml
playbooks:
  enabled: true
  directory: "docs/playbooks/implementation-playbooks"
  bootstrap_mode: true        # Auto-initialize if missing
  max_load: 3
  auto_apply_updates: false   # Require manual review
  discovery:
    enabled: true
    sources: ["git_history", "docs", "existing_code"]
```
## How It Avoids CooperBench Coordination Failures
Unlike the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026):
1. **Sequential Implementation** - ONE Builder agent implements the entire story (no parallel implementation conflicts)
2. **Parallel Review** - Multiple agents review in parallel (safe read-only operations)
3. **Context Reuse** - the SAME agent fixes issues (no expectation failures about partner state)
4. **Evidence-Based** - file:line citations prevent vague communication
5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities)
The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination."
## Files
See `workflow.md` for complete architecture details.
**Agent Prompts:**
- `agents/builder.md` - Implementation agent (with playbook awareness)
- `agents/inspector.md` - Validation agent (requires code citations)
- `agents/test-quality.md` - Test quality validation (v4.0)
- `agents/reviewer.md` - Adversarial review agent
- `agents/architect-integration-reviewer.md` - Architecture/integration review
- `agents/fixer.md` - Issue resolution agent (deprecated; superseded by resuming the Builder)
- `agents/reflection.md` - Playbook learning agent (v4.0)
**Workflow Config:**
- `workflow.yaml` - Main configuration (v4.0)
- `workflow.md` - Complete step-by-step documentation
**Templates:**
- `../templates/implementation-playbook-template.md` - Playbook structure
## Usage
```bash
# Run story-full-pipeline
/story-full-pipeline story_key=17-10
```
## Backward Compatibility
Falls back to single-agent mode if multi-agent execution fails.

agents/architect-integration-reviewer.md

@@ -5,7 +5,6 @@
**Trust Level:** HIGH (wants to find integration issues)
<execution_context>
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

agents/builder.md

@@ -5,7 +5,6 @@
**Trust Level:** LOW (assume will cut corners)
<execution_context>
-@patterns/hospital-grade.md
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
@@ -17,11 +16,12 @@
You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests.
**DO:**
- **Review playbooks** for gotchas and patterns (if provided)
- Load and understand the story requirements
- Analyze what exists vs what's needed
- Write tests first (TDD approach)
- Implement production code to make tests pass
- Follow project patterns and playbook guidance
**DO NOT:**
- Validate your own work (Inspector agent will do this)
@@ -35,7 +35,8 @@ You are the **BUILDER** agent. Your job is to implement the story requirements b
## Steps to Execute
### Step 1: Initialize
Load story file and playbooks (if provided):
- **Review playbooks first** (if provided in context) - note gotchas and patterns
- Read story file: `{{story_file}}`
- Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.)
- Determine greenfield vs brownfield (see the sketch below)
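For illustration, a rough shell heuristic for that last check - assuming story tasks mention TypeScript file paths; the regex and extensions are illustrative, not part of the agent contract:
```bash
# Extract file paths referenced in the story and test which already exist.
# Any existing path suggests brownfield work (read it before editing).
grep -oE '[A-Za-z0-9_./-]+\.(ts|tsx)' "{{story_file}}" | sort -u | while read -r f; do
  [ -e "$f" ] && echo "brownfield: $f" || echo "greenfield: $f"
done
```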
@@ -88,54 +89,36 @@ When complete, provide:
---
-## Hospital-Grade Standards
-⚕️ **Quality >> Speed**
-- Take time to do it right
-- Don't skip error handling
-- Don't leave TODO comments
-- Don't use `any` types
-## When Complete, Return This Format
-```markdown
-## AGENT COMPLETE
-**Agent:** builder
-**Story:** {{story_key}}
-**Status:** SUCCESS | FAILED
-### Files Created
-- path/to/new/file1.ts
-- path/to/new/file2.ts
-### Files Modified
-- path/to/existing/file.ts
-### Tests Added
-- X test files
-- Y test cases total
-### Implementation Summary
-Brief description of what was built and key decisions made.
-### Known Gaps
-- Any functionality not implemented
-- Any edge cases not handled
-- NONE if all tasks complete
-### Ready For
-Inspector validation (next phase)
-```
-**Why this format?** The orchestrator parses this output to:
-- Verify claimed files actually exist
-- Track what was built for reconciliation
-- Route to next phase appropriately
## Completion Format (v4.0)
**Return structured JSON artifact:**
```json
{
  "agent": "builder",
  "story_key": "{{story_key}}",
  "status": "SUCCESS",
  "files_created": ["path/to/file.tsx", "path/to/file.test.tsx"],
  "files_modified": ["path/to/existing.tsx"],
  "tests_added": {
    "total": 12,
    "passing": 12
  },
  "tasks_addressed": [
    "Create agreement view component",
    "Add status badge",
    "Implement occupant selection"
  ],
  "playbooks_reviewed": ["database-patterns.md", "api-security.md"]
}
```
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json`
---
**Remember:**
- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories
- Build it well with TDD, but don't validate or review your own work
- Other agents will verify with fresh eyes and provide file:line evidence

agents/fixer.md

@@ -5,7 +5,6 @@
**Trust Level:** MEDIUM (incentive to minimize work)
<execution_context>
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

agents/inspector.md

@@ -1,12 +1,11 @@
# Inspector Agent - Validation Phase with Code Citations
**Role:** Independent verification of Builder's work **WITH EVIDENCE**
**Steps:** 5-6 (post-validation, quality-checks)
**Trust Level:** MEDIUM (no conflict of interest)
<execution_context>
@patterns/verification.md
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>
@@ -14,48 +13,54 @@
## Your Mission
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**.
**KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.**
**DO:**
- Map EACH task to specific code with file:line citations
- Verify files actually exist
- Run tests yourself (don't trust claims)
- Run quality checks (type-check, lint, build)
- Provide evidence for EVERY task
**DO NOT:**
- Skip any task verification
- Give vague "looks good" without citations
- Assume tests pass without running them
- Give PASS verdict if ANY check fails or task lacks evidence
---
## Steps to Execute
-### Step 5: Post-Validation
-**Verify Implementation Against Story:**
-1. **Check Files Exist:**
-```bash
-# For each file mentioned in story tasks
-ls -la {{file_path}}
-# FAIL if file missing or empty
-```
-2. **Verify File Contents:**
-- Open each file
-- Check it has actual code (not just TODO/stub)
-- Verify it matches story requirements
-3. **Check Tests Exist:**
-```bash
-# Find test files
-find . -name "*.test.ts" -o -name "__tests__"
-# FAIL if no tests found for new code
-```
### Step 5: Task Verification with Code Citations
**Map EVERY task to specific code locations:**
1. **Read story file** - understand ALL tasks
2. **For EACH task, provide:**
   - **file:line** where it's implemented
   - **Brief quote** of relevant code
   - **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED
**Example Evidence Format:**
```
Task: "Display occupant agreement status"
Evidence: src/features/agreement/StatusBadge.tsx:45-67
Code: "const StatusBadge = ({ status }) => ..."
Verdict: IMPLEMENTED
```
3. **If task NOT implemented:**
   - Explain why (file missing, code incomplete, etc.)
   - Provide file:line where it should be
**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED.
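One way to locate evidence quickly is a line-numbered search; a sketch using the StatusBadge example above (the path and symbol are illustrative):
```bash
# Search with line numbers so each hit can be cited as file:line.
grep -rn "StatusBadge" src/features/agreement/ | head -5
# e.g. src/features/agreement/StatusBadge.tsx:45:const StatusBadge = ({ status }) => ...
# → cite as src/features/agreement/StatusBadge.tsx:45
```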
### Step 6: Quality Checks
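The exact commands are project-specific; a typical sketch for a TypeScript/npm project (assumed tooling, not mandated by the pipeline):
```bash
npx tsc --noEmit                 # type check: require 0 errors
npx eslint . --max-warnings 0    # lint: require 0 warnings
npm test                         # tests: require all passing
npm run build                    # build: require success
```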
@@ -96,36 +101,49 @@ You are the **INSPECTOR** agent. Your job is to verify that the Builder actually
---
-## Output Requirements
-**Provide Evidence-Based Verdict:**
-### If PASS:
-```markdown
-✅ VALIDATION PASSED
-Evidence:
-- Files verified: [list files checked]
-- Type check: PASS (0 errors)
-- Linter: PASS (0 warnings)
-- Build: PASS
-- Tests: 45/45 passing (95% coverage)
-- Git: 12 files modified, 3 new files
-Ready for code review.
-```
-### If FAIL:
-```markdown
-❌ VALIDATION FAILED
-Failures:
-1. File missing: app/api/occupant/agreement/route.ts
-2. Type check: 3 errors in lib/api/auth.ts
-3. Tests: 2 failing (api/occupant tests)
-Cannot proceed to code review until these are fixed.
-```
## Completion Format (v4.0)
**Return structured JSON with code citations:**
```json
{
  "agent": "inspector",
  "story_key": "{{story_key}}",
  "verdict": "PASS",
  "task_verification": [
    {
      "task": "Create agreement view component",
      "implemented": true,
      "evidence": [
        {
          "file": "src/features/agreement/AgreementView.tsx",
          "lines": "15-67",
          "code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
        },
        {
          "file": "src/features/agreement/AgreementView.test.tsx",
          "lines": "8-45",
          "code_snippet": "describe('AgreementView', () => {...})"
        }
      ]
    },
    {
      "task": "Add status badge",
      "implemented": false,
      "evidence": [],
      "reason": "No StatusBadge component found in src/features/agreement/"
    }
  ],
  "checks": {
    "type_check": {"passed": true, "errors": 0},
    "lint": {"passed": true, "warnings": 0},
    "tests": {"passed": true, "total": 12, "passing": 12},
    "build": {"passed": true}
  }
}
```
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json`
---
@@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed.
**Before giving PASS verdict, confirm:**
-- [ ] All story files exist and have content
-- [ ] Test coverage >= 90%
-- [ ] Git status is clean or has expected changes
- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason
- [ ] Type check returns 0 errors
- [ ] Linter returns 0 warnings
- [ ] Build succeeds
- [ ] Tests run and pass (not skipped)
- [ ] All implemented tasks have code evidence
**If ANY checkbox is unchecked → FAIL verdict**
---
-## Hospital-Grade Standards
-⚕️ **Be Thorough**
-- Don't skip checks
-- Run tests yourself (don't trust claims)
-- Verify every file exists
-- Give specific evidence
-## When Complete, Return This Format
-```markdown
-## AGENT COMPLETE
-**Agent:** inspector
-**Story:** {{story_key}}
-**Status:** PASS | FAIL
-### Evidence
-- **Type Check:** PASS (0 errors) | FAIL (X errors)
-- **Lint:** PASS (0 warnings) | FAIL (X warnings)
-- **Build:** PASS | FAIL
-- **Tests:** X passing, Y failing, Z% coverage
-### Files Verified
-- path/to/file1.ts ✓
-- path/to/file2.ts ✓
-- path/to/missing.ts ✗ (NOT FOUND)
-### Failures (if FAIL status)
-1. Specific failure with file:line reference
-2. Another specific failure
-### Ready For
-- If PASS: Reviewer (next phase)
-- If FAIL: Builder needs to fix before proceeding
-```
**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations.

agents/reflection.md (new file)

@@ -0,0 +1,93 @@
# Reflection Agent - Playbook Learning
You are the **REFLECTION** agent for story {{story_key}}.
## Context
- **Story:** {{story_file}}
- **Builder initial:** {{builder_artifact}}
- **All review findings:** {{all_reviewer_artifacts}}
- **Builder fixes:** {{builder_fixes_artifact}}
- **Test quality issues:** {{test_quality_artifact}}
## Objective
Identify what future agents should know:
1. **What issues were found?** (from reviewers)
2. **What did Builder miss initially?** (gaps, edge cases, security)
3. **What playbook knowledge would have prevented these?**
4. **Which module/feature area does this apply to?**
5. **Should we update existing playbook or create new?**
### Key Questions
- What gotchas should future builders know?
- What code patterns should be standard?
- What test requirements are essential?
- What similar stories exist?
## Success Criteria
- [ ] Analyzed review findings
- [ ] Identified preventable issues
- [ ] Determined which playbook(s) to update
- [ ] Return structured proposal
## Completion Format
Return structured JSON artifact:
```json
{
"agent": "reflection",
"story_key": "{{story_key}}",
"learnings": [
{
"issue": "SQL injection in query builder",
"root_cause": "Builder used string concatenation (didn't know pattern)",
"prevention": "Playbook should document: always use parameterized queries",
"applies_to": "database queries, API endpoints with user input"
},
{
"issue": "Missing edge case tests for empty arrays",
"root_cause": "Test Quality Agent found gap",
"prevention": "Playbook should require: test null/empty/invalid for all inputs",
"applies_to": "all data processing functions"
}
],
"playbook_proposal": {
"action": "update_existing",
"playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
"module": "api/database",
"updates": {
"common_gotchas": [
"Never concatenate user input into SQL - use parameterized queries",
"Test edge cases: null, undefined, [], '', invalid input"
],
"code_patterns": [
"db.query(sql, [param1, param2]) ✓",
"sql + userInput ✗"
],
"test_requirements": [
"Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
"Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
],
"related_stories": ["{{story_key}}"]
}
}
}
```
Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
## Playbook Structure
When proposing playbook updates, structure them with these sections:
1. **Common Gotchas** - What mistakes to avoid
2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
3. **Test Requirements** - What tests are essential
4. **Related Stories** - Which stories used these patterns
Keep it simple and actionable for future agents.

agents/reviewer.md

@@ -6,7 +6,6 @@
<execution_context>
@patterns/security-checklist.md
-@patterns/hospital-grade.md
@patterns/agent-completion.md
</execution_context>

agents/test-quality.md (new file)

@@ -0,0 +1,73 @@
# Test Quality Agent
You are the **TEST QUALITY** agent for story {{story_key}}.
## Context
- **Story:** {{story_file}}
- **Builder completion:** {{builder_completion_artifact}}
## Objective
Review test files for quality and completeness:
1. Find all test files created/modified by Builder
2. For each test file, verify:
- **Happy path**: Primary functionality tested ✓
- **Edge cases**: null, empty, invalid inputs ✓
- **Error conditions**: Failures handled properly ✓
- **Assertions**: Meaningful checks (not just "doesn't crash")
- **Test names**: Descriptive and clear
- **Deterministic**: No random data, no timing dependencies
3. Check that tests actually validate the feature
**Focus on:** What's missing? What edge cases weren't considered?
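A quick heuristic for the determinism check - a grep sketch; the patterns and path are illustrative, not exhaustive:
```bash
# Flag common sources of flaky tests in the Builder's test files.
grep -rnE "Math\.random|Date\.now\(\)|setTimeout" --include="*.test.*" src/ \
  && echo "Review these lines for nondeterminism" \
  || echo "No obvious flakiness patterns found"
```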
## Success Criteria
- [ ] All test files reviewed
- [ ] Edge cases identified (covered or missing)
- [ ] Error conditions verified
- [ ] Assertions are meaningful
- [ ] Tests are deterministic
- [ ] Return quality assessment
## Completion Format
Return structured JSON artifact:
```json
{
"agent": "test_quality",
"story_key": "{{story_key}}",
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
"test_files_reviewed": ["path/to/test.tsx", ...],
"issues": [
{
"severity": "HIGH",
"file": "path/to/test.tsx:45",
"issue": "Missing edge case: empty input array",
"recommendation": "Add test: expect(fn([])).toThrow(...)"
},
{
"severity": "MEDIUM",
"file": "path/to/test.tsx:67",
"issue": "Test uses Math.random() - could be flaky",
"recommendation": "Use fixed test data"
}
],
"coverage_analysis": {
"edge_cases_covered": true,
"error_conditions_tested": true,
"meaningful_assertions": true,
"tests_are_deterministic": true
},
"summary": {
"high_issues": 0,
"medium_issues": 0,
"low_issues": 0
}
}
```
Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json`

workflow.md

@@ -1,74 +1,142 @@
# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline
<purpose>
Implement a story using parallel verification agents with Builder context reuse.
Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates.
Builder fixes issues in its own context (50-70% token savings).
</purpose>
<philosophy>
**Quality Through Discipline, Continuous Learning**
- Playbook Query: Load relevant patterns before starting
- Builder: Implements with playbook knowledge (context preserved)
- Inspector + Test Quality + Reviewers: Validate in parallel with proof
- Coverage Gate: Automated threshold enforcement
- Builder: Fixes issues in same context (50-70% token savings)
- Inspector: Quick recheck
- Orchestrator: Reconciles mechanically
- Reflection: Updates playbooks for future agents
Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system.
</philosophy>
<config>
name: story-full-pipeline
version: 4.0.0
execution_mode: multi_agent
phases:
  phase_0: Playbook Query (orchestrator)
  phase_1: Builder (saves agent_id)
  phase_2: [Inspector + Test Quality + N Reviewers] in parallel
  phase_2.5: Coverage Gate (automated)
  phase_3: Resume Builder with all findings (reuses context)
  phase_4: Inspector re-check (quick verification)
  phase_5: Orchestrator reconciliation
  phase_6: Playbook Reflection
reviewer_counts:
  micro: 2 reviewers (security, architect/integration)
  standard: 3 reviewers (security, logic/performance, architect/integration)
  complex: 4 reviewers (security, logic, architect/integration, code quality)
quality_gates:
  coverage_threshold: 80  # % line coverage required
  task_verification: "all_with_evidence"  # Inspector must cite file:line
  critical_issues: "must_fix"
  high_issues: "must_fix"
token_efficiency:
  - Phase 2 agents spawn in parallel (same cost, faster)
  - Phase 3 resumes Builder (50-70% token savings vs fresh agent)
  - Phase 4 Inspector only (no full re-review)
playbooks:
  enabled: true
  directory: "docs/playbooks/implementation-playbooks"
  max_load: 3
  auto_apply_updates: false
</config>
<execution_context>
-@patterns/hospital-grade.md
@patterns/verification.md
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
<process>
<step name="load_story" priority="first">
**Load and parse story file**
\`\`\`bash
STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
[ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
\`\`\`
Use Read tool. Extract:
- Task count
- Acceptance criteria count
- Keywords for risk scoring
**Determine complexity:**
\`\`\`bash
TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE")
RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE")
if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then
COMPLEXITY="micro"
REVIEWER_COUNT=2
elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then
COMPLEXITY="complex"
REVIEWER_COUNT=4
else
COMPLEXITY="standard"
REVIEWER_COUNT=3
fi
\`\`\`
Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers
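Example: a story with 5 unchecked tasks and no risk keywords routes to standard (Inspector + Test Quality + 3 reviewers = 5 agents), while a 2-task story that mentions auth routes to complex (6 agents).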
</step>
<step name="query_playbooks">
**Phase 0: Playbook Query**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📚 PHASE 0: PLAYBOOK QUERY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**Extract story keywords:**
\`\`\`bash
STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ')
echo "Story keywords: $STORY_KEYWORDS"
\`\`\`
**Search for relevant playbooks:**
Use Grep tool:
- Pattern: extracted keywords
- Path: \`docs/playbooks/implementation-playbooks/\`
- Output mode: files_with_matches
- Limit: 3 files
**Load matching playbooks:**
For each playbook found:
- Use Read tool
- Extract sections: Common Gotchas, Code Patterns, Test Requirements
If no playbooks exist:
\`\`\`
No playbooks found - this will be the first story to create them
\`\`\`
Store playbook content for Builder.
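For illustration, the same lookup expressed in shell - the orchestrator actually uses the Grep/Read tools, and the keyword handling here is deliberately naive (single pattern, no ranking):
\`\`\`bash
PLAYBOOK_DIR="docs/playbooks/implementation-playbooks"
mkdir -p "$PLAYBOOK_DIR"  # bootstrap_mode: create the directory if missing
# Case-insensitive match of story keywords against playbooks; load at most max_load (3).
grep -ril "$STORY_KEYWORDS" "$PLAYBOOK_DIR" | head -3
\`\`\`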
</step>
<step name="spawn_builder">
**Phase 1: Builder Agent**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Spawn Builder agent and **SAVE agent_id for resume later**:
\`\`\`
-BUILDER_AGENT_ID={{agent_id_from_task_result}}
-echo "Builder agent: $BUILDER_AGENT_ID"
BUILDER_TASK = Task({
subagent_type: "general-purpose",
description: "Implement story {{story_key}}",
prompt: \`
You are the BUILDER agent for story {{story_key}}.
<execution_context>
@patterns/tdd.md
@patterns/agent-completion.md
</execution_context>
<context>
Story: [inline story file content]
{{IF playbooks loaded}}
Relevant Playbooks (review before implementing):
[inline playbook content]
Pay special attention to:
- Common Gotchas in these playbooks
- Code Patterns to follow
- Test Requirements to satisfy
{{ENDIF}}
</context>
<objective>
Implement the story requirements:
1. Review story tasks and acceptance criteria
2. **Review playbooks** for gotchas and patterns (if provided)
3. Analyze what exists vs needed (gap analysis)
4. **Write tests FIRST** (TDD - tests before implementation)
5. Implement production code to pass tests
</objective>
<constraints>
- DO NOT validate your own work
- DO NOT review your code
- DO NOT update story checkboxes
- DO NOT commit changes yet
</constraints>
<success_criteria>
- [ ] Reviewed playbooks for guidance
- [ ] Tests written for all requirements
- [ ] Production code implements tests
- [ ] Tests pass
- [ ] Return structured completion artifact
</success_criteria>
<completion_format>
Return structured JSON artifact:
{
"agent": "builder",
"story_key": "{{story_key}}",
"status": "SUCCESS" | "FAILED",
"files_created": ["path/to/file.tsx", ...],
"files_modified": ["path/to/file.tsx", ...],
"tests_added": {
"total": 12,
"passing": 12
},
"tasks_addressed": ["task description from story", ...]
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json
</completion_format>
\`
})
BUILDER_AGENT_ID = {{extract agent_id from Task result}}
\`\`\` \`\`\`
**CRITICAL: Store Builder agent ID:**
\`\`\`bash
echo "Builder agent ID: $BUILDER_AGENT_ID"
echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
\`\`\`
**Wait for completion. Verify artifact exists:**
\`\`\`bash
BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
\`\`\`
**Verify files exist:**
\`\`\`bash
# For each file in files_created and files_modified:
[ -f "$file" ] || echo "❌ MISSING: $file"
\`\`\`
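A concrete sketch of that loop, assuming \`jq\` is available and the artifact matches the completion format above:
\`\`\`bash
MISSING=$(jq -r '.files_created[], .files_modified[]' "$BUILDER_COMPLETION" \
  | while read -r f; do [ -f "$f" ] || echo "$f"; done)
[ -z "$MISSING" ] || { echo "❌ MISSING:"; echo "$MISSING"; exit 1; }
\`\`\`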
If files missing or status FAILED: halt pipeline.
</step>
<step name="spawn_verification_parallel">
**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 PHASE 2: PARALLEL VERIFICATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
Total agents: {{2 + REVIEWER_COUNT}}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
Send single message with multiple Task calls:
1. Inspector Agent
2. Test Quality Agent
3. Security Reviewer
4. Logic/Performance Reviewer (if standard/complex)
5. Architect/Integration Reviewer
6. Code Quality Reviewer (if complex)
---
## Inspector Agent Prompt:
-Determine reviewer count based on complexity:
-\`\`\`
-if complexity == "micro": REVIEWER_COUNT = 1
-if complexity == "standard": REVIEWER_COUNT = 2
-if complexity == "complex": REVIEWER_COUNT = 3
-\`\`\`
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Validate story {{story_key}} implementation",
prompt: \`
You are the INSPECTOR agent for story {{story_key}}.
<execution_context>
@patterns/verification.md
@patterns/agent-completion.md
</execution_context>
<context>
Story: [inline story file content]
</context>
<objective>
Independently verify implementation WITH CODE CITATIONS:
1. Read story file - understand ALL tasks
2. Read each file Builder created/modified
3. **Map EACH task to specific code with file:line citations**
4. Run verification checks:
- Type-check (0 errors required)
- Lint (0 warnings required)
- Tests (all passing required)
- Build (success required)
</objective>
<critical_requirement>
**EVERY task must have evidence.**
For each task, provide:
- file:line where it's implemented
- Brief quote of relevant code
- Verdict: IMPLEMENTED or NOT_IMPLEMENTED
Example:
Task: "Display occupant agreement status"
Evidence: src/features/agreement/StatusBadge.tsx:45-67
Code: "const StatusBadge = ({ status }) => ..."
Verdict: IMPLEMENTED
</critical_requirement>
<constraints>
- You have NO KNOWLEDGE of what Builder did
- Run all checks yourself - don't trust claims
- **Every task needs file:line citation**
- If code doesn't exist: mark NOT_IMPLEMENTED with reason
</constraints>
<success_criteria>
- [ ] ALL tasks mapped to code locations
- [ ] Type check: 0 errors
- [ ] Lint: 0 warnings
- [ ] Tests: all passing
- [ ] Build: success
- [ ] Return structured evidence
</success_criteria>
<completion_format>
{
"agent": "inspector",
"story_key": "{{story_key}}",
"verdict": "PASS" | "FAIL",
"task_verification": [
{
"task": "Create agreement view component",
"implemented": true,
"evidence": [
{
"file": "src/features/agreement/AgreementView.tsx",
"lines": "15-67",
"code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
},
{
"file": "src/features/agreement/AgreementView.test.tsx",
"lines": "8-45",
"code_snippet": "describe('AgreementView', () => {...})"
}
]
},
{
"task": "Add status badge",
"implemented": false,
"evidence": [],
"reason": "No StatusBadge component found in src/features/agreement/"
}
],
"checks": {
"type_check": {"passed": true, "errors": 0},
"lint": {"passed": true, "warnings": 0},
"tests": {"passed": true, "total": 12, "passing": 12},
"build": {"passed": true}
}
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json
</completion_format>
\`
})
\`\`\`
-Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings.
-Aggregate all findings from Inspector + Reviewers.
---
## Test Quality Agent Prompt:
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Review test quality for {{story_key}}",
prompt: \`
You are the TEST QUALITY agent for story {{story_key}}.
<context>
Story: [inline story file content]
Builder completion: [inline builder artifact]
</context>
<objective>
Review test files for quality and completeness:
1. Find all test files created/modified by Builder
2. For each test file, verify:
- **Happy path**: Primary functionality tested ✓
- **Edge cases**: null, empty, invalid inputs ✓
- **Error conditions**: Failures handled properly ✓
- **Assertions**: Meaningful checks (not just "doesn't crash")
- **Test names**: Descriptive and clear
- **Deterministic**: No random data, no timing dependencies
3. Check that tests actually validate the feature
**Focus on:** What's missing? What edge cases weren't considered?
</objective>
<success_criteria>
- [ ] All test files reviewed
- [ ] Edge cases identified (covered or missing)
- [ ] Error conditions verified
- [ ] Assertions are meaningful
- [ ] Tests are deterministic
- [ ] Return quality assessment
</success_criteria>
<completion_format>
{
"agent": "test_quality",
"story_key": "{{story_key}}",
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
"test_files_reviewed": ["path/to/test.tsx", ...],
"issues": [
{
"severity": "HIGH",
"file": "path/to/test.tsx:45",
"issue": "Missing edge case: empty input array",
"recommendation": "Add test: expect(fn([])).toThrow(...)"
},
{
"severity": "MEDIUM",
"file": "path/to/test.tsx:67",
"issue": "Test uses Math.random() - could be flaky",
"recommendation": "Use fixed test data"
}
],
"coverage_analysis": {
"edge_cases_covered": true | false,
"error_conditions_tested": true | false,
"meaningful_assertions": true | false,
"tests_are_deterministic": true | false
},
"summary": {
"high_issues": 1,
"medium_issues": 2,
"low_issues": 0
}
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json
</completion_format>
\`
})
\`\`\`
---
(Continue with Security, Logic, Architect, Quality reviewers as before...)
**Wait for ALL agents to complete.**
Collect completion artifacts:
- \`inspector.json\`
- \`test-quality.json\`
- \`reviewer-security.json\`
- \`reviewer-logic.json\` (if spawned)
- \`reviewer-architect.json\`
- \`reviewer-quality.json\` (if spawned)
Parse all findings and aggregate by severity.
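One way to do that aggregation, assuming each artifact lists issues with a severity field as in the formats above (a jq sketch, not pipeline-mandated):
\`\`\`bash
# Count findings per severity across all completion artifacts for this story.
cat docs/sprint-artifacts/completions/{{story_key}}-*.json \
  | jq -s '[.[] | .issues[]?] | group_by(.severity) | map({(.[0].severity): length}) | add'
# e.g. → {"CRITICAL": 1, "HIGH": 3, "MEDIUM": 2}
\`\`\`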
</step>
<step name="coverage_gate">
**Phase 2.5: Coverage Gate (Automated)**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 PHASE 2.5: COVERAGE GATE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Run coverage check:
\`\`\`bash
# Run tests with coverage
npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
# Extract coverage percentage (adjust grep pattern for your test framework)
COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1 || echo "0")
echo "Coverage: ${COVERAGE}%"
echo "Threshold: {{coverage_threshold}}%"
# Compare coverage
if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
echo "Builder must add more tests before proceeding"
exit 1
fi
echo "✅ Coverage gate passed: ${COVERAGE}%"
\`\`\`
If coverage fails: add to issues list for Builder to fix.
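If the project happens to use Jest, the same gate can be enforced natively instead of parsing text output (an alternative sketch; flag support depends on the test runner):
\`\`\`bash
npm test -- --coverage --coverageThreshold='{"global":{"lines":{{coverage_threshold}}}}' \
  || { echo "❌ Coverage below {{coverage_threshold}}% threshold"; exit 1; }
\`\`\`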
</step> </step>
<step name="resume_builder_with_findings"> <step name="resume_builder_with_findings">
@@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 PHASE 5: RECONCILIATION (Orchestrator)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
**YOU (orchestrator) do this directly. No agent spawn.**
-1. Get what was built (git log, git diff)
-2. Read story file
-3. Check off completed tasks (Edit tool)
-4. Fill Dev Agent Record with pipeline details
-5. Verify updates (grep task checkboxes)
-6. Update sprint-status.yaml to "done"
**5.1: Load completion artifacts**
\`\`\`bash
BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
\`\`\`
Use Read tool on all artifacts.
**5.2: Read story file**
Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\`
**5.3: Check off completed tasks using Inspector evidence**
For each task in \`inspector.task_verification\`:
- If \`implemented: true\` and has evidence:
- Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\`
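For illustration only, the same edit expressed in shell - assuming task text contains no sed metacharacters; in practice the orchestrator's Edit tool handles this:
\`\`\`bash
jq -r '.task_verification[] | select(.implemented) | .task' "$INSPECTOR" | while read -r task; do
  # Naive exact-text match; breaks if the task contains /, &, or regex characters.
  sed -i "s/^- \[ \] ${task}\$/- [x] ${task}/" "docs/sprint-artifacts/{{story_key}}.md"
done
\`\`\`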
**5.4: Fill Dev Agent Record with evidence**
Use Edit tool:
\`\`\`markdown
### Dev Agent Record
**Implementation Date:** {{timestamp}}
**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0)
**Git Commit:** {{git_commit}}
**Pipeline Phases:**
- Phase 0: Playbook Query ({{playbooks_loaded}} loaded)
- Phase 1: Builder (initial implementation)
- Phase 2: Parallel Verification
- Inspector: {{verdict}} with code citations
- Test Quality: {{verdict}}
- {{REVIEWER_COUNT}} Reviewers: {{issues_found}}
- Phase 2.5: Coverage Gate ({{coverage}}%)
- Phase 3: Builder (resumed, fixed {{fixes_count}} issues)
- Phase 4: Inspector re-check ({{verdict}})
**Files Created:** {{count}}
**Files Modified:** {{count}}
**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}%)
**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM
**Task Evidence:** (Inspector code citations)
{{for each task with evidence}}
- [x] {{task}}
- {{evidence[0].file}}:{{evidence[0].lines}}
{{endfor}}
\`\`\`
**5.5: Verify updates**
\`\`\`bash
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; }
echo "✅ Reconciled: $CHECKED tasks with evidence"
\`\`\`
</step>
<step name="final_verification">
**Final Quality Gate**
-Verify:
-1. Git commit exists
-2. Story tasks checked (count > 0)
-3. Dev Agent Record filled
-4. Sprint status updated
-If verification fails: fix using Edit, then re-verify.
\`\`\`bash
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🔍 FINAL VERIFICATION"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# 1. Git commit exists
git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; }
echo "✅ Git commit found"
# 2. Story tasks checked with evidence
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; }
echo "✅ $CHECKED tasks checked with code citations"
# 3. Dev Agent Record filled
grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; }
echo "✅ Dev Agent Record filled"
# 4. Coverage met threshold (assumes builder-fixes.json records final coverage under .tests.coverage)
FINAL_COVERAGE=$(jq -r '.tests.coverage' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json)
if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then
echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold"
exit 1
fi
echo "✅ Coverage: ${FINAL_COVERAGE}%"
echo ""
echo "✅ STORY COMPLETE"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
\`\`\`
**Update sprint-status.yaml:**
Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\`
</step>
<step name="playbook_reflection">
**Phase 6: Playbook Reflection**
\`\`\`
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 PHASE 6: PLAYBOOK REFLECTION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
\`\`\`
Spawn Reflection Agent:
\`\`\`
Task({
subagent_type: "general-purpose",
description: "Extract learnings from {{story_key}}",
prompt: \`
You are the REFLECTION agent for story {{story_key}}.
<context>
Story: [inline story file]
Builder initial: [inline builder.json]
All review findings: [inline all reviewer artifacts]
Builder fixes: [inline builder-fixes.json]
Test quality issues: [inline test-quality.json]
</context>
<objective>
Identify what future agents should know:
1. **What issues were found?** (from reviewers)
2. **What did Builder miss initially?** (gaps, edge cases, security)
3. **What playbook knowledge would have prevented these?**
4. **Which module/feature area does this apply to?**
5. **Should we update existing playbook or create new?**
Questions:
- What gotchas should future builders know?
- What code patterns should be standard?
- What test requirements are essential?
- What similar stories exist?
</objective>
<success_criteria>
- [ ] Analyzed review findings
- [ ] Identified preventable issues
- [ ] Determined which playbook(s) to update
- [ ] Return structured proposal
</success_criteria>
<completion_format>
{
"agent": "reflection",
"story_key": "{{story_key}}",
"learnings": [
{
"issue": "SQL injection in query builder",
"root_cause": "Builder used string concatenation (didn't know pattern)",
"prevention": "Playbook should document: always use parameterized queries",
"applies_to": "database queries, API endpoints with user input"
},
{
"issue": "Missing edge case tests for empty arrays",
"root_cause": "Test Quality Agent found gap",
"prevention": "Playbook should require: test null/empty/invalid for all inputs",
"applies_to": "all data processing functions"
}
],
"playbook_proposal": {
"action": "update_existing" | "create_new",
"playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
"module": "api/database",
"updates": {
"common_gotchas": [
"Never concatenate user input into SQL - use parameterized queries",
"Test edge cases: null, undefined, [], '', invalid input"
],
"code_patterns": [
"db.query(sql, [param1, param2]) ✓",
"sql + userInput ✗"
],
"test_requirements": [
"Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
"Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
],
"related_stories": ["{{story_key}}"]
}
}
}
Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
</completion_format>
\`
})
\`\`\`
**Wait for completion.**
**Review playbook proposal:**
\`\`\`bash
REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "📝 Playbook Update Proposal"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Action: $ACTION"
echo "Playbook: $PLAYBOOK"
echo ""
jq -r '.learnings[] | "- \(.issue)\n Prevention: \(.prevention)"' "$REFLECTION"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
\`\`\`
If \`auto_apply_updates: true\` in config:
- Read playbook (or create from template if new)
- Use Edit tool to add learnings to sections
- Commit playbook update
If \`auto_apply_updates: false\` (default):
- Display proposal for manual review
- User can apply later with \`/update-playbooks {{story_key}}\`
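A sketch of what a manual apply could look like, assuming the proposal schema above (\`/update-playbooks\` is the supported path):
\`\`\`bash
# Naive append of proposed gotchas to the target playbook (no de-duplication).
jq -r '.playbook_proposal.updates.common_gotchas[] | "- " + .' "$REFLECTION" >> "$PLAYBOOK"
\`\`\`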
</step> </step>
</process>
<failure_handling>
**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
**Inspector fails (Phase 2):** Still collect other reviewer findings.
**Test Quality fails:** Add issues to Builder fix list.
**Coverage below threshold:** Add to Builder fix list.
**Reviewers find CRITICAL:** Builder MUST fix when resumed.
**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
**Builder resume fails:** Report unfixed issues. Manual intervention.
**Reconciliation fails:** Fix with Edit tool, re-verify.
</failure_handling>
<complexity_routing>
| Complexity | Phase 2 Agents | Total | Reviewers |
|------------|----------------|-------|-----------|
| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security + Architect |
| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect |
| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality |
**All verification agents spawn in parallel (single message)**
**Reviewer Specializations:**
- **Security:** Auth, injection, secrets, cross-tenant access
- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions
- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+)
- **Code Quality:** Maintainability, naming, duplication (complex only)
</complexity_routing>
<success_criteria>
- [ ] Phase 0: Playbooks loaded (if available)
- [ ] Phase 1: Builder spawned, agent_id saved
- [ ] Phase 2: All verification agents completed in parallel
- [ ] Phase 2.5: Coverage gate passed
- [ ] Phase 3: Builder resumed with consolidated findings
- [ ] Phase 4: Inspector recheck passed
- [ ] Phase 5: Orchestrator reconciled with Inspector evidence
- [ ] Phase 6: Playbook reflection completed
- [ ] Git commit exists
- [ ] Story tasks checked with code citations
- [ ] Dev Agent Record filled
- [ ] Coverage ≥ {{coverage_threshold}}%
- [ ] Sprint status: done
</success_criteria>
<improvements_v4>
1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings
2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task
3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates
4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero
5. ✅ Test Quality Agent (v4.0) + Coverage Gate (v4.0) - validates test quality and enforces threshold
6. ✅ Playbook query (v4.0) before Builder + reflection (v4.0) after - continuous learning
</improvements_v4>

workflow.yaml

@@ -1,7 +1,7 @@
name: story-full-pipeline
-description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)"
-author: "BMAD Method + GSD"
-version: "3.2.0"  # Added architect-integration-reviewer for runtime verification
description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes"
author: "BMAD Method"
version: "4.0.0"  # Added playbook learning, test quality, coverage gates, Inspector code citations
# Execution mode
execution_mode: "multi_agent"  # multi_agent | single_agent (fallback)
@@ -37,13 +37,23 @@ agents:
timeout: 3600  # 1 hour
inspector:
description: "Validation agent - independent verification with code citations"
steps: [5, 6]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/inspector.md"
fresh_context: true  # No knowledge of builder agent
trust_level: "medium"  # No conflict of interest
timeout: 1800  # 30 minutes
require_code_citations: true # v4.0: Must provide file:line evidence for all tasks
test_quality:
description: "Test quality validation - verifies test coverage and quality"
steps: [5.5]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/test-quality.md"
fresh_context: true
trust_level: "medium"
timeout: 1200 # 20 minutes
reviewer:
description: "Adversarial code review - finds problems"
@@ -73,15 +83,40 @@ agents:
trust_level: "medium"  # Incentive to minimize work
timeout: 2400  # 40 minutes
reflection:
description: "Playbook learning - extracts patterns for future agents"
steps: [10]
subagent_type: "general-purpose"
prompt_file: "{agents_path}/reflection.md"
timeout: 900 # 15 minutes
# Reconciliation: orchestrator does this directly (see workflow.md Phase 5)
# Playbook configuration (v4.0)
playbooks:
enabled: true # Set to false in project config to disable
directory: "docs/playbooks/implementation-playbooks"
bootstrap_mode: true # Auto-initialize if missing
max_load: 3
auto_apply_updates: false # Require manual review of playbook updates
discovery:
enabled: true # Scan git/docs to populate initial playbooks
sources: ["git_history", "docs", "existing_code"]
# Quality gates (v4.0)
quality_gates:
coverage_threshold: 80 # % line coverage required
task_verification: "all_with_evidence" # Inspector must provide file:line citations
critical_issues: "must_fix"
high_issues: "must_fix"
# Complexity level (determines which steps to execute)
complexity_level: "standard"  # micro | standard | complex
# Complexity routing
complexity_routing:
micro:
-skip_agents: ["reviewer"]  # Skip code review for micro stories
skip_agents: []  # Full pipeline (v4.0: micro gets security scan)
description: "Lightweight path for low-risk stories"
examples: ["UI tweaks", "text changes", "simple CRUD"]

../templates/implementation-playbook-template.md (new file)

@@ -0,0 +1,85 @@
# {{Module/Feature Area}} - Implementation Playbook
> **Purpose:** Guide future agents implementing features in {{module_name}}
> **Created:** {{date}}
> **Last Updated:** {{date}}
## Common Gotchas
**What mistakes to avoid:**
- Add specific gotchas here as they're discovered
- Example: "Never concatenate user input into SQL queries"
- Example: "Always validate file paths before operations"
## Code Patterns
**Standard approaches that work:**
### Pattern: {{Pattern Name}}
✓ **Good:**
```
// Example of correct pattern
db.query(sql, [param1, param2])
```
✗ **Bad:**
```
// Example of incorrect pattern
sql + userInput
```
### Pattern: {{Another Pattern}}
✓ **Good:**
```
// Another example
if (!data) return null;
```
✗ **Bad:**
```
// Don't do this
data.map(...) // crashes if data is null
```
## Test Requirements
**Essential tests for this module:**
- **Happy path:** Verify primary functionality
- **Edge cases:** Test null, undefined, empty arrays, invalid inputs
- **Error conditions:** Verify errors are handled properly
- **Security:** Test for injection attacks, auth bypasses, etc.
### Example Test Pattern
```typescript
describe('FeatureName', () => {
it('handles happy path', () => {
expect(fn(validInput)).toEqual(expected)
})
it('handles edge cases', () => {
expect(fn(null)).toThrow()
expect(fn([])).toEqual([])
})
it('validates security', () => {
expect(fn("' OR 1=1--")).toThrow()
})
})
```
## Related Stories
Stories that used these patterns:
- {{story_key}} - {{brief description}}
## Notes
- Keep this simple and actionable
- Add new learnings as they emerge
- Focus on preventable mistakes