feat: upgrade story-full-pipeline to v4.0 with 6 major enhancements

Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench
research (Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 RESUMES Builder agent with review findings
- Builder already has full codebase context (50-70% token savings)
- More efficient than spawning fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
- Prevents vague communication (CooperBench finding)

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Focus on concrete, measurable verification
- Replaced with patterns/verification.md + patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- No longer skip ALL review for micro stories
- Micro now gets 2 reviewers: Security + Architect
- Lightweight but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection

Pipeline:
Phase 0 (Playbooks) → Phase 1 (Builder) →
Phase 2 (Inspector + Test Quality + Reviewers parallel) →
Phase 2.5 (Coverage Gate) → Phase 3 (Resume Builder) →
Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) →
Phase 6 (Reflection)

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)

parent 0810646ed6
commit a268b4c1bc

@@ -1,124 +1,150 @@
|
||||||
# Super-Dev Pipeline - GSDMAD Architecture
|
# Story-Full-Pipeline v4.0
|
||||||
|
|
||||||
**Multi-agent pipeline with independent validation and adversarial code review**
|
Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes.
|
||||||
|
|
||||||
---
|
## What's New in v4.0
|
||||||
|
|
||||||
## Quick Start
|
### 1. Resume Builder (v3.2+)
|
||||||
|
**Token Efficiency: 50-70% savings**
|
||||||
|
|
||||||
```bash
|
- Phase 3 now RESUMES Builder agent with review findings
|
||||||
# Run super-dev pipeline for a story
|
- Builder already has full codebase context
|
||||||
/story-full-pipeline story_key=17-10
|
- More efficient than spawning fresh Fixer agent
|
||||||
|
|
||||||
|
### 2. Inspector Code Citations (v4.0)
|
||||||
|
**Evidence-Based Verification**
|
||||||
|
|
||||||
|
- Inspector must map EVERY task to file:line citations
|
||||||
|
- Example: "Create component" → "src/Component.tsx:45-67"
|
||||||
|
- No more "trust me, it works" - requires proof
|
||||||
|
- Returns structured JSON with code evidence per task
|
||||||
|
|
||||||
|
### 3. Remove Hospital-Grade Framing (v4.0)
|
||||||
|
**Focus on Concrete Verification**
|
||||||
|
|
||||||
|
- Dropped psychological appeal language
|
||||||
|
- Kept rigorous verification gates and bash checks
|
||||||
|
- Replaced with patterns/verification.md + patterns/tdd.md
|
||||||
|
|
||||||
|
### 4. Micro Stories Get Security Scan (v4.0)
|
||||||
|
**Even Simple Stories Need Security**
|
||||||
|
|
||||||
|
- No longer skip ALL review for micro stories
|
||||||
|
- Still get 2 reviewers: Security + Architect
|
||||||
|
- Lightweight but catches critical vulnerabilities
|
||||||
|
|
||||||
|
### 5. Test Quality Agent + Coverage Gate (v4.0)
|
||||||
|
**Validate Test Completeness**
|
||||||
|
|
||||||
|
- New Test Quality Agent validates:
|
||||||
|
- Edge cases covered (null, empty, invalid)
|
||||||
|
- Error conditions tested
|
||||||
|
- Meaningful assertions (not just "doesn't crash")
|
||||||
|
- No flaky tests (random data, timing)
|
||||||
|
- Automated Coverage Gate enforces 80% threshold
|
||||||
|
- Builder must fix test gaps before proceeding
|
||||||
|
|
||||||
|
### 6. Playbook Learning System (v4.0)
|
||||||
|
**Continuous Improvement Through Reflection**
|
||||||
|
|
||||||
|
- **Phase 0:** Query playbooks before implementation
|
||||||
|
- Builder gets relevant patterns/gotchas upfront
|
||||||
|
- **Phase 6:** Reflection agent extracts learnings
|
||||||
|
- Auto-generates playbook updates for future agents
|
||||||
|
- Bootstrap mode: auto-initializes playbooks if missing
|
||||||
|
|
||||||
|
## Pipeline Flow
|
||||||
|
|
||||||
|
```
|
||||||
|
Phase 0: Playbook Query (orchestrator)
|
||||||
|
↓
|
||||||
|
Phase 1: Builder (initial implementation)
|
||||||
|
↓
|
||||||
|
Phase 2: Inspector + Test Quality + N Reviewers (parallel)
|
||||||
|
↓
|
||||||
|
Phase 2.5: Coverage Gate (automated)
|
||||||
|
↓
|
||||||
|
Phase 3: Resume Builder (fix issues with context)
|
||||||
|
↓
|
||||||
|
Phase 4: Inspector re-check (quick verification)
|
||||||
|
↓
|
||||||
|
Phase 5: Orchestrator reconciliation (evidence-based)
|
||||||
|
↓
|
||||||
|
Phase 6: Playbook Reflection (extract learnings)
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
## Complexity Routing
|
||||||
|
|
||||||
## Architecture
|
| Complexity | Phase 2 Agents | Total | Reviewers |
|
||||||
|
|------------|----------------|-------|-----------|
|
||||||
|
| micro | Inspector + Test Quality + 2 | 4 agents | Security + Architect |
|
||||||
|
| standard | Inspector + Test Quality + 3 | 5 agents | Security + Logic + Architect |
|
||||||
|
| complex | Inspector + Test Quality + 4 | 6 agents | Security + Logic + Architect + Quality |
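
The routing table can be read as a lookup from complexity level to the Phase 2 roster. The following is a minimal Python sketch of that reading, not the workflow's actual implementation: `REVIEWERS_BY_COMPLEXITY` and `phase2_agents` are hypothetical names chosen for illustration, and the lowercase agent identifiers are assumptions.

```python
# Hypothetical lookup mirroring the Complexity Routing table.
REVIEWERS_BY_COMPLEXITY = {
    "micro": ["security", "architect"],
    "standard": ["security", "logic", "architect"],
    "complex": ["security", "logic", "architect", "quality"],
}


def phase2_agents(complexity: str) -> list[str]:
    """Return every agent spawned in Phase 2 for a given complexity level."""
    try:
        reviewers = REVIEWERS_BY_COMPLEXITY[complexity]
    except KeyError:
        raise ValueError(f"unknown complexity: {complexity!r}") from None
    # Inspector and Test Quality always run; reviewers scale with complexity.
    return ["inspector", "test_quality", *reviewers]


for level in ("micro", "standard", "complex"):
    print(level, len(phase2_agents(level)))
```

Keeping the two fixed agents outside the table makes the "Total" column a derived value rather than something to maintain by hand.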
|
||||||
|
|
||||||
### Multi-Agent Validation
|
## Quality Gates
|
||||||
- **4 independent agents** working sequentially
|
|
||||||
- Builder → Inspector → Reviewer → Fixer
|
|
||||||
- Each agent has fresh context
|
|
||||||
- No conflict of interest
|
|
||||||
|
|
||||||
### Honest Reporting
|
- **Coverage Threshold:** 80% line coverage required
|
||||||
- Inspector verifies Builder's work (doesn't trust claims)
|
- **Task Verification:** ALL tasks need file:line evidence
|
||||||
- Reviewer is adversarial (wants to find issues)
|
- **Critical Issues:** MUST fix
|
||||||
- Main orchestrator does final verification
|
- **High Issues:** MUST fix
|
||||||
- Can't fake completion
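
The v4.0 Quality Gates (80% line coverage, Critical and High issues must be fixed) could be checked mechanically along these lines. This is a sketch under assumptions: `Issue` and `gate_failures` are hypothetical names, and treating exactly 80% as passing is one interpretation of "80% line coverage required".

```python
from dataclasses import dataclass

COVERAGE_THRESHOLD = 80.0  # percent line coverage, from the Quality Gates list


@dataclass
class Issue:
    severity: str  # e.g. "CRITICAL", "HIGH", "MEDIUM"
    description: str


def gate_failures(line_coverage: float, open_issues: list[Issue]) -> list[str]:
    """Return the reasons the gates fail; an empty list means proceed."""
    failures = []
    if line_coverage < COVERAGE_THRESHOLD:
        failures.append(
            f"coverage {line_coverage:.1f}% is below {COVERAGE_THRESHOLD:.0f}%"
        )
    for issue in open_issues:
        # Only Critical and High issues block; lower severities pass through.
        if issue.severity in ("CRITICAL", "HIGH"):
            failures.append(f"{issue.severity} issue must be fixed: {issue.description}")
    return failures
```

Returning the list of reasons, rather than a bare boolean, gives the resumed Builder something concrete to act on in Phase 3.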
|
|
||||||
|
|
||||||
### Wave-Based Execution
|
## Token Efficiency
|
||||||
- Independent stories run in parallel
|
|
||||||
- Dependencies respected via waves
|
|
||||||
- 57% faster than sequential execution
|
|
||||||
|
|
||||||
---
|
- Phase 2 agents spawn in parallel (same cost, faster)
|
||||||
|
- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
|
||||||
|
- Phase 4 Inspector only (no full re-review)
|
||||||
|
|
||||||
## Workflow Phases
|
## Playbook Configuration
|
||||||
|
|
||||||
**Phase 1: Builder (Steps 1-4)**
|
```yaml
|
||||||
- Load story, analyze gaps
|
playbooks:
|
||||||
- Write tests (TDD)
|
enabled: true
|
||||||
- Implement code
|
directory: "docs/playbooks/implementation-playbooks"
|
||||||
- Report what was built (NO VALIDATION)
|
bootstrap_mode: true # Auto-initialize if missing
|
||||||
|
max_load: 3
|
||||||
|
auto_apply_updates: false # Require manual review
|
||||||
|
discovery:
|
||||||
|
enabled: true
|
||||||
|
sources: ["git_history", "docs", "existing_code"]
|
||||||
|
```
|
||||||
|
|
||||||
**Phase 2: Inspector (Steps 5-6)**
|
## How It Avoids CooperBench Coordination Failures
|
||||||
- Fresh context, no Builder knowledge
|
|
||||||
- Verify files exist
|
|
||||||
- Run tests independently
|
|
||||||
- Run quality checks
|
|
||||||
- PASS or FAIL verdict
|
|
||||||
|
|
||||||
**Phase 3: Reviewer (Step 7)**
|
Unlike the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026):
|
||||||
- Fresh context, adversarial stance
|
|
||||||
- Find security vulnerabilities
|
|
||||||
- Find performance problems
|
|
||||||
- Find logic bugs
|
|
||||||
- Report issues with severity
|
|
||||||
|
|
||||||
**Phase 4: Fixer (Steps 8-9)**
|
1. **Sequential Implementation** - ONE Builder agent implements entire story (no parallel implementation conflicts)
|
||||||
- Fix CRITICAL issues (all)
|
2. **Parallel Review** - Multiple agents review in parallel (safe read-only operations)
|
||||||
- Fix HIGH issues (all)
|
3. **Context Reuse** - SAME agent fixes issues (no expectation failures about partner state)
|
||||||
- Fix MEDIUM issues (time permitting)
|
4. **Evidence-Based** - file:line citations prevent vague communication
|
||||||
- Verify fixes independently
|
5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities)
|
||||||
|
|
||||||
**Phase 5: Final Verification**
|
The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination."
|
||||||
- Main orchestrator verifies all phases
|
|
||||||
- Updates story checkboxes
|
|
||||||
- Creates commit
|
|
||||||
- Marks story complete
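
The "one writer, many readers" shape described in the v4.0 README text can be sketched with Python's standard `concurrent.futures`. Everything here is illustrative: `build`, `review`, and `run_story` are hypothetical stand-ins for agent invocations, and the returned dictionaries only loosely imitate the JSON artifacts.

```python
from concurrent.futures import ThreadPoolExecutor


def build(story_key: str) -> dict:
    # Stand-in for the single Builder agent: the only step that writes code.
    return {"agent": "builder", "story_key": story_key, "status": "SUCCESS"}


def review(role: str, artifact: dict) -> dict:
    # Stand-in for a read-only reviewer; it inspects the artifact, never edits it.
    return {"agent": role, "story_key": artifact["story_key"], "issues": []}


def run_story(story_key: str, roles: list[str]) -> list[dict]:
    artifact = build(story_key)  # sequential: exactly one writer
    with ThreadPoolExecutor(max_workers=max(1, len(roles))) as pool:
        # Parallel: reviewers only read, so they cannot conflict with each other.
        # pool.map returns results in the same order as `roles`.
        return list(pool.map(lambda role: review(role, artifact), roles))


print([r["agent"] for r in run_story("17-10", ["inspector", "test_quality", "security"])])
```

Because the parallel stage never mutates shared state, no merge step is needed after it; findings are simply collected and handed back to the same Builder.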
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Key Features
|
|
||||||
|
|
||||||
**Separation of Concerns:**
|
|
||||||
- Builder focuses only on implementation
|
|
||||||
- Inspector focuses only on validation
|
|
||||||
- Reviewer focuses only on finding issues
|
|
||||||
- Fixer focuses only on resolving issues
|
|
||||||
|
|
||||||
**Independent Validation:**
|
|
||||||
- Each agent validates the previous agent's work
|
|
||||||
- No agent validates its own work
|
|
||||||
- Fresh context prevents confirmation bias
|
|
||||||
|
|
||||||
**Quality Enforcement:**
|
|
||||||
- Multiple quality gates throughout pipeline
|
|
||||||
- Can't proceed without passing validation
|
|
||||||
- 95% honesty rate (agents can't fake completion)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Files
|
## Files
|
||||||
|
|
||||||
See `workflow.md` for complete architecture details.
|
|
||||||
|
|
||||||
**Agent Prompts:**
|
**Agent Prompts:**
|
||||||
- `agents/builder.md` - Implementation agent
|
- `agents/builder.md` - Implementation agent (with playbook awareness)
|
||||||
- `agents/inspector.md` - Validation agent
|
- `agents/inspector.md` - Validation agent (requires code citations)
|
||||||
|
- `agents/test-quality.md` - Test quality validation (v4.0)
|
||||||
- `agents/reviewer.md` - Adversarial review agent
|
- `agents/reviewer.md` - Adversarial review agent
|
||||||
- `agents/fixer.md` - Issue resolution agent
|
- `agents/architect-integration-reviewer.md` - Architecture/integration review
|
||||||
|
- `agents/fixer.md` - Issue resolution agent (deprecated, uses resume Builder)
|
||||||
|
- `agents/reflection.md` - Playbook learning agent (v4.0)
|
||||||
|
|
||||||
**Workflow Config:**
|
**Workflow Config:**
|
||||||
- `workflow.yaml` - Main configuration
|
- `workflow.yaml` - Main configuration (v4.0)
|
||||||
- `workflow.md` - Complete documentation
|
- `workflow.md` - Complete step-by-step documentation
|
||||||
|
|
||||||
**Directory Structure:**
|
**Templates:**
|
||||||
```
|
- `../templates/implementation-playbook-template.md` - Playbook structure
|
||||||
story-full-pipeline/
|
|
||||||
├── README.md (this file)
|
## Usage
|
||||||
├── workflow.yaml (configuration)
|
|
||||||
├── workflow.md (complete documentation)
|
```bash
|
||||||
├── agents/
|
# Run story-full-pipeline
|
||||||
│ ├── builder.md (implementation agent prompt)
|
/story-full-pipeline story_key=17-10
|
||||||
│ ├── inspector.md (validation agent prompt)
|
|
||||||
│ ├── reviewer.md (review agent prompt)
|
|
||||||
│ └── fixer.md (fix agent prompt)
|
|
||||||
└── steps/
|
|
||||||
└── (step files for each phase)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
## Backward Compatibility
|
||||||
|
|
||||||
**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
|
Falls back to single-agent mode if multi-agent execution fails.
|
||||||
|
|
|
||||||
|
|
@@ -5,7 +5,6 @@
 **Trust Level:** HIGH (wants to find integration issues)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
||||||
|
|
|
||||||
|
|
@@ -5,7 +5,6 @@
 **Trust Level:** LOW (assume will cut corners)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/tdd.md
 @patterns/agent-completion.md
 </execution_context>
||||||
|
|
@@ -17,11 +16,12 @@
 You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests.
 
 **DO:**
+- **Review playbooks** for gotchas and patterns (if provided)
 - Load and understand the story requirements
 - Analyze what exists vs what's needed
 - Write tests first (TDD approach)
 - Implement production code to make tests pass
-- Follow project patterns and conventions
+- Follow project patterns and playbook guidance
 
 **DO NOT:**
 - Validate your own work (Inspector agent will do this)
||||||
|
|
@@ -35,7 +35,8 @@ You are the **BUILDER** agent. Your job is to implement the story requirements b
 ## Steps to Execute
 
 ### Step 1: Initialize
-Load story file and cache context:
+Load story file and playbooks (if provided):
+- **Review playbooks first** (if provided in context) - note gotchas and patterns
 - Read story file: `{{story_file}}`
 - Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.)
 - Determine greenfield vs brownfield
||||||
|
|
@@ -88,54 +89,36 @@ When complete, provide:
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Hospital-Grade Standards
|
## Completion Format (v4.0)
|
||||||
|
|
||||||
⚕️ **Quality >> Speed**
|
**Return structured JSON artifact:**
|
||||||
|
|
||||||
- Take time to do it right
|
```json
|
||||||
- Don't skip error handling
|
{
|
||||||
- Don't leave TODO comments
|
"agent": "builder",
|
||||||
- Don't use `any` types
|
"story_key": "{{story_key}}",
|
||||||
|
"status": "SUCCESS",
|
||||||
---
|
"files_created": ["path/to/file.tsx", "path/to/file.test.tsx"],
|
||||||
|
"files_modified": ["path/to/existing.tsx"],
|
||||||
## When Complete, Return This Format
|
"tests_added": {
|
||||||
|
"total": 12,
|
||||||
```markdown
|
"passing": 12
|
||||||
## AGENT COMPLETE
|
},
|
||||||
|
"tasks_addressed": [
|
||||||
**Agent:** builder
|
"Create agreement view component",
|
||||||
**Story:** {{story_key}}
|
"Add status badge",
|
||||||
**Status:** SUCCESS | FAILED
|
"Implement occupant selection"
|
||||||
|
],
|
||||||
### Files Created
|
"playbooks_reviewed": ["database-patterns.md", "api-security.md"]
|
||||||
- path/to/new/file1.ts
|
}
|
||||||
- path/to/new/file2.ts
|
|
||||||
|
|
||||||
### Files Modified
|
|
||||||
- path/to/existing/file.ts
|
|
||||||
|
|
||||||
### Tests Added
|
|
||||||
- X test files
|
|
||||||
- Y test cases total
|
|
||||||
|
|
||||||
### Implementation Summary
|
|
||||||
Brief description of what was built and key decisions made.
|
|
||||||
|
|
||||||
### Known Gaps
|
|
||||||
- Any functionality not implemented
|
|
||||||
- Any edge cases not handled
|
|
||||||
- NONE if all tasks complete
|
|
||||||
|
|
||||||
### Ready For
|
|
||||||
Inspector validation (next phase)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**Why this format?** The orchestrator parses this output to:
|
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json`
|
||||||
- Verify claimed files actually exist
|
|
||||||
- Track what was built for reconciliation
|
|
||||||
- Route to next phase appropriately
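
One way an orchestrator might consume the Builder's JSON artifact is to confirm that every claimed path exists on disk before moving on. The sketch below assumes the artifact fields shown in the v4.0 completion format (`files_created`, `files_modified`); `missing_files` is a hypothetical helper, not part of the workflow.

```python
import json
from pathlib import Path


def missing_files(artifact_path: Path, repo_root: Path) -> list[str]:
    """Return claimed paths from a Builder artifact that do not exist."""
    artifact = json.loads(artifact_path.read_text(encoding="utf-8"))
    # Both lists are optional here so a partial artifact does not crash the check.
    claimed = artifact.get("files_created", []) + artifact.get("files_modified", [])
    return [p for p in claimed if not (repo_root / p).is_file()]
```

An empty result means every claim checks out; any returned path is a concrete discrepancy to report back, which fits the evidence-first style of the rest of the pipeline.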
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
**Remember:** You are the BUILDER. Build it well, but don't validate or review your own work. Other agents will do that with fresh eyes.
|
**Remember:**
|
||||||
|
|
||||||
|
- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories
|
||||||
|
- Build it well with TDD, but don't validate or review your own work
|
||||||
|
- Other agents will verify with fresh eyes and provide file:line evidence
|
||||||
|
|
|
||||||
|
|
@@ -5,7 +5,6 @@
 **Trust Level:** MEDIUM (incentive to minimize work)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
||||||
|
|
|
||||||
|
|
@@ -1,12 +1,11 @@
|
||||||
# Inspector Agent - Validation Phase
|
# Inspector Agent - Validation Phase with Code Citations
|
||||||
|
|
||||||
**Role:** Independent verification of Builder's work
|
**Role:** Independent verification of Builder's work **WITH EVIDENCE**
|
||||||
**Steps:** 5-6 (post-validation, quality-checks)
|
**Steps:** 5-6 (post-validation, quality-checks)
|
||||||
**Trust Level:** MEDIUM (no conflict of interest)
|
**Trust Level:** MEDIUM (no conflict of interest)
|
||||||
|
|
||||||
<execution_context>
|
<execution_context>
|
||||||
@patterns/verification.md
|
@patterns/verification.md
|
||||||
@patterns/hospital-grade.md
|
|
||||||
@patterns/agent-completion.md
|
@patterns/agent-completion.md
|
||||||
</execution_context>
|
</execution_context>
|
||||||
|
|
||||||
|
|
@@ -14,48 +13,54 @@
|
||||||
|
|
||||||
## Your Mission
|
## Your Mission
|
||||||
|
|
||||||
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed.
|
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**.
|
||||||
|
|
||||||
**KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
|
**KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
|
||||||
|
|
||||||
|
**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.**
|
||||||
|
|
||||||
**DO:**
|
**DO:**
|
||||||
|
- Map EACH task to specific code with file:line citations
|
||||||
- Verify files actually exist
|
- Verify files actually exist
|
||||||
- Run tests yourself (don't trust claims)
|
- Run tests yourself (don't trust claims)
|
||||||
- Run quality checks (type-check, lint, build)
|
- Run quality checks (type-check, lint, build)
|
||||||
- Give honest PASS/FAIL verdict
|
- Provide evidence for EVERY task
|
||||||
|
|
||||||
**DO NOT:**
|
**DO NOT:**
|
||||||
- Take the Builder's word for anything
|
- Skip any task verification
|
||||||
- Skip verification steps
|
- Give vague "looks good" without citations
|
||||||
- Assume tests pass without running them
|
- Assume tests pass without running them
|
||||||
- Give PASS verdict if ANY check fails
|
- Give PASS verdict if ANY check fails or task lacks evidence
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Steps to Execute
|
## Steps to Execute
|
||||||
|
|
||||||
### Step 5: Post-Validation
|
### Step 5: Task Verification with Code Citations
|
||||||
|
|
||||||
**Verify Implementation Against Story:**
|
**Map EVERY task to specific code locations:**
|
||||||
|
|
||||||
1. **Check Files Exist:**
|
1. **Read story file** - understand ALL tasks
|
||||||
```bash
|
|
||||||
# For each file mentioned in story tasks
|
|
||||||
ls -la {{file_path}}
|
|
||||||
# FAIL if file missing or empty
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Verify File Contents:**
|
2. **For EACH task, provide:**
|
||||||
- Open each file
|
- **file:line** where it's implemented
|
||||||
- Check it has actual code (not just TODO/stub)
|
- **Brief quote** of relevant code
|
||||||
- Verify it matches story requirements
|
- **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED
|
||||||
|
|
||||||
3. **Check Tests Exist:**
|
**Example Evidence Format:**
|
||||||
```bash
|
|
||||||
# Find test files
|
```
|
||||||
find . -name "*.test.ts" -o -name "__tests__"
|
Task: "Display occupant agreement status"
|
||||||
# FAIL if no tests found for new code
|
Evidence: src/features/agreement/StatusBadge.tsx:45-67
|
||||||
```
|
Code: "const StatusBadge = ({ status }) => ..."
|
||||||
|
Verdict: IMPLEMENTED
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **If task NOT implemented:**
|
||||||
|
- Explain why (file missing, code incomplete, etc.)
|
||||||
|
- Provide file:line where it should be
|
||||||
|
|
||||||
|
**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED.
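
The "no citation, no credit" rule lends itself to a mechanical pre-check. The sketch below validates only the *shape* of a `file:line` or `file:start-end` citation; it does not open the file or confirm the lines contain the claimed code. `CITATION_RE` and `task_verdict` are hypothetical names.

```python
import re
from typing import Optional

# Matches "path/to/file.ext:45" or "path/to/file.ext:45-67".
CITATION_RE = re.compile(r"^(?P<file>[^\s:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def task_verdict(citation: Optional[str]) -> str:
    """Return IMPLEMENTED only when a well-formed file:line citation is given."""
    if not citation:
        return "NOT_IMPLEMENTED"
    match = CITATION_RE.match(citation.strip())
    if match is None:
        return "NOT_IMPLEMENTED"
    start = int(match["start"])
    end = int(match["end"] or start)  # a single line is a one-line range
    # Line numbers are 1-based and a range must not run backwards.
    if start < 1 or end < start:
        return "NOT_IMPLEMENTED"
    return "IMPLEMENTED"
```

A check like this catches vague answers such as "looks good" early, leaving the Inspector's judgment for whether the cited code actually satisfies the task.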
|
||||||
|
|
||||||
### Step 6: Quality Checks
|
### Step 6: Quality Checks
|
||||||
|
|
||||||
|
|
@@ -96,36 +101,49 @@ You are the **INSPECTOR** agent. Your job is to verify that the Builder actually
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Output Requirements
|
## Completion Format (v4.0)
|
||||||
|
|
||||||
**Provide Evidence-Based Verdict:**
|
**Return structured JSON with code citations:**
|
||||||
|
|
||||||
### If PASS:
|
```json
|
||||||
```markdown
|
{
|
||||||
✅ VALIDATION PASSED
|
"agent": "inspector",
|
||||||
|
"story_key": "{{story_key}}",
|
||||||
Evidence:
|
"verdict": "PASS",
|
||||||
- Files verified: [list files checked]
|
"task_verification": [
|
||||||
- Type check: PASS (0 errors)
|
{
|
||||||
- Linter: PASS (0 warnings)
|
"task": "Create agreement view component",
|
||||||
- Build: PASS
|
"implemented": true,
|
||||||
- Tests: 45/45 passing (95% coverage)
|
"evidence": [
|
||||||
- Git: 12 files modified, 3 new files
|
{
|
||||||
|
"file": "src/features/agreement/AgreementView.tsx",
|
||||||
Ready for code review.
|
"lines": "15-67",
|
||||||
|
"code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"file": "src/features/agreement/AgreementView.test.tsx",
|
||||||
|
"lines": "8-45",
|
||||||
|
"code_snippet": "describe('AgreementView', () => {...})"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"task": "Add status badge",
|
||||||
|
"implemented": false,
|
||||||
|
"evidence": [],
|
||||||
|
"reason": "No StatusBadge component found in src/features/agreement/"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"checks": {
|
||||||
|
"type_check": {"passed": true, "errors": 0},
|
||||||
|
"lint": {"passed": true, "warnings": 0},
|
||||||
|
"tests": {"passed": true, "total": 12, "passing": 12},
|
||||||
|
"build": {"passed": true}
|
||||||
|
}
|
||||||
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### If FAIL:
|
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json`
|
||||||
```markdown
|
|
||||||
❌ VALIDATION FAILED
|
|
||||||
|
|
||||||
Failures:
|
|
||||||
1. File missing: app/api/occupant/agreement/route.ts
|
|
||||||
2. Type check: 3 errors in lib/api/auth.ts
|
|
||||||
3. Tests: 2 failing (api/occupant tests)
|
|
||||||
|
|
||||||
Cannot proceed to code review until these are fixed.
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed.
|
||||||
|
|
||||||
**Before giving PASS verdict, confirm:**
|
**Before giving PASS verdict, confirm:**
|
||||||
|
|
||||||
- [ ] All story files exist and have content
|
- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason
|
||||||
- [ ] Type check returns 0 errors
|
- [ ] Type check returns 0 errors
|
||||||
- [ ] Linter returns 0 errors/warnings
|
- [ ] Linter returns 0 warnings
|
||||||
- [ ] Build succeeds
|
- [ ] Build succeeds
|
||||||
- [ ] Tests run and pass (not skipped)
|
- [ ] Tests run and pass (not skipped)
|
||||||
- [ ] Test coverage >= 90%
|
- [ ] All implemented tasks have code evidence
|
||||||
- [ ] Git status is clean or has expected changes
|
|
||||||
|
|
||||||
**If ANY checkbox is unchecked → FAIL verdict**
|
**If ANY checkbox is unchecked → FAIL verdict**
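
The v4.0 checklist can be approximated as a function over the Inspector's JSON artifact. Read literally, the checklist accepts a task that is either cited or explicitly marked not implemented with a reason; whether an unimplemented task should also force a FAIL is not settled by the text, so this sketch exposes that choice as a hypothetical `strict` flag. `inspector_verdict` is an illustrative name, and the field names follow the sample artifact.

```python
def inspector_verdict(artifact: dict, strict: bool = False) -> str:
    """Derive PASS/FAIL from task evidence and quality checks."""
    for task in artifact.get("task_verification", []):
        if task.get("implemented"):
            if not task.get("evidence"):
                return "FAIL"  # implemented task without file:line evidence
        elif strict or not task.get("reason"):
            return "FAIL"  # unexplained gap, or any gap at all in strict mode
    checks = artifact.get("checks", {})
    # Every recorded check (type_check, lint, tests, build) must have passed.
    if not all(check.get("passed") for check in checks.values()):
        return "FAIL"
    return "PASS"
```

Deriving the verdict from the evidence, instead of letting the agent assert it, keeps the `verdict` field from drifting out of sync with the task list.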
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Hospital-Grade Standards
|
**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations.
|
||||||
|
|
||||||
⚕️ **Be Thorough**
|
|
||||||
|
|
||||||
- Don't skip checks
|
|
||||||
- Run tests yourself (don't trust claims)
|
|
||||||
- Verify every file exists
|
|
||||||
- Give specific evidence
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## When Complete, Return This Format
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
## AGENT COMPLETE
|
|
||||||
|
|
||||||
**Agent:** inspector
|
|
||||||
**Story:** {{story_key}}
|
|
||||||
**Status:** PASS | FAIL
|
|
||||||
|
|
||||||
### Evidence
|
|
||||||
- **Type Check:** PASS (0 errors) | FAIL (X errors)
|
|
||||||
- **Lint:** PASS (0 warnings) | FAIL (X warnings)
|
|
||||||
- **Build:** PASS | FAIL
|
|
||||||
- **Tests:** X passing, Y failing, Z% coverage
|
|
||||||
|
|
||||||
### Files Verified
|
|
||||||
- path/to/file1.ts ✓
|
|
||||||
- path/to/file2.ts ✓
|
|
||||||
- path/to/missing.ts ✗ (NOT FOUND)
|
|
||||||
|
|
||||||
### Failures (if FAIL status)
|
|
||||||
1. Specific failure with file:line reference
|
|
||||||
2. Another specific failure
|
|
||||||
|
|
||||||
### Ready For
|
|
||||||
- If PASS: Reviewer (next phase)
|
|
||||||
- If FAIL: Builder needs to fix before proceeding
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Remember:** You are the INSPECTOR. Your job is to find the truth, not rubber-stamp the Builder's work. If something is wrong, say so with evidence.
|
|
||||||
|
|
|
||||||
|
|
@@ -0,0 +1,93 @@
|
||||||
|
# Reflection Agent - Playbook Learning
|
||||||
|
|
||||||
|
You are the **REFLECTION** agent for story {{story_key}}.
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
- **Story:** {{story_file}}
|
||||||
|
- **Builder initial:** {{builder_artifact}}
|
||||||
|
- **All review findings:** {{all_reviewer_artifacts}}
|
||||||
|
- **Builder fixes:** {{builder_fixes_artifact}}
|
||||||
|
- **Test quality issues:** {{test_quality_artifact}}
|
||||||
|
|
||||||
|
## Objective
|
||||||
|
|
||||||
|
Identify what future agents should know:
|
||||||
|
|
||||||
|
1. **What issues were found?** (from reviewers)
|
||||||
|
2. **What did Builder miss initially?** (gaps, edge cases, security)
|
||||||
|
3. **What playbook knowledge would have prevented these?**
|
||||||
|
4. **Which module/feature area does this apply to?**
|
||||||
|
5. **Should we update existing playbook or create new?**
|
||||||
|
|
||||||
|
### Key Questions
|
||||||
|
|
||||||
|
- What gotchas should future builders know?
|
||||||
|
- What code patterns should be standard?
|
||||||
|
- What test requirements are essential?
|
||||||
|
- What similar stories exist?
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
- [ ] Analyzed review findings
|
||||||
|
- [ ] Identified preventable issues
|
||||||
|
- [ ] Determined which playbook(s) to update
|
||||||
|
- [ ] Return structured proposal
|
||||||
|
|
||||||
|
## Completion Format
|
||||||
|
|
||||||
|
Return structured JSON artifact:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"agent": "reflection",
|
||||||
|
"story_key": "{{story_key}}",
|
||||||
|
"learnings": [
|
||||||
|
{
|
||||||
|
"issue": "SQL injection in query builder",
|
||||||
|
"root_cause": "Builder used string concatenation (didn't know pattern)",
|
||||||
|
"prevention": "Playbook should document: always use parameterized queries",
|
||||||
|
"applies_to": "database queries, API endpoints with user input"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"issue": "Missing edge case tests for empty arrays",
|
||||||
|
"root_cause": "Test Quality Agent found gap",
|
||||||
|
"prevention": "Playbook should require: test null/empty/invalid for all inputs",
|
||||||
|
"applies_to": "all data processing functions"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"playbook_proposal": {
|
||||||
|
"action": "update_existing",
|
||||||
|
"playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
|
||||||
|
"module": "api/database",
|
||||||
|
"updates": {
|
||||||
|
"common_gotchas": [
|
||||||
|
"Never concatenate user input into SQL - use parameterized queries",
|
||||||
|
"Test edge cases: null, undefined, [], '', invalid input"
|
||||||
|
],
|
||||||
|
"code_patterns": [
|
||||||
|
"db.query(sql, [param1, param2]) ✓",
|
||||||
|
"sql + userInput ✗"
|
||||||
|
],
|
||||||
|
"test_requirements": [
|
||||||
|
"Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
|
||||||
|
"Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
|
||||||
|
],
|
||||||
|
"related_stories": ["{{story_key}}"]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
|
||||||
|
|
||||||
|
## Playbook Structure
|
||||||
|
|
||||||
|
When proposing playbook updates, structure them with these sections:
|
||||||
|
|
||||||
|
1. **Common Gotchas** - What mistakes to avoid
|
||||||
|
2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
|
||||||
|
3. **Test Requirements** - What tests are essential
|
||||||
|
4. **Related Stories** - Which stories used these patterns
|
||||||
|
|
||||||
|
Keep it simple and actionable for future agents.
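
To make the four-section structure concrete, here is a minimal Python sketch that renders a proposal's `updates` object as markdown. It is an illustration only: `SECTIONS` and `render_playbook_sections` are hypothetical names, and the `##` heading level is an assumption since the playbook template itself is not shown here.

```python
# Maps the JSON keys in `playbook_proposal.updates` to section titles.
SECTIONS = [
    ("common_gotchas", "Common Gotchas"),
    ("code_patterns", "Code Patterns"),
    ("test_requirements", "Test Requirements"),
    ("related_stories", "Related Stories"),
]


def render_playbook_sections(updates: dict) -> str:
    """Render a proposal's `updates` object as markdown sections."""
    lines = []
    for key, title in SECTIONS:
        entries = updates.get(key, [])
        if not entries:
            continue  # skip sections the proposal does not touch
        lines.append(f"## {title}")
        lines.extend(f"- {entry}" for entry in entries)
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```

The output is text for a human to review rather than something written straight into a playbook, which matches a configuration where updates are not applied automatically.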
|
||||||
|
|
@@ -6,7 +6,6 @@
 
 <execution_context>
 @patterns/security-checklist.md
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
||||||
|
|
|
||||||
|
|
@@ -0,0 +1,73 @@
|
||||||
|
# Test Quality Agent
|
||||||
|
|
||||||
|
You are the **TEST QUALITY** agent for story {{story_key}}.
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
- **Story:** {{story_file}}
|
||||||
|
- **Builder completion:** {{builder_completion_artifact}}
|
||||||
|
|
||||||
|
## Objective
|
||||||
|
|
||||||
|
Review test files for quality and completeness:
|
||||||
|
|
||||||
|
1. Find all test files created/modified by Builder
|
||||||
|
2. For each test file, verify:
|
||||||
|
- **Happy path**: Primary functionality tested ✓
|
||||||
|
- **Edge cases**: null, empty, invalid inputs ✓
|
||||||
|
- **Error conditions**: Failures handled properly ✓
|
||||||
|
- **Assertions**: Meaningful checks (not just "doesn't crash")
|
||||||
|
- **Test names**: Descriptive and clear
|
||||||
|
- **Deterministic**: No random data, no timing dependencies
|
||||||
|
3. Check that tests actually validate the feature
|
||||||
|
|
||||||
|
**Focus on:** What's missing? What edge cases weren't considered?
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
- [ ] All test files reviewed
|
||||||
|
- [ ] Edge cases identified (covered or missing)
|
||||||
|
- [ ] Error conditions verified
|
||||||
|
- [ ] Assertions are meaningful
|
||||||
|
- [ ] Tests are deterministic
|
||||||
|
- [ ] Return quality assessment
|
||||||
|
|
||||||
|
## Completion Format
|
||||||
|
|
||||||
|
Return structured JSON artifact:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"agent": "test_quality",
|
||||||
|
"story_key": "{{story_key}}",
|
||||||
|
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
|
||||||
|
"test_files_reviewed": ["path/to/test.tsx", ...],
|
||||||
|
"issues": [
|
||||||
|
{
|
||||||
|
"severity": "HIGH",
|
||||||
|
"file": "path/to/test.tsx:45",
|
||||||
|
"issue": "Missing edge case: empty input array",
|
||||||
|
"recommendation": "Add test: expect(() => fn([])).toThrow(...)"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"severity": "MEDIUM",
|
||||||
|
"file": "path/to/test.tsx:67",
|
||||||
|
"issue": "Test uses Math.random() - could be flaky",
|
||||||
|
"recommendation": "Use fixed test data"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"coverage_analysis": {
|
||||||
|
"edge_cases_covered": true,
|
||||||
|
"error_conditions_tested": true,
|
||||||
|
"meaningful_assertions": true,
|
||||||
|
"tests_are_deterministic": true
|
||||||
|
},
|
||||||
|
"summary": {
|
||||||
|
"high_issues": 0,
|
||||||
|
"medium_issues": 0,
|
||||||
|
"low_issues": 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json`
|
||||||
|
|
@ -1,74 +1,142 @@
|
||||||
# Super-Dev-Pipeline v3.1 - Token-Efficient Multi-Agent Pipeline
|
# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline
|
||||||
|
|
||||||
<purpose>
|
<purpose>
|
||||||
Implement a story using parallel verification agents with Builder context reuse.
|
Implement a story using parallel verification agents with Builder context reuse.
|
||||||
Each agent has single responsibility. Builder fixes issues in its own context (50-70% token savings).
|
Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates.
|
||||||
Orchestrator handles bookkeeping (story file updates, verification).
|
Builder fixes issues in its own context (50-70% token savings).
|
||||||
</purpose>
|
</purpose>
|
||||||
|
|
||||||
<philosophy>
|
<philosophy>
|
||||||
**Token-Efficient Multi-Agent Pipeline**
|
**Quality Through Discipline, Continuous Learning**
|
||||||
|
|
||||||
- Builder implements (creative, context preserved)
|
- Playbook Query: Load relevant patterns before starting
|
||||||
- Inspector + Reviewers validate in parallel (verification, fresh context)
|
- Builder: Implements with playbook knowledge (context preserved)
|
||||||
- Builder fixes issues (creative, reuses context - 50-70% token savings)
|
- Inspector + Test Quality + Reviewers: Validate in parallel with proof
|
||||||
- Inspector re-checks (verification, quick check)
|
- Coverage Gate: Automated threshold enforcement
|
||||||
- Orchestrator reconciles story file (mechanical)
|
- Builder: Fixes issues in same context (50-70% token savings)
|
||||||
|
- Inspector: Quick recheck
|
||||||
|
- Orchestrator: Reconciles mechanically
|
||||||
|
- Reflection: Updates playbooks for future agents
|
||||||
|
|
||||||
**Key Innovation:** Resume Builder instead of spawning fresh Fixer.
|
Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system.
|
||||||
Builder already knows the codebase - just needs to fix specific issues.
|
|
||||||
|
|
||||||
Trust but verify. Fresh context for verification. Reuse context for fixes.
|
|
||||||
</philosophy>
|
</philosophy>
|
||||||
|
|
||||||
<config>
|
<config>
|
||||||
name: story-full-pipeline
|
name: story-full-pipeline
|
||||||
version: 3.2.0
|
version: 4.0.0
|
||||||
execution_mode: multi_agent
|
execution_mode: multi_agent
|
||||||
|
|
||||||
phases:
|
phases:
|
||||||
|
phase_0: Playbook Query (orchestrator)
|
||||||
phase_1: Builder (saves agent_id)
|
phase_1: Builder (saves agent_id)
|
||||||
phase_2: [Inspector + N Reviewers] in parallel (N = 2/3/4 based on complexity)
|
phase_2: [Inspector + Test Quality + N Reviewers] in parallel
|
||||||
|
phase_2.5: Coverage Gate (automated)
|
||||||
phase_3: Resume Builder with all findings (reuses context)
|
phase_3: Resume Builder with all findings (reuses context)
|
||||||
phase_4: Inspector re-check (quick verification)
|
phase_4: Inspector re-check (quick verification)
|
||||||
phase_5: Orchestrator reconciliation
|
phase_5: Orchestrator reconciliation
|
||||||
|
phase_6: Playbook Reflection
|
||||||
|
|
||||||
reviewer_counts:
|
reviewer_counts:
|
||||||
micro: 2 reviewers (security, architect/integration) v3.2.0+
|
micro: 2 reviewers (security, architect/integration)
|
||||||
standard: 3 reviewers (security, logic/performance, architect/integration) v3.2.0+
|
standard: 3 reviewers (security, logic/performance, architect/integration)
|
||||||
complex: 4 reviewers (security, logic, architect/integration, code quality) v3.2.0+
|
complex: 4 reviewers (security, logic, architect/integration, code quality)
|
||||||
|
|
||||||
|
quality_gates:
|
||||||
|
coverage_threshold: 80 # % line coverage required
|
||||||
|
task_verification: "all_with_evidence" # Inspector must cite file:line
|
||||||
|
critical_issues: "must_fix"
|
||||||
|
high_issues: "must_fix"
|
||||||
|
|
||||||
token_efficiency:
|
token_efficiency:
|
||||||
- Phase 2 agents spawn in parallel (same cost, faster)
|
- Phase 2 agents spawn in parallel (same cost, faster)
|
||||||
- Phase 3 resumes Builder (50-70% token savings vs fresh Fixer)
|
- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
|
||||||
- Phase 4 Inspector only (no full re-review)
|
- Phase 4 Inspector only (no full re-review)
|
||||||
|
|
||||||
|
playbooks:
|
||||||
|
enabled: true
|
||||||
|
directory: "docs/playbooks/implementation-playbooks"
|
||||||
|
max_load: 3
|
||||||
|
auto_apply_updates: false
|
||||||
</config>
|
</config>
|
||||||
|
|
||||||
<execution_context>
|
<execution_context>
|
||||||
@patterns/hospital-grade.md
|
@patterns/verification.md
|
||||||
|
@patterns/tdd.md
|
||||||
@patterns/agent-completion.md
|
@patterns/agent-completion.md
|
||||||
</execution_context>
|
</execution_context>
|
||||||
|
|
||||||
<process>
|
<process>
|
||||||
|
|
||||||
<step name="load_story" priority="first">
|
<step name="load_story" priority="first">
|
||||||
Load and validate the story file.
|
**Load and parse story file**
|
||||||
|
|
||||||
\`\`\`bash
|
\`\`\`bash
|
||||||
STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
|
STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
|
||||||
[ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
|
[ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
|
||||||
\`\`\`
|
\`\`\`
|
||||||
|
|
||||||
Use Read tool on the story file. Parse:
|
Use Read tool. Extract:
|
||||||
- Complexity level (micro/standard/complex)
|
|
||||||
- Task count
|
- Task count
|
||||||
- Acceptance criteria count
|
- Acceptance criteria count
|
||||||
|
- Keywords for risk scoring
|
||||||
|
|
||||||
Determine which agents to spawn based on complexity routing.
|
**Determine complexity:**
|
||||||
|
\`\`\`bash
|
||||||
|
TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE")
|
||||||
|
RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE")
|
||||||
|
|
||||||
|
if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then
|
||||||
|
COMPLEXITY="micro"
|
||||||
|
REVIEWER_COUNT=2
|
||||||
|
elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then
|
||||||
|
COMPLEXITY="complex"
|
||||||
|
REVIEWER_COUNT=4
|
||||||
|
else
|
||||||
|
COMPLEXITY="standard"
|
||||||
|
REVIEWER_COUNT=3
|
||||||
|
fi
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers
|
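The routing rules above can be sketched as a small function, which makes the thresholds easy to exercise in isolation. This is a minimal illustration, not part of the workflow: `classify_complexity` is a hypothetical helper name, and the thresholds are copied from the block above.

```bash
# Sketch only: classify_complexity is a hypothetical helper name.
# Usage: classify_complexity TASK_COUNT RISK_KEYWORD_HITS
# Prints "<complexity> <reviewer_count>" using the same thresholds as the routing block.
classify_complexity() {
  if [ "$1" -le 3 ] && [ "$2" -eq 0 ]; then
    echo "micro 2"
  elif [ "$1" -ge 16 ] || [ "$2" -gt 0 ]; then
    echo "complex 4"
  else
    echo "standard 3"
  fi
}

# Split the result into the two variables the later phases read.
set -- $(classify_complexity 5 0)
COMPLEXITY="$1"
REVIEWER_COUNT="$2"
echo "$COMPLEXITY story: Inspector + Test Quality + $REVIEWER_COUNT reviewers"
```

Note that any risk keyword hit routes a story to `complex`, even one with three or fewer tasks, because the `micro` branch requires zero hits.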
||||||
|
</step>
|
||||||
|
|
||||||
|
<step name="query_playbooks">
|
||||||
|
**Phase 0: Playbook Query**
|
||||||
|
|
||||||
|
\`\`\`
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
📚 PHASE 0: PLAYBOOK QUERY
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
**Extract story keywords:**
|
||||||
|
\`\`\`bash
|
||||||
|
STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ')
|
||||||
|
echo "Story keywords: $STORY_KEYWORDS"
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
**Search for relevant playbooks:**
|
||||||
|
Use Grep tool:
|
||||||
|
- Pattern: extracted keywords
|
||||||
|
- Path: \`docs/playbooks/implementation-playbooks/\`
|
||||||
|
- Output mode: files_with_matches
|
||||||
|
- Limit: 3 files
|
||||||
|
|
||||||
|
**Load matching playbooks:**
|
||||||
|
For each playbook found:
|
||||||
|
- Use Read tool
|
||||||
|
- Extract sections: Common Gotchas, Code Patterns, Test Requirements
|
||||||
|
|
||||||
|
If no playbooks exist:
|
||||||
|
\`\`\`
|
||||||
|
ℹ️ No playbooks found - this will be the first story to create them
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
Store playbook content for Builder.
|
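For orchestrators that run Phase 0 from a shell rather than through the Grep tool, the search step might look like the sketch below. It is an illustration under assumptions: `select_playbooks` is a hypothetical helper, keywords are passed as separate arguments and treated as extended regular expressions, and only `*.md` files directly under the playbook directory are considered.

```bash
# Sketch only: select_playbooks is a hypothetical helper, not part of the pipeline.
# Usage: select_playbooks DIR MAX KEYWORD...
# Prints up to MAX playbook paths that mention any keyword (case-insensitive).
select_playbooks() {
  local dir="$1" max="$2" pattern
  shift 2
  # Nothing to search: a missing directory or no keywords yields no output.
  [ -d "$dir" ] && [ "$#" -gt 0 ] || return 0
  # Join the keywords into one alternation: "database|auth"
  pattern=$(printf '%s\n' "$@" | paste -sd '|' -)
  grep -ilE -e "$pattern" "$dir"/*.md 2>/dev/null | sort | head -n "$max"
}

PLAYBOOKS=$(select_playbooks "docs/playbooks/implementation-playbooks" 3 database auth)
if [ -z "$PLAYBOOKS" ]; then
  echo "No playbooks found - this will be the first story to create them"
else
  echo "$PLAYBOOKS"
fi
```

Sorting before `head` keeps the selection deterministic when more than `max_load` playbooks match.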
||||||
</step>
|
</step>
|
||||||
|
|
||||||
<step name="spawn_builder">
|
<step name="spawn_builder">
|
||||||
**Phase 1: Builder Agent (Steps 1-4)**
|
**Phase 1: Builder Agent**
|
||||||
|
|
||||||
\`\`\`
|
\`\`\`
|
||||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
|
@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing.
|
||||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
\`\`\`
|
\`\`\`
|
||||||
|
|
||||||
Spawn Builder agent and save agent_id for later resume.
|
Spawn Builder agent and **SAVE agent_id for later resume**:
|
||||||
|
|
||||||
**CRITICAL: Save Builder's agent_id for later resume**
|
|
||||||
|
|
||||||
\`\`\`
|
\`\`\`
|
||||||
BUILDER_AGENT_ID={{agent_id_from_task_result}}
|
BUILDER_TASK = Task({
|
||||||
echo "Builder agent: $BUILDER_AGENT_ID"
|
subagent_type: "general-purpose",
|
||||||
|
description: "Implement story {{story_key}}",
|
||||||
|
prompt: \`
|
||||||
|
You are the BUILDER agent for story {{story_key}}.
|
||||||
|
|
||||||
|
<execution_context>
|
||||||
|
@patterns/tdd.md
|
||||||
|
@patterns/agent-completion.md
|
||||||
|
</execution_context>
|
||||||
|
|
||||||
|
<context>
|
||||||
|
Story: [inline story file content]
|
||||||
|
|
||||||
|
{{IF playbooks loaded}}
|
||||||
|
Relevant Playbooks (review before implementing):
|
||||||
|
[inline playbook content]
|
||||||
|
|
||||||
|
Pay special attention to:
|
||||||
|
- Common Gotchas in these playbooks
|
||||||
|
- Code Patterns to follow
|
||||||
|
- Test Requirements to satisfy
|
||||||
|
{{ENDIF}}
|
||||||
|
</context>
|
||||||
|
|
||||||
|
<objective>
|
||||||
|
Implement the story requirements:
|
||||||
|
1. Review story tasks and acceptance criteria
|
||||||
|
2. **Review playbooks** for gotchas and patterns (if provided)
|
||||||
|
3. Analyze what exists vs needed (gap analysis)
|
||||||
|
4. **Write tests FIRST** (TDD - tests before implementation)
|
||||||
|
5. Implement production code to pass tests
|
||||||
|
</objective>
|
||||||
|
|
||||||
|
<constraints>
|
||||||
|
- DO NOT validate your own work
|
||||||
|
- DO NOT review your code
|
||||||
|
- DO NOT update story checkboxes
|
||||||
|
- DO NOT commit changes yet
|
||||||
|
</constraints>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
- [ ] Reviewed playbooks for guidance
|
||||||
|
- [ ] Tests written for all requirements
|
||||||
|
- [ ] Production code implements tests
|
||||||
|
- [ ] Tests pass
|
||||||
|
- [ ] Return structured completion artifact
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<completion_format>
|
||||||
|
Return structured JSON artifact:
|
||||||
|
{
|
||||||
|
"agent": "builder",
|
||||||
|
"story_key": "{{story_key}}",
|
||||||
|
"status": "SUCCESS" | "FAILED",
|
||||||
|
"files_created": ["path/to/file.tsx", ...],
|
||||||
|
"files_modified": ["path/to/file.tsx", ...],
|
||||||
|
"tests_added": {
|
||||||
|
"total": 12,
|
||||||
|
"passing": 12
|
||||||
|
},
|
||||||
|
"tasks_addressed": ["task description from story", ...]
|
||||||
|
}
|
||||||
|
|
||||||
|
Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json
|
||||||
|
</completion_format>
|
||||||
|
\`
|
||||||
|
})
|
||||||
|
|
||||||
|
BUILDER_AGENT_ID = {{extract agent_id from Task result}}
|
||||||
\`\`\`
|
\`\`\`
|
||||||
|
|
||||||
Wait for completion. Parse structured output. Verify files exist.
|
**CRITICAL: Store Builder agent ID:**
|
||||||
|
\`\`\`bash
|
||||||
|
echo "Builder agent ID: $BUILDER_AGENT_ID"
|
||||||
|
echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
**Wait for completion. Verify artifact exists:**
|
||||||
|
\`\`\`bash
|
||||||
|
BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
|
||||||
|
[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
**Verify files exist:**
|
||||||
|
\`\`\`bash
|
||||||
|
# For each file in files_created and files_modified:
|
||||||
|
[ -f "$file" ] || echo "❌ MISSING: $file"
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
If files missing or status FAILED: halt pipeline.
|
If files missing or status FAILED: halt pipeline.
|
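The "for each file" check above is written as pseudocode. One way to make it concrete is sketched below, assuming the path list has already been extracted from the builder artifact, one path per line. `report_missing_files` is a hypothetical helper name.

```bash
# Sketch only: report_missing_files is a hypothetical helper name.
# Reads newline-separated paths on stdin, prints each missing one,
# and returns 1 if any path is missing (0 otherwise).
report_missing_files() {
  local missing=0 file
  while IFS= read -r file; do
    [ -n "$file" ] || continue
    if [ ! -f "$file" ]; then
      echo "❌ MISSING: $file"
      missing=1
    fi
  done
  return "$missing"
}

# With jq available, the list could come from the artifact, for example:
#   jq -r '(.files_created + .files_modified)[]' "$BUILDER_COMPLETION" | report_missing_files || exit 1
# Inline demonstration with one existing and one missing path:
existing=$(mktemp)
printf '%s\n' "$existing" "/nonexistent/example.tsx" | report_missing_files || echo "Halting pipeline"
rm -f "$existing"
```

Returning a non-zero status, rather than only printing, lets the orchestrator halt on the first verification failure.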
||||||
</step>
|
</step>
|
||||||
|
|
||||||
<step name="spawn_verification_parallel">
|
<step name="spawn_verification_parallel">
|
||||||
**Phase 2: Parallel Verification (Inspector + Reviewers)**
|
**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
|
||||||
|
|
||||||
\`\`\`
|
\`\`\`
|
||||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
🔍 PHASE 2: PARALLEL VERIFICATION
|
🔍 PHASE 2: PARALLEL VERIFICATION
|
||||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
|
||||||
|
Total agents: {{2 + REVIEWER_COUNT}}
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
\`\`\`
|
\`\`\`
|
||||||
|
|
||||||
**CRITICAL: Spawn ALL verification agents in ONE message (parallel execution)**
|
**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
|
||||||
|
|
||||||
|
Send single message with multiple Task calls:
|
||||||
|
1. Inspector Agent
|
||||||
|
2. Test Quality Agent
|
||||||
|
3. Security Reviewer
|
||||||
|
4. Logic/Performance Reviewer (if standard/complex)
|
||||||
|
5. Architect/Integration Reviewer
|
||||||
|
6. Code Quality Reviewer (if complex)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Inspector Agent Prompt:
|
||||||
|
|
||||||
Determine reviewer count based on complexity:
|
|
||||||
\`\`\`
|
\`\`\`
|
||||||
if complexity == "micro": REVIEWER_COUNT = 1
|
Task({
|
||||||
if complexity == "standard": REVIEWER_COUNT = 2
|
subagent_type: "general-purpose",
|
||||||
if complexity == "complex": REVIEWER_COUNT = 3
|
description: "Validate story {{story_key}} implementation",
|
||||||
|
prompt: \`
|
||||||
|
You are the INSPECTOR agent for story {{story_key}}.
|
||||||
|
|
||||||
|
<execution_context>
|
||||||
|
@patterns/verification.md
|
||||||
|
@patterns/agent-completion.md
|
||||||
|
</execution_context>
|
||||||
|
|
||||||
|
<context>
|
||||||
|
Story: [inline story file content]
|
||||||
|
</context>
|
||||||
|
|
||||||
|
<objective>
|
||||||
|
Independently verify implementation WITH CODE CITATIONS:
|
||||||
|
|
||||||
|
1. Read story file - understand ALL tasks
|
||||||
|
2. Read each file Builder created/modified
|
||||||
|
3. **Map EACH task to specific code with file:line citations**
|
||||||
|
4. Run verification checks:
|
||||||
|
- Type-check (0 errors required)
|
||||||
|
- Lint (0 warnings required)
|
||||||
|
- Tests (all passing required)
|
||||||
|
- Build (success required)
|
||||||
|
</objective>
|
||||||
|
|
||||||
|
<critical_requirement>
|
||||||
|
**EVERY task must have evidence.**
|
||||||
|
|
||||||
|
For each task, provide:
|
||||||
|
- file:line where it's implemented
|
||||||
|
- Brief quote of relevant code
|
||||||
|
- Verdict: IMPLEMENTED or NOT_IMPLEMENTED
|
||||||
|
|
||||||
|
Example:
|
||||||
|
Task: "Display occupant agreement status"
|
||||||
|
Evidence: src/features/agreement/StatusBadge.tsx:45-67
|
||||||
|
Code: "const StatusBadge = ({ status }) => ..."
|
||||||
|
Verdict: IMPLEMENTED
|
||||||
|
</critical_requirement>
|
||||||
|
|
||||||
|
<constraints>
|
||||||
|
- You have NO KNOWLEDGE of what Builder did
|
||||||
|
- Run all checks yourself - don't trust claims
|
||||||
|
- **Every task needs file:line citation**
|
||||||
|
- If code doesn't exist: mark NOT_IMPLEMENTED with reason
|
||||||
|
</constraints>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
- [ ] ALL tasks mapped to code locations
|
||||||
|
- [ ] Type check: 0 errors
|
||||||
|
- [ ] Lint: 0 warnings
|
||||||
|
- [ ] Tests: all passing
|
||||||
|
- [ ] Build: success
|
||||||
|
- [ ] Return structured evidence
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<completion_format>
|
||||||
|
{
|
||||||
|
"agent": "inspector",
|
||||||
|
"story_key": "{{story_key}}",
|
||||||
|
"verdict": "PASS" | "FAIL",
|
||||||
|
"task_verification": [
|
||||||
|
{
|
||||||
|
"task": "Create agreement view component",
|
||||||
|
"implemented": true,
|
||||||
|
"evidence": [
|
||||||
|
{
|
||||||
|
"file": "src/features/agreement/AgreementView.tsx",
|
||||||
|
"lines": "15-67",
|
||||||
|
"code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"file": "src/features/agreement/AgreementView.test.tsx",
|
||||||
|
"lines": "8-45",
|
||||||
|
"code_snippet": "describe('AgreementView', () => {...})"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"task": "Add status badge",
|
||||||
|
"implemented": false,
|
||||||
|
"evidence": [],
|
||||||
|
"reason": "No StatusBadge component found in src/features/agreement/"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"checks": {
|
||||||
|
"type_check": {"passed": true, "errors": 0},
|
||||||
|
"lint": {"passed": true, "warnings": 0},
|
||||||
|
"tests": {"passed": true, "total": 12, "passing": 12},
|
||||||
|
"build": {"passed": true}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json
|
||||||
|
</completion_format>
|
||||||
|
\`
|
||||||
|
})
|
||||||
\`\`\`
|
\`\`\`
|
||||||
|
|
||||||
Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings.
|
---
|
||||||
|
|
||||||
Aggregate all findings from Inspector + Reviewers.
|
## Test Quality Agent Prompt:
|
||||||
|
|
||||||
|
\`\`\`
|
||||||
|
Task({
|
||||||
|
subagent_type: "general-purpose",
|
||||||
|
description: "Review test quality for {{story_key}}",
|
||||||
|
prompt: \`
|
||||||
|
You are the TEST QUALITY agent for story {{story_key}}.
|
||||||
|
|
||||||
|
<context>
|
||||||
|
Story: [inline story file content]
|
||||||
|
Builder completion: [inline builder artifact]
|
||||||
|
</context>
|
||||||
|
|
||||||
|
<objective>
|
||||||
|
Review test files for quality and completeness:
|
||||||
|
|
||||||
|
1. Find all test files created/modified by Builder
|
||||||
|
2. For each test file, verify:
|
||||||
|
- **Happy path**: Primary functionality tested ✓
|
||||||
|
- **Edge cases**: null, empty, invalid inputs ✓
|
||||||
|
- **Error conditions**: Failures handled properly ✓
|
||||||
|
- **Assertions**: Meaningful checks (not just "doesn't crash")
|
||||||
|
- **Test names**: Descriptive and clear
|
||||||
|
- **Deterministic**: No random data, no timing dependencies
|
||||||
|
3. Check that tests actually validate the feature
|
||||||
|
|
||||||
|
**Focus on:** What's missing? What edge cases weren't considered?
|
||||||
|
</objective>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
- [ ] All test files reviewed
|
||||||
|
- [ ] Edge cases identified (covered or missing)
|
||||||
|
- [ ] Error conditions verified
|
||||||
|
- [ ] Assertions are meaningful
|
||||||
|
- [ ] Tests are deterministic
|
||||||
|
- [ ] Return quality assessment
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<completion_format>
|
||||||
|
{
|
||||||
|
"agent": "test_quality",
|
||||||
|
"story_key": "{{story_key}}",
|
||||||
|
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
|
||||||
|
"test_files_reviewed": ["path/to/test.tsx", ...],
|
||||||
|
"issues": [
|
||||||
|
{
|
||||||
|
"severity": "HIGH",
|
||||||
|
"file": "path/to/test.tsx:45",
|
||||||
|
"issue": "Missing edge case: empty input array",
|
||||||
|
"recommendation": "Add test: expect(() => fn([])).toThrow(...)"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"severity": "MEDIUM",
|
||||||
|
"file": "path/to/test.tsx:67",
|
||||||
|
"issue": "Test uses Math.random() - could be flaky",
|
||||||
|
"recommendation": "Use fixed test data"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"coverage_analysis": {
|
||||||
|
"edge_cases_covered": true | false,
|
||||||
|
"error_conditions_tested": true | false,
|
||||||
|
"meaningful_assertions": true | false,
|
||||||
|
"tests_are_deterministic": true | false
|
||||||
|
},
|
||||||
|
"summary": {
|
||||||
|
"high_issues": 1,
|
||||||
|
"medium_issues": 2,
|
||||||
|
"low_issues": 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json
|
||||||
|
</completion_format>
|
||||||
|
\`
|
||||||
|
})
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
(Continue with Security, Logic, Architect, Quality reviewers as before...)
|
||||||
|
|
||||||
|
**Wait for ALL agents to complete.**
|
||||||
|
|
||||||
|
Collect completion artifacts:
|
||||||
|
- \`inspector.json\`
|
||||||
|
- \`test-quality.json\`
|
||||||
|
- \`reviewer-security.json\`
|
||||||
|
- \`reviewer-logic.json\` (if spawned)
|
||||||
|
- \`reviewer-architect.json\`
|
||||||
|
- \`reviewer-quality.json\` (if spawned)
|
||||||
|
|
||||||
|
Parse all findings and aggregate by severity.
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step name="coverage_gate">
|
||||||
|
**Phase 2.5: Coverage Gate (Automated)**
|
||||||
|
|
||||||
|
\`\`\`
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
📊 PHASE 2.5: COVERAGE GATE
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
Run coverage check:
|
||||||
|
\`\`\`bash
|
||||||
|
# Run tests with coverage
|
||||||
|
npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
|
||||||
|
|
||||||
|
# Extract coverage percentage (adjust grep pattern for your test framework)
|
||||||
|
COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1)
COVERAGE=${COVERAGE:-0}  # default to 0 when no coverage figure was found
|
||||||
|
|
||||||
|
echo "Coverage: ${COVERAGE}%"
|
||||||
|
echo "Threshold: {{coverage_threshold}}%"
|
||||||
|
|
||||||
|
# Compare coverage
|
||||||
|
if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
|
||||||
|
echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
|
||||||
|
echo "Builder must add more tests before proceeding"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "✅ Coverage gate passed: ${COVERAGE}%"
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
If the coverage check fails: add the coverage gap to the issues list so the resumed Builder fixes it in Phase 3.
|
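The gate above takes the first number on the summary row, which for a Jest-style table is statement coverage, while the config describes the threshold as line coverage. The sketch below shows one way to read the `% Lines` column explicitly and compare without `bc`. It assumes a Jest/Istanbul-style text summary with the columns `File | % Stmts | % Branch | % Funcs | % Lines`; `extract_line_coverage` and `coverage_meets` are hypothetical helper names.

```bash
# Sketch only: both helpers are hypothetical and assume a Jest/Istanbul-style
# text summary: File | % Stmts | % Branch | % Funcs | % Lines

# Reads the summary on stdin and prints the "% Lines" value of the
# "All files" row, or 0 when that row is absent.
extract_line_coverage() {
  awk -F'|' '
    /^All files/ { gsub(/ /, "", $5); value = $5 }
    END { print (value == "" ? 0 : value) }
  '
}

# Exit status 0 when ACTUAL >= THRESHOLD; awk handles the float comparison.
coverage_meets() {
  awk -v actual="$1" -v threshold="$2" 'BEGIN { exit !(actual + 0 >= threshold + 0) }'
}

# Inline sample instead of coverage-output.txt:
SAMPLE='All files |   85.71 |      75 |     100 |    82.5 |'
COVERAGE=$(printf '%s\n' "$SAMPLE" | extract_line_coverage)
if coverage_meets "$COVERAGE" 80; then
  echo "Coverage gate passed: ${COVERAGE}%"
else
  echo "Coverage ${COVERAGE}% below threshold"
fi
```

Other test frameworks print different summaries, so the column index would need adjusting in the same way the original grep pattern does.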
||||||
</step>
|
</step>
|
||||||
|
|
||||||
<step name="resume_builder_with_findings">
|
<step name="resume_builder_with_findings">
|
||||||
|
|
@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
|
||||||
|
|
||||||
\`\`\`
|
\`\`\`
|
||||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
🔧 PHASE 5: RECONCILIATION (Orchestrator)
|
📊 PHASE 5: RECONCILIATION (Orchestrator)
|
||||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
\`\`\`
|
\`\`\`
|
||||||
|
|
||||||
**YOU (orchestrator) do this directly. No agent spawn.**
|
**YOU (orchestrator) do this directly. No agent spawn.**
|
||||||
|
|
||||||
1. Get what was built (git log, git diff)
|
**5.1: Load completion artifacts**
|
||||||
2. Read story file
|
\`\`\`bash
|
||||||
3. Check off completed tasks (Edit tool)
|
BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
|
||||||
4. Fill Dev Agent Record with pipeline details
|
INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
|
||||||
5. Verify updates (grep task checkboxes)
|
\`\`\`
|
||||||
6. Update sprint-status.yaml to "done"
|
|
||||||
|
Use Read tool on all artifacts.
|
||||||
|
|
||||||
|
**5.2: Read story file**
|
||||||
|
Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\`
|
||||||
|
|
||||||
|
**5.3: Check off completed tasks using Inspector evidence**
|
||||||
|
|
||||||
|
For each task in \`inspector.task_verification\`:
|
||||||
|
- If \`implemented: true\` and has evidence:
|
||||||
|
- Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\`
|
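The workflow performs this step with the Edit tool. For reference, the same exact-match replacement can be sketched in shell, which avoids regex escaping problems when a task description contains characters such as `[` or `(`. `check_off_task` is a hypothetical helper name and the task text is assumed to match the story line verbatim.

```bash
# Sketch only: check_off_task is a hypothetical helper name.
# Usage: check_off_task STORY_FILE "task text"
# Rewrites the line "- [ ] <task text>" to "- [x] <task text>" by exact
# string comparison; all other lines are copied through unchanged.
check_off_task() {
  local story_file="$1" tmp
  tmp=$(mktemp) || return 1
  if TASK="$2" awk '
    BEGIN { unchecked = "- [ ] " ENVIRON["TASK"]; checked = "- [x] " ENVIRON["TASK"] }
    $0 == unchecked { print checked; next }
    { print }
  ' "$story_file" > "$tmp"; then
    # Write back through the original path so file permissions are kept.
    cat "$tmp" > "$story_file"
  fi
  rm -f "$tmp"
}

# Demonstration on a temporary story file:
story=$(mktemp)
printf '%s\n' '- [ ] Create agreement view component' '- [ ] Add status badge' > "$story"
check_off_task "$story" 'Create agreement view component'
cat "$story"
rm -f "$story"
```

Only tasks the Inspector marked `implemented: true` with non-empty evidence should be passed to such a helper.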
||||||
|
|
||||||
|
**5.4: Fill Dev Agent Record with evidence**
|
||||||
|
|
||||||
|
Use Edit tool:
|
||||||
|
\`\`\`markdown
|
||||||
|
### Dev Agent Record
|
||||||
|
**Implementation Date:** {{timestamp}}
|
||||||
|
**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0)
|
||||||
|
**Git Commit:** {{git_commit}}
|
||||||
|
|
||||||
|
**Pipeline Phases:**
|
||||||
|
- Phase 0: Playbook Query ({{playbooks_loaded}} loaded)
|
||||||
|
- Phase 1: Builder (initial implementation)
|
||||||
|
- Phase 2: Parallel Verification
|
||||||
|
- Inspector: {{verdict}} with code citations
|
||||||
|
- Test Quality: {{verdict}}
|
||||||
|
- {{REVIEWER_COUNT}} Reviewers: {{issues_found}}
|
||||||
|
- Phase 2.5: Coverage Gate ({{coverage}}%)
|
||||||
|
- Phase 3: Builder (resumed, fixed {{fixes_count}} issues)
|
||||||
|
- Phase 4: Inspector re-check ({{verdict}})
|
||||||
|
|
||||||
|
**Files Created:** {{count}}
|
||||||
|
**Files Modified:** {{count}}
|
||||||
|
**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}% coverage)
|
||||||
|
**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM
|
||||||
|
|
||||||
|
**Task Evidence:** (Inspector code citations)
|
||||||
|
{{for each task with evidence}}
|
||||||
|
- [x] {{task}}
|
||||||
|
- {{evidence[0].file}}:{{evidence[0].lines}}
|
||||||
|
{{endfor}}
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
**5.5: Verify updates**
|
||||||
|
\`\`\`bash
|
||||||
|
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
|
||||||
|
[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; }
|
||||||
|
echo "✅ Reconciled: $CHECKED tasks with evidence"
|
||||||
|
\`\`\`
|
||||||
</step>
|
</step>
|
||||||
|
|
||||||
<step name="final_verification">
|
<step name="final_verification">
|
||||||
**Final Quality Gate**
|
**Final Quality Gate**
|
||||||
|
|
||||||
Verify:
|
\`\`\`bash
|
||||||
1. Git commit exists
|
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||||
2. Story tasks checked (count > 0)
|
echo "🔍 FINAL VERIFICATION"
|
||||||
3. Dev Agent Record filled
|
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||||
4. Sprint status updated
|
|
||||||
|
|
||||||
If verification fails: fix using Edit, then re-verify.
|
# 1. Git commit exists
|
||||||
|
git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; }
|
||||||
|
echo "✅ Git commit found"
|
||||||
|
|
||||||
|
# 2. Story tasks checked with evidence
|
||||||
|
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
|
||||||
|
[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; }
|
||||||
|
echo "✅ $CHECKED tasks checked with code citations"
|
||||||
|
|
||||||
|
# 3. Dev Agent Record filled
|
||||||
|
grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; }
|
||||||
|
echo "✅ Dev Agent Record filled"
|
||||||
|
|
||||||
|
# 4. Coverage met threshold
|
||||||
|
FINAL_COVERAGE=$(jq -r '.tests.coverage // 0' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json)
|
||||||
|
if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then
|
||||||
|
echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "✅ Coverage: ${FINAL_COVERAGE}%"
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo "✅ STORY COMPLETE"
|
||||||
|
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
**Update sprint-status.yaml:**
|
||||||
|
Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\`
|
||||||
|
</step>
|
||||||
|
|
||||||
|
<step name="playbook_reflection">
|
||||||
|
**Phase 6: Playbook Reflection**
|
||||||
|
|
||||||
|
\`\`\`
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
💡 PHASE 6: PLAYBOOK REFLECTION
|
||||||
|
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||||
|
\`\`\`
|
||||||
|
|
||||||
|
Spawn Reflection Agent:
|
||||||
|
|
||||||
|
\`\`\`
|
||||||
|
Task({
|
||||||
|
subagent_type: "general-purpose",
|
||||||
|
description: "Extract learnings from {{story_key}}",
|
||||||
|
prompt: \`
|
||||||
|
You are the REFLECTION agent for story {{story_key}}.
|
||||||
|
|
||||||
|
<context>
|
||||||
|
Story: [inline story file]
|
||||||
|
Builder initial: [inline builder.json]
|
||||||
|
All review findings: [inline all reviewer artifacts]
|
||||||
|
Builder fixes: [inline builder-fixes.json]
|
||||||
|
Test quality issues: [inline test-quality.json]
|
||||||
|
</context>
|
||||||
|
|
||||||
|
<objective>
|
||||||
|
Identify what future agents should know:
|
||||||
|
|
||||||
|
1. **What issues were found?** (from reviewers)
|
||||||
|
2. **What did Builder miss initially?** (gaps, edge cases, security)
|
||||||
|
3. **What playbook knowledge would have prevented these?**
|
||||||
|
4. **Which module/feature area does this apply to?**
|
||||||
|
5. **Should we update existing playbook or create new?**
|
||||||
|
|
||||||
|
Questions:
|
||||||
|
- What gotchas should future builders know?
|
||||||
|
- What code patterns should be standard?
|
||||||
|
- What test requirements are essential?
|
||||||
|
- What similar stories exist?
|
||||||
|
</objective>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
- [ ] Analyzed review findings
|
||||||
|
- [ ] Identified preventable issues
|
||||||
|
- [ ] Determined which playbook(s) to update
|
||||||
|
- [ ] Return structured proposal
|
||||||
|
</success_criteria>
|
||||||
|
|
+<completion_format>
+{
+  "agent": "reflection",
+  "story_key": "{{story_key}}",
+  "learnings": [
+    {
+      "issue": "SQL injection in query builder",
+      "root_cause": "Builder used string concatenation (didn't know pattern)",
+      "prevention": "Playbook should document: always use parameterized queries",
+      "applies_to": "database queries, API endpoints with user input"
+    },
+    {
+      "issue": "Missing edge case tests for empty arrays",
+      "root_cause": "Test Quality Agent found gap",
+      "prevention": "Playbook should require: test null/empty/invalid for all inputs",
+      "applies_to": "all data processing functions"
+    }
+  ],
+  "playbook_proposal": {
+    "action": "update_existing" | "create_new",
+    "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
+    "module": "api/database",
+    "updates": {
+      "common_gotchas": [
+        "Never concatenate user input into SQL - use parameterized queries",
+        "Test edge cases: null, undefined, [], '', invalid input"
+      ],
+      "code_patterns": [
+        "db.query(sql, [param1, param2]) ✓",
+        "sql + userInput ✗"
+      ],
+      "test_requirements": [
+        "Test SQL injection attempts: expect(() => query(\"' OR 1=1--\")).toThrow()",
+        "Test empty inputs: expect(fn([])).toEqual([]) or expect(() => fn([])).toThrow()"
+      ],
+      "related_stories": ["{{story_key}}"]
+    }
+  }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
+</completion_format>
+\`
+})
+\`\`\`
+
+**Wait for completion.**
+
+**Review playbook proposal:**
+\`\`\`bash
+REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
+ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
+PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
+
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "📝 Playbook Update Proposal"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Action: $ACTION"
+echo "Playbook: $PLAYBOOK"
+echo ""
+jq -r '.learnings[] | "- \(.issue)\n Prevention: \(.prevention)"' "$REFLECTION"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+\`\`\`
+
+If \`auto_apply_updates: true\` in config:
+- Read playbook (or create from template if new)
+- Use Edit tool to add learnings to sections
+- Commit playbook update
+
+If \`auto_apply_updates: false\` (default):
+- Display proposal for manual review
+- User can apply later with \`/update-playbooks {{story_key}}\`
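The review step above reads two fields from the reflection artifact and then branches on `auto_apply_updates`. A minimal TypeScript sketch of that logic follows. The interfaces mirror the fields listed in the completion format, but `formatProposal`, `nextStep`, and the `sampleReport` values are hypothetical names introduced for illustration only; they are not part of the workflow itself.

```typescript
// Shape of the reflection artifact, following the completion format fields.
interface Learning {
  issue: string;
  root_cause: string;
  prevention: string;
  applies_to: string;
}

interface PlaybookProposal {
  action: "update_existing" | "create_new";
  playbook: string;
  module: string;
}

interface ReflectionReport {
  agent: string;
  story_key: string;
  learnings: Learning[];
  playbook_proposal: PlaybookProposal;
}

// Hypothetical helper: builds the same summary the jq commands print.
function formatProposal(report: ReflectionReport): string {
  const header = [
    `Action: ${report.playbook_proposal.action}`,
    `Playbook: ${report.playbook_proposal.playbook}`,
    "",
  ];
  const items = report.learnings.map(
    (l) => `- ${l.issue}\n  Prevention: ${l.prevention}`
  );
  return [...header, ...items].join("\n");
}

// Hypothetical helper: picks the branch described for auto_apply_updates.
type NextStep = "apply_and_commit" | "display_for_manual_review";

function nextStep(autoApplyUpdates: boolean = false): NextStep {
  return autoApplyUpdates ? "apply_and_commit" : "display_for_manual_review";
}

// Illustrative data only; the story key is made up.
const sampleReport: ReflectionReport = {
  agent: "reflection",
  story_key: "example-story",
  learnings: [
    {
      issue: "SQL injection in query builder",
      root_cause: "Builder used string concatenation",
      prevention: "Always use parameterized queries",
      applies_to: "database queries",
    },
  ],
  playbook_proposal: {
    action: "update_existing",
    playbook: "docs/playbooks/implementation-playbooks/database-api-patterns.md",
    module: "api/database",
  },
};

const summary = formatProposal(sampleReport);
const step = nextStep(); // default matches auto_apply_updates: false
```

Defaulting `nextStep` to manual review keeps the sketch aligned with the documented default, so a playbook is never edited unless the flag is set explicitly.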
||||||
</step>
|
</step>
|
||||||
|
|
||||||
</process>
|
</process>
|
||||||
|
|
 <failure_handling>
-**Builder fails:** Don't spawn verification. Report failure and halt.
-**Inspector fails (Phase 2):** Still run Reviewers in parallel, collect all findings together.
-**Inspector fails (Phase 4):** Resume Builder again with new issues (iterative fix loop).
-**Builder resume fails:** Report unfixed issues. Manual intervention needed.
-**Reconciliation fails:** Fix using Edit tool. Re-verify checkboxes.
+**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
+**Inspector fails (Phase 2):** Still collect other reviewer findings.
+**Test Quality fails:** Add issues to Builder fix list.
+**Coverage below threshold:** Add to Builder fix list.
+**Reviewers find CRITICAL:** Builder MUST fix when resumed.
+**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
+**Builder resume fails:** Report unfixed issues. Manual intervention.
+**Reconciliation fails:** Fix with Edit tool, re-verify.
 </failure_handling>

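The Phase 4 rule above (resume the Builder again, at most 3 iterations, then fall back to manual intervention) can be sketched as a bounded loop. This is an illustration under stated assumptions: `runFixLoop`, `InspectionResult`, and `FixLoopOutcome` are hypothetical names, and the inspector and builder are modelled as plain synchronous callbacks rather than spawned agents.

```typescript
interface InspectionResult {
  passed: boolean;
  issues: string[];
}

interface FixLoopOutcome {
  status: "passed" | "manual_intervention";
  iterations: number; // number of Builder resumes performed
  unfixed: string[];  // issues still open when the loop stopped
}

// Hypothetical sketch of the iterative fix loop: recheck, resume the
// Builder with the open issues, and stop after maxIterations resumes.
function runFixLoop(
  inspect: () => InspectionResult,
  resumeBuilder: (issues: string[]) => void,
  maxIterations: number = 3
): FixLoopOutcome {
  let result = inspect();
  let iterations = 0;

  while (!result.passed && iterations < maxIterations) {
    resumeBuilder(result.issues);
    iterations += 1;
    result = inspect();
  }

  return result.passed
    ? { status: "passed", iterations, unfixed: [] }
    : { status: "manual_intervention", iterations, unfixed: result.issues };
}
```

Capping the loop is what turns "Builder resume fails" into a reportable state: the outcome carries the unfixed issues instead of retrying indefinitely.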
 <complexity_routing>
-| Complexity | Pipeline | Reviewers | Total Phase 2 Agents |
-|------------|----------|-----------|---------------------|
-| micro | Builder → [Inspector + 2 Reviewers] → Resume Builder → Inspector recheck | 2 (security, architect) | 3 agents |
-| standard | Builder → [Inspector + 3 Reviewers] → Resume Builder → Inspector recheck | 3 (security, logic, architect) | 4 agents |
-| complex | Builder → [Inspector + 4 Reviewers] → Resume Builder → Inspector recheck | 4 (security, logic, architect, quality) | 5 agents |
+| Complexity | Phase 2 Agents | Total | Security |
+|------------|----------------|-------|----------|
+| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security Reviewer + Architect |
+| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect |
+| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality |

-**Key Improvements (v3.2.0):**
-- All verification agents spawn in parallel (single message, faster execution)
-- Builder resume in Phase 3 saves 50-70% tokens vs spawning fresh Fixer
-- **NEW:** Architect/Integration Reviewer catches runtime issues (404s, pattern violations, missing migrations)
-
-**Reviewer Specializations:**
-- **Security:** Auth, injection, secrets, cross-tenant access
-- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions
-- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+)
-- **Code Quality:** Maintainability, naming, duplication (complex only)
+**All verification agents spawn in parallel (single message)**
 </complexity_routing>

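The v4.0 routing table can be read as a lookup from complexity level to the set of Phase 2 agents. A minimal sketch follows, assuming hypothetical agent identifiers (`inspector`, `test-quality`, and `<name>-reviewer`); the real agent names used by the pipeline may differ.

```typescript
type Complexity = "micro" | "standard" | "complex";

// Reviewer sets per complexity level, taken from the v4.0 table.
const REVIEWERS: Record<Complexity, string[]> = {
  micro: ["security", "architect"],
  standard: ["security", "logic", "architect"],
  complex: ["security", "logic", "architect", "quality"],
};

// Hypothetical helper: lists every agent to spawn in parallel for Phase 2.
// Inspector and Test Quality run at every level; reviewers vary.
function phase2Agents(complexity: Complexity): string[] {
  const reviewers = REVIEWERS[complexity].map((name) => `${name}-reviewer`);
  return ["inspector", "test-quality", ...reviewers];
}

const microAgents = phase2Agents("micro");
const complexAgents = phase2Agents("complex");
```

The totals fall out of the lookup: two fixed agents plus 2, 3, or 4 reviewers gives the 4, 5, and 6 agents shown in the table.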
 <success_criteria>
-- [ ] Builder spawned and agent_id saved
-- [ ] All verification agents completed in parallel
-- [ ] Builder resumed with consolidated findings
-- [ ] Inspector recheck passed
-- [ ] Git commit exists for story
-- [ ] Story file has checked tasks (count > 0)
-- [ ] Dev Agent Record filled with all phases
-- [ ] Sprint status updated to "done"
+- [ ] Phase 0: Playbooks loaded (if available)
+- [ ] Phase 1: Builder spawned, agent_id saved
+- [ ] Phase 2: All verification agents completed in parallel
+- [ ] Phase 2.5: Coverage gate passed
+- [ ] Phase 3: Builder resumed with consolidated findings
+- [ ] Phase 4: Inspector recheck passed
+- [ ] Phase 5: Orchestrator reconciled with Inspector evidence
+- [ ] Phase 6: Playbook reflection completed
+- [ ] Git commit exists
+- [ ] Story tasks checked with code citations
+- [ ] Dev Agent Record filled
+- [ ] Coverage ≥ {{coverage_threshold}}%
+- [ ] Sprint status: done
 </success_criteria>
+
+<improvements_v4>
+1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings
+2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task
+3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates
+4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero
+5. ✅ Test Quality Agent (v4.0) + Coverage Gate (v4.0) - validates test quality and enforces threshold
+6. ✅ Playbook query (v4.0) before Builder + reflection (v4.0) after - continuous learning
+</improvements_v4>
@@ -1,7 +1,7 @@
 name: story-full-pipeline
-description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)"
-author: "BMAD Method + GSD"
-version: "3.2.0" # Added architect-integration-reviewer for runtime verification
+description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes"
+author: "BMAD Method"
+version: "4.0.0" # Added playbook learning, test quality, coverage gates, Inspector code citations

 # Execution mode
 execution_mode: "multi_agent" # multi_agent | single_agent (fallback)
@@ -37,13 +37,23 @@ agents:
     timeout: 3600 # 1 hour

   inspector:
-    description: "Validation agent - independent verification"
+    description: "Validation agent - independent verification with code citations"
     steps: [5, 6]
     subagent_type: "general-purpose"
     prompt_file: "{agents_path}/inspector.md"
     fresh_context: true # No knowledge of builder agent
     trust_level: "medium" # No conflict of interest
     timeout: 1800 # 30 minutes
+    require_code_citations: true # v4.0: Must provide file:line evidence for all tasks
+
+  test_quality:
+    description: "Test quality validation - verifies test coverage and quality"
+    steps: [5.5]
+    subagent_type: "general-purpose"
+    prompt_file: "{agents_path}/test-quality.md"
+    fresh_context: true
+    trust_level: "medium"
+    timeout: 1200 # 20 minutes

   reviewer:
     description: "Adversarial code review - finds problems"
@@ -73,15 +83,40 @@ agents:
     trust_level: "medium" # Incentive to minimize work
     timeout: 2400 # 40 minutes

+  reflection:
+    description: "Playbook learning - extracts patterns for future agents"
+    steps: [10]
+    subagent_type: "general-purpose"
+    prompt_file: "{agents_path}/reflection.md"
+    timeout: 900 # 15 minutes
+
 # Reconciliation: orchestrator does this directly (see workflow.md Phase 5)

+# Playbook configuration (v4.0)
+playbooks:
+  enabled: true # Set to false in project config to disable
+  directory: "docs/playbooks/implementation-playbooks"
+  bootstrap_mode: true # Auto-initialize if missing
+  max_load: 3
+  auto_apply_updates: false # Require manual review of playbook updates
+  discovery:
+    enabled: true # Scan git/docs to populate initial playbooks
+    sources: ["git_history", "docs", "existing_code"]
+
+# Quality gates (v4.0)
+quality_gates:
+  coverage_threshold: 80 # % line coverage required
+  task_verification: "all_with_evidence" # Inspector must provide file:line citations
+  critical_issues: "must_fix"
+  high_issues: "must_fix"
+
 # Complexity level (determines which steps to execute)
 complexity_level: "standard" # micro | standard | complex

 # Complexity routing
 complexity_routing:
   micro:
-    skip_agents: ["reviewer"] # Skip code review for micro stories
+    skip_agents: [] # Full pipeline (v4.0: micro gets security scan)
     description: "Lightweight path for low-risk stories"
     examples: ["UI tweaks", "text changes", "simple CRUD"]

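The `quality_gates` block above sets a line-coverage threshold and marks critical and high issues as must-fix. A rough sketch of how an orchestrator might evaluate those settings is shown below. The `QualityGates` shape copies the config keys, while `coverageGatePasses`, `mustFix`, and the `medium` and `low` severities are assumptions added for illustration; the config only names critical and high.

```typescript
// Mirrors the quality_gates keys from the config above.
interface QualityGates {
  coverage_threshold: number; // % line coverage required
  critical_issues: "must_fix";
  high_issues: "must_fix";
}

// "medium" and "low" are assumed severities, not taken from the config.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  severity: Severity;
  summary: string;
}

const gates: QualityGates = {
  coverage_threshold: 80,
  critical_issues: "must_fix",
  high_issues: "must_fix",
};

// Hypothetical helper: the gate passes when coverage meets the threshold.
function coverageGatePasses(lineCoveragePct: number, g: QualityGates): boolean {
  return lineCoveragePct >= g.coverage_threshold;
}

// Hypothetical helper: selects findings the resumed Builder has to fix.
function mustFix(findings: Finding[]): Finding[] {
  return findings.filter(
    (f) => f.severity === "critical" || f.severity === "high"
  );
}
```

Using `>=` treats exactly 80% as passing, which matches the "Coverage ≥ {{coverage_threshold}}%" wording in the success criteria.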
@@ -0,0 +1,85 @@
+# {{Module/Feature Area}} - Implementation Playbook
+
+> **Purpose:** Guide future agents implementing features in {{module_name}}
+> **Created:** {{date}}
+> **Last Updated:** {{date}}
+
+## Common Gotchas
+
+**What mistakes to avoid:**
+
+- Add specific gotchas here as they're discovered
+- Example: "Never concatenate user input into SQL queries"
+- Example: "Always validate file paths before operations"
+
+## Code Patterns
+
+**Standard approaches that work:**
+
+### Pattern: {{Pattern Name}}
+
+✓ **Good:**
+```
+// Example of correct pattern
+db.query(sql, [param1, param2])
+```
+
+✗ **Bad:**
+```
+// Example of incorrect pattern
+sql + userInput
+```
+
+### Pattern: {{Another Pattern}}
+
+✓ **Good:**
+```
+// Another example
+if (!data) return null;
+```
+
+✗ **Bad:**
+```
+// Don't do this
+data.map(...) // crashes if data is null
+```
+
+## Test Requirements
+
+**Essential tests for this module:**
+
+- **Happy path:** Verify primary functionality
+- **Edge cases:** Test null, undefined, empty arrays, invalid inputs
+- **Error conditions:** Verify errors are handled properly
+- **Security:** Test for injection attacks, auth bypasses, etc.
+
+### Example Test Pattern
+
+```typescript
+describe('FeatureName', () => {
+  it('handles happy path', () => {
+    expect(fn(validInput)).toEqual(expected)
+  })
+
+  it('handles edge cases', () => {
+    // toThrow needs a function to call, so wrap the throwing call
+    expect(() => fn(null)).toThrow()
+    expect(fn([])).toEqual([])
+  })
+
+  it('validates security', () => {
+    expect(() => fn("' OR 1=1--")).toThrow()
+  })
+})
+```
+
+## Related Stories
+
+Stories that used these patterns:
+
+- {{story_key}} - {{brief description}}
+
+## Notes
+
+- Keep this simple and actionable
+- Add new learnings as they emerge
+- Focus on preventable mistakes
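The test requirements in the template (happy path, edge cases, error conditions) can be exercised without any test framework. The sketch below uses a made-up function, `sumAll`, and a small stand-in matcher, `throws`; both are hypothetical and exist only to show why a throwing call has to be passed as a function rather than evaluated inline.

```typescript
// Hypothetical function under test: sums an array, rejecting null input.
function sumAll(values: number[] | null): number {
  if (values === null) {
    throw new Error("values must not be null");
  }
  return values.reduce((total, v) => total + v, 0);
}

// Minimal stand-in for a framework's "throws" matcher. It receives a
// function and calls it inside try/catch. Passing sumAll(null) directly
// would throw before the matcher ever ran.
function throws(action: () => unknown): boolean {
  try {
    action();
    return false;
  } catch {
    return true;
  }
}

// Happy path: primary functionality.
const happyPath = sumAll([1, 2, 3]);

// Edge case: an empty array is valid input.
const emptyCase = sumAll([]);

// Error condition: null input must be rejected.
const rejectsNull = throws(() => sumAll(null));
```

The same wrapping applies to Jest's `toThrow`, which also expects a function it can invoke.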