feat: upgrade story-full-pipeline to v4.0 with 6 major enhancements

Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench research (Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 RESUMES Builder agent with review findings
- Builder already has full codebase context (50-70% token savings)
- More efficient than spawning fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
- Prevents vague communication (CooperBench finding)

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Focus on concrete, measurable verification
- Replaced with patterns/verification.md + patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- No longer skip ALL review for micro stories
- Micro now gets 2 reviewers: Security + Architect
- Lightweight but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection

Pipeline:
Phase 0 (Playbooks) → Phase 1 (Builder) → Phase 2 (Inspector + Test Quality + Reviewers parallel) → Phase 2.5 (Coverage Gate) → Phase 3 (Resume Builder) → Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) → Phase 6 (Reflection)

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)

Parent: 0810646ed6
Commit: a268b4c1bc

@@ -1,124 +1,150 @@
|
|||
# Super-Dev Pipeline - GSDMAD Architecture
|
||||
# Story-Full-Pipeline v4.0
|
||||
|
||||
**Multi-agent pipeline with independent validation and adversarial code review**
|
||||
Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes.
|
||||
|
||||
---
|
||||
## What's New in v4.0
|
||||
|
||||
## Quick Start
|
||||
### 1. Resume Builder (v3.2+)
|
||||
**Token Efficiency: 50-70% savings**
|
||||
|
||||
```bash
|
||||
# Run super-dev pipeline for a story
|
||||
/story-full-pipeline story_key=17-10
|
||||
- Phase 3 now RESUMES Builder agent with review findings
|
||||
- Builder already has full codebase context
|
||||
- More efficient than spawning fresh Fixer agent
|
||||
|
||||
### 2. Inspector Code Citations (v4.0)
|
||||
**Evidence-Based Verification**
|
||||
|
||||
- Inspector must map EVERY task to file:line citations
|
||||
- Example: "Create component" → "src/Component.tsx:45-67"
|
||||
- No more "trust me, it works" - requires proof
|
||||
- Returns structured JSON with code evidence per task
|
||||
|
||||
### 3. Remove Hospital-Grade Framing (v4.0)
|
||||
**Focus on Concrete Verification**
|
||||
|
||||
- Dropped psychological appeal language
|
||||
- Kept rigorous verification gates and bash checks
|
||||
- Replaced with patterns/verification.md + patterns/tdd.md
|
||||
|
||||
### 4. Micro Stories Get Security Scan (v4.0)
|
||||
**Even Simple Stories Need Security**
|
||||
|
||||
- No longer skip ALL review for micro stories
|
||||
- Still get 2 reviewers: Security + Architect
|
||||
- Lightweight but catches critical vulnerabilities
|
||||
|
||||
### 5. Test Quality Agent + Coverage Gate (v4.0)
|
||||
**Validate Test Completeness**
|
||||
|
||||
- New Test Quality Agent validates:
|
||||
- Edge cases covered (null, empty, invalid)
|
||||
- Error conditions tested
|
||||
- Meaningful assertions (not just "doesn't crash")
|
||||
- No flaky tests (random data, timing)
|
||||
- Automated Coverage Gate enforces 80% threshold
|
||||
- Builder must fix test gaps before proceeding
|
||||
|
||||
### 6. Playbook Learning System (v4.0)
|
||||
**Continuous Improvement Through Reflection**
|
||||
|
||||
- **Phase 0:** Query playbooks before implementation
|
||||
- Builder gets relevant patterns/gotchas upfront
|
||||
- **Phase 6:** Reflection agent extracts learnings
|
||||
- Auto-generates playbook updates for future agents
|
||||
- Bootstrap mode: auto-initializes playbooks if missing
|
||||
|
||||
## Pipeline Flow
|
||||
|
||||
```
|
||||
Phase 0: Playbook Query (orchestrator)
|
||||
↓
|
||||
Phase 1: Builder (initial implementation)
|
||||
↓
|
||||
Phase 2: Inspector + Test Quality + N Reviewers (parallel)
|
||||
↓
|
||||
Phase 2.5: Coverage Gate (automated)
|
||||
↓
|
||||
Phase 3: Resume Builder (fix issues with context)
|
||||
↓
|
||||
Phase 4: Inspector re-check (quick verification)
|
||||
↓
|
||||
Phase 5: Orchestrator reconciliation (evidence-based)
|
||||
↓
|
||||
Phase 6: Playbook Reflection (extract learnings)
|
||||
```
|
||||
|
||||
---
|
||||
## Complexity Routing
|
||||
|
||||
## Architecture
|
||||
| Complexity | Phase 2 Agents | Total | Reviewers |
|
||||
|------------|----------------|-------|-----------|
|
||||
| micro | Inspector + Test Quality + 2 | 4 agents | Security + Architect |
|
||||
| standard | Inspector + Test Quality + 3 | 5 agents | Security + Logic + Architect |
|
||||
| complex | Inspector + Test Quality + 4 | 6 agents | Security + Logic + Architect + Quality |
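
The routing table above can be read as a lookup from complexity to the set of Phase 2 agents. The following is a minimal sketch under that reading; `REVIEWERS` and `phase2_agents` are hypothetical names introduced for illustration, not part of the pipeline's actual code.

```python
from typing import Dict, List

# Reviewer sets per complexity level, copied from the routing table above.
REVIEWERS: Dict[str, List[str]] = {
    "micro": ["Security", "Architect"],
    "standard": ["Security", "Logic", "Architect"],
    "complex": ["Security", "Logic", "Architect", "Quality"],
}


def phase2_agents(complexity: str) -> List[str]:
    """Return the agents spawned in parallel during Phase 2 (hypothetical helper)."""
    if complexity not in REVIEWERS:
        raise ValueError(f"unknown complexity: {complexity!r}")
    # Inspector and Test Quality run for every story, regardless of complexity.
    return ["Inspector", "Test Quality"] + REVIEWERS[complexity]
```

For example, `phase2_agents("micro")` yields four agents, matching the "4 agents" total in the table.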
|
||||
|
||||
### Multi-Agent Validation
|
||||
- **4 independent agents** working sequentially
|
||||
- Builder → Inspector → Reviewer → Fixer
|
||||
- Each agent has fresh context
|
||||
- No conflict of interest
|
||||
## Quality Gates
|
||||
|
||||
### Honest Reporting
|
||||
- Inspector verifies Builder's work (doesn't trust claims)
|
||||
- Reviewer is adversarial (wants to find issues)
|
||||
- Main orchestrator does final verification
|
||||
- Can't fake completion
|
||||
- **Coverage Threshold:** 80% line coverage required
|
||||
- **Task Verification:** ALL tasks need file:line evidence
|
||||
- **Critical Issues:** MUST fix
|
||||
- **High Issues:** MUST fix
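
One way to picture the four gates listed above is a single function that collects every reason a story cannot proceed. This is only a sketch: the function name, its inputs, and the choice to treat exactly 80% as passing are assumptions, since the source states the threshold but not how it is evaluated.

```python
from typing import Dict, List

COVERAGE_THRESHOLD = 80.0  # percent line coverage, from the list above


def gate_failures(
    line_coverage: float,
    open_issues: Dict[str, int],
    tasks_without_evidence: int,
) -> List[str]:
    """Return human-readable reasons the quality gates fail (empty list = pass)."""
    failures: List[str] = []
    # Assumption: coverage equal to the threshold passes.
    if line_coverage < COVERAGE_THRESHOLD:
        failures.append(
            f"line coverage {line_coverage:.1f}% is below the "
            f"{COVERAGE_THRESHOLD:.0f}% threshold"
        )
    if tasks_without_evidence > 0:
        failures.append(f"{tasks_without_evidence} task(s) lack file:line evidence")
    # Only critical and high issues block; lower severities are ignored here.
    for severity in ("critical", "high"):
        count = open_issues.get(severity, 0)
        if count > 0:
            failures.append(f"{count} open {severity} issue(s) must be fixed")
    return failures
```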
|
||||
|
||||
### Wave-Based Execution
|
||||
- Independent stories run in parallel
|
||||
- Dependencies respected via waves
|
||||
- 57% faster than sequential execution
|
||||
## Token Efficiency
|
||||
|
||||
---
|
||||
- Phase 2 agents spawn in parallel (same cost, faster)
|
||||
- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
|
||||
- Phase 4 Inspector only (no full re-review)
|
||||
|
||||
## Workflow Phases
|
||||
## Playbook Configuration
|
||||
|
||||
**Phase 1: Builder (Steps 1-4)**
|
||||
- Load story, analyze gaps
|
||||
- Write tests (TDD)
|
||||
- Implement code
|
||||
- Report what was built (NO VALIDATION)
|
||||
```yaml
|
||||
playbooks:
|
||||
enabled: true
|
||||
directory: "docs/playbooks/implementation-playbooks"
|
||||
bootstrap_mode: true # Auto-initialize if missing
|
||||
max_load: 3
|
||||
auto_apply_updates: false # Require manual review
|
||||
discovery:
|
||||
enabled: true
|
||||
sources: ["git_history", "docs", "existing_code"]
|
||||
```
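
The configuration above does not say how "relevant" playbooks are chosen in Phase 0. As a rough sketch, assuming relevance is a simple keyword match on playbook filenames, the `max_load` and `bootstrap_mode` settings could behave as follows; `select_playbooks` is a hypothetical helper, not the workflow's real loader.

```python
from pathlib import Path
from typing import List


def select_playbooks(
    directory: str,
    keywords: List[str],
    max_load: int = 3,
    bootstrap_mode: bool = True,
) -> List[Path]:
    """Pick at most max_load playbooks whose filename mentions a keyword."""
    root = Path(directory)
    if not root.is_dir():
        if not bootstrap_mode:
            return []
        # Bootstrap: create an empty playbook directory so later phases can write to it.
        root.mkdir(parents=True, exist_ok=True)
    lowered = [keyword.lower() for keyword in keywords]
    matches = [
        path
        for path in sorted(root.glob("*.md"))
        if any(keyword in path.stem.lower() for keyword in lowered)
    ]
    return matches[:max_load]
```

Capping the result at `max_load` keeps the Builder's starting context small, which fits the token-efficiency goal stated elsewhere in this README.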
|
||||
|
||||
**Phase 2: Inspector (Steps 5-6)**
|
||||
- Fresh context, no Builder knowledge
|
||||
- Verify files exist
|
||||
- Run tests independently
|
||||
- Run quality checks
|
||||
- PASS or FAIL verdict
|
||||
## How It Avoids CooperBench Coordination Failures
|
||||
|
||||
**Phase 3: Reviewer (Step 7)**
|
||||
- Fresh context, adversarial stance
|
||||
- Find security vulnerabilities
|
||||
- Find performance problems
|
||||
- Find logic bugs
|
||||
- Report issues with severity
|
||||
Unlike the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026):
|
||||
|
||||
**Phase 4: Fixer (Steps 8-9)**
|
||||
- Fix CRITICAL issues (all)
|
||||
- Fix HIGH issues (all)
|
||||
- Fix MEDIUM issues (time permitting)
|
||||
- Verify fixes independently
|
||||
1. **Sequential Implementation** - ONE Builder agent implements entire story (no parallel implementation conflicts)
|
||||
2. **Parallel Review** - Multiple agents review in parallel (safe read-only operations)
|
||||
3. **Context Reuse** - SAME agent fixes issues (no expectation failures about partner state)
|
||||
4. **Evidence-Based** - file:line citations prevent vague communication
|
||||
5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities)
|
||||
|
||||
**Phase 5: Final Verification**
|
||||
- Main orchestrator verifies all phases
|
||||
- Updates story checkboxes
|
||||
- Creates commit
|
||||
- Marks story complete
|
||||
|
||||
---
|
||||
|
||||
## Key Features
|
||||
|
||||
**Separation of Concerns:**
|
||||
- Builder focuses only on implementation
|
||||
- Inspector focuses only on validation
|
||||
- Reviewer focuses only on finding issues
|
||||
- Fixer focuses only on resolving issues
|
||||
|
||||
**Independent Validation:**
|
||||
- Each agent validates the previous agent's work
|
||||
- No agent validates its own work
|
||||
- Fresh context prevents confirmation bias
|
||||
|
||||
**Quality Enforcement:**
|
||||
- Multiple quality gates throughout pipeline
|
||||
- Can't proceed without passing validation
|
||||
- 95% honesty rate (agents can't fake completion)
|
||||
|
||||
---
|
||||
The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination."
|
||||
|
||||
## Files
|
||||
|
||||
See `workflow.md` for complete architecture details.
|
||||
|
||||
**Agent Prompts:**
|
||||
- `agents/builder.md` - Implementation agent
|
||||
- `agents/inspector.md` - Validation agent
|
||||
- `agents/builder.md` - Implementation agent (with playbook awareness)
|
||||
- `agents/inspector.md` - Validation agent (requires code citations)
|
||||
- `agents/test-quality.md` - Test quality validation (v4.0)
|
||||
- `agents/reviewer.md` - Adversarial review agent
|
||||
- `agents/fixer.md` - Issue resolution agent
|
||||
- `agents/architect-integration-reviewer.md` - Architecture/integration review
|
||||
- `agents/fixer.md` - Issue resolution agent (deprecated, uses resume Builder)
|
||||
- `agents/reflection.md` - Playbook learning agent (v4.0)
|
||||
|
||||
**Workflow Config:**
|
||||
- `workflow.yaml` - Main configuration
|
||||
- `workflow.md` - Complete documentation
|
||||
- `workflow.yaml` - Main configuration (v4.0)
|
||||
- `workflow.md` - Complete step-by-step documentation
|
||||
|
||||
**Directory Structure:**
|
||||
```
|
||||
story-full-pipeline/
|
||||
├── README.md (this file)
|
||||
├── workflow.yaml (configuration)
|
||||
├── workflow.md (complete documentation)
|
||||
├── agents/
|
||||
│ ├── builder.md (implementation agent prompt)
|
||||
│ ├── inspector.md (validation agent prompt)
|
||||
│ ├── reviewer.md (review agent prompt)
|
||||
│ └── fixer.md (fix agent prompt)
|
||||
└── steps/
|
||||
└── (step files for each phase)
|
||||
**Templates:**
|
||||
- `../templates/implementation-playbook-template.md` - Playbook structure
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
# Run story-full-pipeline
|
||||
/story-full-pipeline story_key=17-10
|
||||
```
|
||||
|
||||
---
|
||||
## Backward Compatibility
|
||||
|
||||
**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
|
||||
Falls back to single-agent mode if multi-agent execution fails.
|
||||
|
|
|
|||
|
|
@@ -5,7 +5,6 @@
 **Trust Level:** HIGH (wants to find integration issues)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 

@@ -5,7 +5,6 @@
 **Trust Level:** LOW (assume will cut corners)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/tdd.md
 @patterns/agent-completion.md
 </execution_context>
@@ -17,11 +16,12 @@
 You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests.
 
 **DO:**
+- **Review playbooks** for gotchas and patterns (if provided)
 - Load and understand the story requirements
 - Analyze what exists vs what's needed
 - Write tests first (TDD approach)
 - Implement production code to make tests pass
-- Follow project patterns and conventions
+- Follow project patterns and playbook guidance
 
 **DO NOT:**
 - Validate your own work (Inspector agent will do this)
@@ -35,7 +35,8 @@ You are the **BUILDER** agent. Your job is to implement the story requirements b
 ## Steps to Execute
 
 ### Step 1: Initialize
-Load story file and cache context:
+Load story file and playbooks (if provided):
+- **Review playbooks first** (if provided in context) - note gotchas and patterns
 - Read story file: `{{story_file}}`
 - Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.)
 - Determine greenfield vs brownfield
@@ -88,54 +89,36 @@ When complete, provide:
|
|||
|
||||
---
|
||||
|
||||
## Hospital-Grade Standards
|
||||
## Completion Format (v4.0)
|
||||
|
||||
⚕️ **Quality >> Speed**
|
||||
**Return structured JSON artifact:**
|
||||
|
||||
- Take time to do it right
|
||||
- Don't skip error handling
|
||||
- Don't leave TODO comments
|
||||
- Don't use `any` types
|
||||
|
||||
---
|
||||
|
||||
## When Complete, Return This Format
|
||||
|
||||
```markdown
|
||||
## AGENT COMPLETE
|
||||
|
||||
**Agent:** builder
|
||||
**Story:** {{story_key}}
|
||||
**Status:** SUCCESS | FAILED
|
||||
|
||||
### Files Created
|
||||
- path/to/new/file1.ts
|
||||
- path/to/new/file2.ts
|
||||
|
||||
### Files Modified
|
||||
- path/to/existing/file.ts
|
||||
|
||||
### Tests Added
|
||||
- X test files
|
||||
- Y test cases total
|
||||
|
||||
### Implementation Summary
|
||||
Brief description of what was built and key decisions made.
|
||||
|
||||
### Known Gaps
|
||||
- Any functionality not implemented
|
||||
- Any edge cases not handled
|
||||
- NONE if all tasks complete
|
||||
|
||||
### Ready For
|
||||
Inspector validation (next phase)
|
||||
```json
|
||||
{
|
||||
"agent": "builder",
|
||||
"story_key": "{{story_key}}",
|
||||
"status": "SUCCESS",
|
||||
"files_created": ["path/to/file.tsx", "path/to/file.test.tsx"],
|
||||
"files_modified": ["path/to/existing.tsx"],
|
||||
"tests_added": {
|
||||
"total": 12,
|
||||
"passing": 12
|
||||
},
|
||||
"tasks_addressed": [
|
||||
"Create agreement view component",
|
||||
"Add status badge",
|
||||
"Implement occupant selection"
|
||||
],
|
||||
"playbooks_reviewed": ["database-patterns.md", "api-security.md"]
|
||||
}
|
||||
```
|
||||
|
||||
**Why this format?** The orchestrator parses this output to:
|
||||
- Verify claimed files actually exist
|
||||
- Track what was built for reconciliation
|
||||
- Route to next phase appropriately
|
||||
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json`
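
Because the Builder's completion artifact is plain JSON, an orchestrator can check its claims mechanically. The sketch below assumes the `files_created` and `files_modified` keys shown in the JSON example; `missing_claimed_files` is a hypothetical helper and not the orchestrator's actual code.

```python
import json
from pathlib import Path
from typing import List


def missing_claimed_files(artifact_json: str, repo_root: str = ".") -> List[str]:
    """Return every file the Builder claims to have touched that is not on disk."""
    artifact = json.loads(artifact_json)
    claimed = artifact.get("files_created", []) + artifact.get("files_modified", [])
    root = Path(repo_root)
    # A claim counts only if the path resolves to a regular file under the repo root.
    return [relative for relative in claimed if not (root / relative).is_file()]
```

An empty result means every claimed path exists; it says nothing about whether the contents satisfy the story, which remains the Inspector's job.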
|
||||
|
||||
---
|
||||
|
||||
**Remember:** You are the BUILDER. Build it well, but don't validate or review your own work. Other agents will do that with fresh eyes.
|
||||
**Remember:**
|
||||
|
||||
- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories
|
||||
- Build it well with TDD, but don't validate or review your own work
|
||||
- Other agents will verify with fresh eyes and provide file:line evidence
|
||||
|
|
|
|||
|
|
@@ -5,7 +5,6 @@
 **Trust Level:** MEDIUM (incentive to minimize work)
 
 <execution_context>
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 

@@ -1,12 +1,11 @@
-# Inspector Agent - Validation Phase
+# Inspector Agent - Validation Phase with Code Citations
 
-**Role:** Independent verification of Builder's work
+**Role:** Independent verification of Builder's work **WITH EVIDENCE**
 **Steps:** 5-6 (post-validation, quality-checks)
 **Trust Level:** MEDIUM (no conflict of interest)
 
 <execution_context>
 @patterns/verification.md
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 
@@ -14,48 +13,54 @@
|
|||
|
||||
## Your Mission
|
||||
|
||||
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed.
|
||||
You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**.
|
||||
|
||||
**KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. You are starting fresh.**
|
||||
|
||||
**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.**
|
||||
|
||||
**DO:**
|
||||
- Map EACH task to specific code with file:line citations
|
||||
- Verify files actually exist
|
||||
- Run tests yourself (don't trust claims)
|
||||
- Run quality checks (type-check, lint, build)
|
||||
- Give honest PASS/FAIL verdict
|
||||
- Provide evidence for EVERY task
|
||||
|
||||
**DO NOT:**
|
||||
- Take the Builder's word for anything
|
||||
- Skip verification steps
|
||||
- Skip any task verification
|
||||
- Give vague "looks good" without citations
|
||||
- Assume tests pass without running them
|
||||
- Give PASS verdict if ANY check fails
|
||||
- Give PASS verdict if ANY check fails or task lacks evidence
|
||||
|
||||
---
|
||||
|
||||
## Steps to Execute
|
||||
|
||||
### Step 5: Post-Validation
|
||||
### Step 5: Task Verification with Code Citations
|
||||
|
||||
**Verify Implementation Against Story:**
|
||||
**Map EVERY task to specific code locations:**
|
||||
|
||||
1. **Check Files Exist:**
|
||||
```bash
|
||||
# For each file mentioned in story tasks
|
||||
ls -la {{file_path}}
|
||||
# FAIL if file missing or empty
|
||||
1. **Read story file** - understand ALL tasks
|
||||
|
||||
2. **For EACH task, provide:**
|
||||
- **file:line** where it's implemented
|
||||
- **Brief quote** of relevant code
|
||||
- **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED
|
||||
|
||||
**Example Evidence Format:**
|
||||
|
||||
```
|
||||
Task: "Display occupant agreement status"
|
||||
Evidence: src/features/agreement/StatusBadge.tsx:45-67
|
||||
Code: "const StatusBadge = ({ status }) => ..."
|
||||
Verdict: IMPLEMENTED
|
||||
```
|
||||
|
||||
2. **Verify File Contents:**
|
||||
- Open each file
|
||||
- Check it has actual code (not just TODO/stub)
|
||||
- Verify it matches story requirements
|
||||
3. **If task NOT implemented:**
|
||||
- Explain why (file missing, code incomplete, etc.)
|
||||
- Provide file:line where it should be
|
||||
|
||||
3. **Check Tests Exist:**
|
||||
```bash
|
||||
# Find test files
|
||||
find . -name "*.test.ts" -o -name "__tests__"
|
||||
# FAIL if no tests found for new code
|
||||
```
|
||||
**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED.
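
A file:line citation is only useful if it points at something real. The sketch below shows one way a citation such as `src/Component.tsx:45-67` could be sanity-checked: the file must exist and the cited range must fall inside it. The `path:start-end` syntax is taken from the examples in this document; the helper name and the exact validation rules are assumptions.

```python
import re
from pathlib import Path

# Accepts "path:line" or "path:start-end".
CITATION_RE = re.compile(r"^(?P<path>.+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def citation_is_valid(citation: str, repo_root: str = ".") -> bool:
    """Check that a citation names an existing file and an in-bounds line range."""
    match = CITATION_RE.match(citation.strip())
    if match is None:
        return False
    start = int(match["start"])
    end = int(match["end"] or start)  # a single line is a range of length one
    if start < 1 or end < start:
        return False
    target = Path(repo_root) / match["path"]
    if not target.is_file():
        return False
    with target.open(encoding="utf-8", errors="replace") as handle:
        line_count = sum(1 for _ in handle)
    return end <= line_count
```

This checks only that the citation is well formed and in bounds; whether the cited lines actually implement the task still requires reading them.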
|
||||
|
||||
### Step 6: Quality Checks
|
||||
|
||||
|
|
@@ -96,36 +101,49 @@ You are the **INSPECTOR** agent. Your job is to verify that the Builder actually
|
|||
|
||||
---
|
||||
|
||||
## Output Requirements
|
||||
## Completion Format (v4.0)
|
||||
|
||||
**Provide Evidence-Based Verdict:**
|
||||
**Return structured JSON with code citations:**
|
||||
|
||||
### If PASS:
|
||||
```markdown
|
||||
✅ VALIDATION PASSED
|
||||
|
||||
Evidence:
|
||||
- Files verified: [list files checked]
|
||||
- Type check: PASS (0 errors)
|
||||
- Linter: PASS (0 warnings)
|
||||
- Build: PASS
|
||||
- Tests: 45/45 passing (95% coverage)
|
||||
- Git: 12 files modified, 3 new files
|
||||
|
||||
Ready for code review.
|
||||
```json
|
||||
{
|
||||
"agent": "inspector",
|
||||
"story_key": "{{story_key}}",
|
||||
"verdict": "FAIL",
|
||||
"task_verification": [
|
||||
{
|
||||
"task": "Create agreement view component",
|
||||
"implemented": true,
|
||||
"evidence": [
|
||||
{
|
||||
"file": "src/features/agreement/AgreementView.tsx",
|
||||
"lines": "15-67",
|
||||
"code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
|
||||
},
|
||||
{
|
||||
"file": "src/features/agreement/AgreementView.test.tsx",
|
||||
"lines": "8-45",
|
||||
"code_snippet": "describe('AgreementView', () => {...})"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"task": "Add status badge",
|
||||
"implemented": false,
|
||||
"evidence": [],
|
||||
"reason": "No StatusBadge component found in src/features/agreement/"
|
||||
}
|
||||
],
|
||||
"checks": {
|
||||
"type_check": {"passed": true, "errors": 0},
|
||||
"lint": {"passed": true, "warnings": 0},
|
||||
"tests": {"passed": true, "total": 12, "passing": 12},
|
||||
"build": {"passed": true}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### If FAIL:
|
||||
```markdown
|
||||
❌ VALIDATION FAILED
|
||||
|
||||
Failures:
|
||||
1. File missing: app/api/occupant/agreement/route.ts
|
||||
2. Type check: 3 errors in lib/api/auth.ts
|
||||
3. Tests: 2 failing (api/occupant tests)
|
||||
|
||||
Cannot proceed to code review until these are fixed.
|
||||
```
|
||||
**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json`
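
The Inspector's rule is that a PASS is not allowed if any check fails or any task lacks evidence. Assuming the `task_verification` and `checks` structure from the JSON example, that rule can be sketched as a pure function; `derive_verdict` is a hypothetical name used here for illustration.

```python
from typing import Any, Dict


def derive_verdict(report: Dict[str, Any]) -> str:
    """Return "PASS" only if every task has evidence and every check passed."""
    for task in report.get("task_verification", []):
        # A task counts as verified only when it is implemented AND cited.
        if not task.get("implemented") or not task.get("evidence"):
            return "FAIL"
    for check in report.get("checks", {}).values():
        if not check.get("passed"):
            return "FAIL"
    return "PASS"
```

Deriving the verdict from the evidence, rather than letting the agent state it freely, would let the orchestrator detect a report whose verdict contradicts its own task list.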
|
||||
|
||||
---
|
||||
|
||||
|
|
@@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed.
|
|||
|
||||
**Before giving PASS verdict, confirm:**
|
||||
|
||||
- [ ] All story files exist and have content
|
||||
- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason
|
||||
- [ ] Type check returns 0 errors
|
||||
- [ ] Linter returns 0 errors/warnings
|
||||
- [ ] Linter returns 0 warnings
|
||||
- [ ] Build succeeds
|
||||
- [ ] Tests run and pass (not skipped)
|
||||
- [ ] Test coverage >= 90%
|
||||
- [ ] Git status is clean or has expected changes
|
||||
- [ ] All implemented tasks have code evidence
|
||||
|
||||
**If ANY checkbox is unchecked → FAIL verdict**
|
||||
|
||||
---
|
||||
|
||||
## Hospital-Grade Standards
|
||||
|
||||
⚕️ **Be Thorough**
|
||||
|
||||
- Don't skip checks
|
||||
- Run tests yourself (don't trust claims)
|
||||
- Verify every file exists
|
||||
- Give specific evidence
|
||||
|
||||
---
|
||||
|
||||
## When Complete, Return This Format
|
||||
|
||||
```markdown
|
||||
## AGENT COMPLETE
|
||||
|
||||
**Agent:** inspector
|
||||
**Story:** {{story_key}}
|
||||
**Status:** PASS | FAIL
|
||||
|
||||
### Evidence
|
||||
- **Type Check:** PASS (0 errors) | FAIL (X errors)
|
||||
- **Lint:** PASS (0 warnings) | FAIL (X warnings)
|
||||
- **Build:** PASS | FAIL
|
||||
- **Tests:** X passing, Y failing, Z% coverage
|
||||
|
||||
### Files Verified
|
||||
- path/to/file1.ts ✓
|
||||
- path/to/file2.ts ✓
|
||||
- path/to/missing.ts ✗ (NOT FOUND)
|
||||
|
||||
### Failures (if FAIL status)
|
||||
1. Specific failure with file:line reference
|
||||
2. Another specific failure
|
||||
|
||||
### Ready For
|
||||
- If PASS: Reviewer (next phase)
|
||||
- If FAIL: Builder needs to fix before proceeding
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**Remember:** You are the INSPECTOR. Your job is to find the truth, not rubber-stamp the Builder's work. If something is wrong, say so with evidence.
|
||||
**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations.
|
||||
|
|
|
|||
|
|
@@ -0,0 +1,93 @@
|
|||
# Reflection Agent - Playbook Learning
|
||||
|
||||
You are the **REFLECTION** agent for story {{story_key}}.
|
||||
|
||||
## Context
|
||||
|
||||
- **Story:** {{story_file}}
|
||||
- **Builder initial:** {{builder_artifact}}
|
||||
- **All review findings:** {{all_reviewer_artifacts}}
|
||||
- **Builder fixes:** {{builder_fixes_artifact}}
|
||||
- **Test quality issues:** {{test_quality_artifact}}
|
||||
|
||||
## Objective
|
||||
|
||||
Identify what future agents should know:
|
||||
|
||||
1. **What issues were found?** (from reviewers)
|
||||
2. **What did Builder miss initially?** (gaps, edge cases, security)
|
||||
3. **What playbook knowledge would have prevented these?**
|
||||
4. **Which module/feature area does this apply to?**
|
||||
5. **Should we update existing playbook or create new?**
|
||||
|
||||
### Key Questions
|
||||
|
||||
- What gotchas should future builders know?
|
||||
- What code patterns should be standard?
|
||||
- What test requirements are essential?
|
||||
- What similar stories exist?
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- [ ] Analyzed review findings
|
||||
- [ ] Identified preventable issues
|
||||
- [ ] Determined which playbook(s) to update
|
||||
- [ ] Return structured proposal
|
||||
|
||||
## Completion Format
|
||||
|
||||
Return structured JSON artifact:
|
||||
|
||||
```json
|
||||
{
|
||||
"agent": "reflection",
|
||||
"story_key": "{{story_key}}",
|
||||
"learnings": [
|
||||
{
|
||||
"issue": "SQL injection in query builder",
|
||||
"root_cause": "Builder used string concatenation (didn't know pattern)",
|
||||
"prevention": "Playbook should document: always use parameterized queries",
|
||||
"applies_to": "database queries, API endpoints with user input"
|
||||
},
|
||||
{
|
||||
"issue": "Missing edge case tests for empty arrays",
|
||||
"root_cause": "Test Quality Agent found gap",
|
||||
"prevention": "Playbook should require: test null/empty/invalid for all inputs",
|
||||
"applies_to": "all data processing functions"
|
||||
}
|
||||
],
|
||||
"playbook_proposal": {
|
||||
"action": "update_existing",
|
||||
"playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
|
||||
"module": "api/database",
|
||||
"updates": {
|
||||
"common_gotchas": [
|
||||
"Never concatenate user input into SQL - use parameterized queries",
|
||||
"Test edge cases: null, undefined, [], '', invalid input"
|
||||
],
|
||||
"code_patterns": [
|
||||
"db.query(sql, [param1, param2]) ✓",
|
||||
"sql + userInput ✗"
|
||||
],
|
||||
"test_requirements": [
|
||||
"Test SQL injection attempts: expect(() => query(\"' OR 1=1--\")).toThrow()",
"Test empty inputs: assert the expected result for fn([]), or expect(() => fn([])).toThrow()"
|
||||
],
|
||||
"related_stories": ["{{story_key}}"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
|
||||
|
||||
## Playbook Structure
|
||||
|
||||
When proposing playbook updates, structure them with these sections:
|
||||
|
||||
1. **Common Gotchas** - What mistakes to avoid
|
||||
2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
|
||||
3. **Test Requirements** - What tests are essential
|
||||
4. **Related Stories** - Which stories used these patterns
|
||||
|
||||
Keep it simple and actionable for future agents.
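
Since playbook updates are proposed rather than applied automatically, it can help to render a proposal as Markdown for a human to review. The sketch below assumes the `updates` object from the JSON example and the four sections listed under Playbook Structure; `render_playbook_update` is a hypothetical helper, and the heading level is an arbitrary choice.

```python
from typing import Dict, List, Tuple

# (JSON key, section title) pairs in the order given under "Playbook Structure".
SECTIONS: List[Tuple[str, str]] = [
    ("common_gotchas", "Common Gotchas"),
    ("code_patterns", "Code Patterns"),
    ("test_requirements", "Test Requirements"),
    ("related_stories", "Related Stories"),
]


def render_playbook_update(updates: Dict[str, List[str]]) -> str:
    """Render a proposal's "updates" object as Markdown sections for review."""
    lines: List[str] = []
    for key, title in SECTIONS:
        items = updates.get(key, [])
        if not items:
            continue  # skip sections the proposal does not touch
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```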
|
||||
|
|
@@ -6,7 +6,6 @@
 
 <execution_context>
 @patterns/security-checklist.md
-@patterns/hospital-grade.md
 @patterns/agent-completion.md
 </execution_context>
 

@@ -0,0 +1,73 @@
|
|||
# Test Quality Agent
|
||||
|
||||
You are the **TEST QUALITY** agent for story {{story_key}}.
|
||||
|
||||
## Context
|
||||
|
||||
- **Story:** {{story_file}}
|
||||
- **Builder completion:** {{builder_completion_artifact}}
|
||||
|
||||
## Objective
|
||||
|
||||
Review test files for quality and completeness:
|
||||
|
||||
1. Find all test files created/modified by Builder
|
||||
2. For each test file, verify:
|
||||
- **Happy path**: Primary functionality tested ✓
|
||||
- **Edge cases**: null, empty, invalid inputs ✓
|
||||
- **Error conditions**: Failures handled properly ✓
|
||||
- **Assertions**: Meaningful checks (not just "doesn't crash")
|
||||
- **Test names**: Descriptive and clear
|
||||
- **Deterministic**: No random data, no timing dependencies
|
||||
3. Check that tests actually validate the feature
|
||||
|
||||
**Focus on:** What's missing? What edge cases weren't considered?
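
The "Deterministic" criterion lends itself to a cheap first-pass scan before any deeper review. The sketch below flags lines in a JavaScript/TypeScript test file that use common sources of nondeterminism. The pattern list is an assumption chosen for illustration and is far from complete; a match is a hint for the reviewer, not proof of flakiness.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Illustrative patterns only; real flakiness has many more causes.
FLAKY_PATTERNS: Dict[str, Pattern[str]] = {
    "random data": re.compile(r"\bMath\.random\s*\("),
    "wall-clock time": re.compile(r"\bDate\.now\s*\(|\bnew Date\s*\(\s*\)"),
    "timing dependency": re.compile(r"\bsetTimeout\s*\("),
}


def find_flaky_lines(test_source: str) -> List[Tuple[int, str]]:
    """Return (line number, reason) pairs for lines matching a flaky pattern."""
    findings: List[Tuple[int, str]] = []
    for number, line in enumerate(test_source.splitlines(), start=1):
        for label, pattern in FLAKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

The line numbers map directly onto the `file:line` format used in the `issues` entries of the completion artifact.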
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- [ ] All test files reviewed
|
||||
- [ ] Edge cases identified (covered or missing)
|
||||
- [ ] Error conditions verified
|
||||
- [ ] Assertions are meaningful
|
||||
- [ ] Tests are deterministic
|
||||
- [ ] Return quality assessment
|
||||
|
||||
## Completion Format
|
||||
|
||||
Return structured JSON artifact:
|
||||
|
||||
```json
|
||||
{
|
||||
"agent": "test_quality",
|
||||
"story_key": "{{story_key}}",
|
||||
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
|
||||
"test_files_reviewed": ["path/to/test.tsx", ...],
|
||||
"issues": [
|
||||
{
|
||||
"severity": "HIGH",
|
||||
"file": "path/to/test.tsx:45",
|
||||
"issue": "Missing edge case: empty input array",
|
||||
"recommendation": "Add test: expect(() => fn([])).toThrow(...)"
|
||||
},
|
||||
{
|
||||
"severity": "MEDIUM",
|
||||
"file": "path/to/test.tsx:67",
|
||||
"issue": "Test uses Math.random() - could be flaky",
|
||||
"recommendation": "Use fixed test data"
|
||||
}
|
||||
],
|
||||
"coverage_analysis": {
"edge_cases_covered": false,
"error_conditions_tested": true,
"meaningful_assertions": true,
"tests_are_deterministic": false
},
"summary": {
"high_issues": 1,
"medium_issues": 1,
"low_issues": 0
}
|
||||
}
|
||||
```
|
||||
|
||||
Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json`
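
The `summary` counts in the artifact should agree with the `issues` list, so it is safer to compute them than to write them by hand. The sketch below does that and also derives a verdict. The verdict rule (any HIGH or MEDIUM issue means `NEEDS_IMPROVEMENT`) is an assumption made for illustration, because the source does not define when each verdict applies.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_issues(issues: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute summary counts and a verdict from the issues list."""
    counts = Counter(str(issue.get("severity", "")).upper() for issue in issues)
    summary = {
        "high_issues": counts["HIGH"],
        "medium_issues": counts["MEDIUM"],
        "low_issues": counts["LOW"],
    }
    # Assumed rule: LOW-only findings do not block a PASS verdict.
    blocking = summary["high_issues"] + summary["medium_issues"]
    verdict = "NEEDS_IMPROVEMENT" if blocking > 0 else "PASS"
    return {"verdict": verdict, "summary": summary}
```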
|
||||
|
|
@@ -1,74 +1,142 @@
|
|||
# Super-Dev-Pipeline v3.1 - Token-Efficient Multi-Agent Pipeline
|
||||
# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline
|
||||
|
||||
<purpose>
|
||||
Implement a story using parallel verification agents with Builder context reuse.
|
||||
Each agent has single responsibility. Builder fixes issues in its own context (50-70% token savings).
|
||||
Orchestrator handles bookkeeping (story file updates, verification).
|
||||
Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates.
|
||||
Builder fixes issues in its own context (50-70% token savings).
|
||||
</purpose>
|
||||
|
||||
<philosophy>
|
||||
**Token-Efficient Multi-Agent Pipeline**
|
||||
**Quality Through Discipline, Continuous Learning**
|
||||
|
||||
- Builder implements (creative, context preserved)
|
||||
- Inspector + Reviewers validate in parallel (verification, fresh context)
|
||||
- Builder fixes issues (creative, reuses context - 50-70% token savings)
|
||||
- Inspector re-checks (verification, quick check)
|
||||
- Orchestrator reconciles story file (mechanical)
|
||||
- Playbook Query: Load relevant patterns before starting
|
||||
- Builder: Implements with playbook knowledge (context preserved)
|
||||
- Inspector + Test Quality + Reviewers: Validate in parallel with proof
|
||||
- Coverage Gate: Automated threshold enforcement
|
||||
- Builder: Fixes issues in same context (50-70% token savings)
|
||||
- Inspector: Quick recheck
|
||||
- Orchestrator: Reconciles mechanically
|
||||
- Reflection: Updates playbooks for future agents
|
||||
|
||||
**Key Innovation:** Resume Builder instead of spawning fresh Fixer.
|
||||
Builder already knows the codebase - just needs to fix specific issues.
|
||||
|
||||
Trust but verify. Fresh context for verification. Reuse context for fixes.
|
||||
Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system.
|
||||
</philosophy>
|
||||
|
||||
<config>
|
||||
name: story-full-pipeline
|
||||
version: 3.2.0
|
||||
version: 4.0.0
|
||||
execution_mode: multi_agent
|
||||
|
||||
phases:
|
||||
phase_0: Playbook Query (orchestrator)
|
||||
phase_1: Builder (saves agent_id)
|
||||
phase_2: [Inspector + N Reviewers] in parallel (N = 2/3/4 based on complexity)
|
||||
phase_2: [Inspector + Test Quality + N Reviewers] in parallel
|
||||
phase_2.5: Coverage Gate (automated)
|
||||
phase_3: Resume Builder with all findings (reuses context)
|
||||
phase_4: Inspector re-check (quick verification)
|
||||
phase_5: Orchestrator reconciliation
|
||||
phase_6: Playbook Reflection
|
||||
|
||||
reviewer_counts:
|
||||
micro: 2 reviewers (security, architect/integration) v3.2.0+
|
||||
standard: 3 reviewers (security, logic/performance, architect/integration) v3.2.0+
|
||||
complex: 4 reviewers (security, logic, architect/integration, code quality) v3.2.0+
|
||||
micro: 2 reviewers (security, architect/integration)
|
||||
standard: 3 reviewers (security, logic/performance, architect/integration)
|
||||
complex: 4 reviewers (security, logic, architect/integration, code quality)
|
||||
|
||||
quality_gates:
|
||||
coverage_threshold: 80 # % line coverage required
|
||||
task_verification: "all_with_evidence" # Inspector must cite file:line
|
||||
critical_issues: "must_fix"
|
||||
high_issues: "must_fix"
|
||||
|
||||
token_efficiency:
|
||||
- Phase 2 agents spawn in parallel (same cost, faster)
|
||||
- Phase 3 resumes Builder (50-70% token savings vs fresh Fixer)
|
||||
- Phase 3 resumes Builder (50-70% token savings vs fresh agent)
|
||||
- Phase 4 Inspector only (no full re-review)
|
||||
|
||||
playbooks:
|
||||
enabled: true
|
||||
directory: "docs/playbooks/implementation-playbooks"
|
||||
max_load: 3
|
||||
auto_apply_updates: false
|
||||
</config>
|
||||
|
||||
<execution_context>
|
||||
@patterns/hospital-grade.md
|
||||
@patterns/verification.md
|
||||
@patterns/tdd.md
|
||||
@patterns/agent-completion.md
|
||||
</execution_context>
|
||||
|
||||
<process>
|
||||
|
||||
<step name="load_story" priority="first">
|
||||
Load and validate the story file.
|
||||
**Load and parse story file**
|
||||
|
||||
\`\`\`bash
|
||||
STORY_FILE="docs/sprint-artifacts/{{story_key}}.md"
|
||||
[ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; }
|
||||
\`\`\`
|
||||
|
||||
Use Read tool on the story file. Parse:
|
||||
- Complexity level (micro/standard/complex)
|
||||
Use Read tool. Extract:
|
||||
- Task count
|
||||
- Acceptance criteria count
|
||||
- Keywords for risk scoring
|
||||
|
||||
Determine which agents to spawn based on complexity routing.
|
||||
**Determine complexity:**
|
||||
\`\`\`bash
|
||||
TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE")
|
||||
RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE")
|
||||
|
||||
if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then
|
||||
COMPLEXITY="micro"
|
||||
REVIEWER_COUNT=2
|
||||
elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then
|
||||
COMPLEXITY="complex"
|
||||
REVIEWER_COUNT=4
|
||||
else
|
||||
COMPLEXITY="standard"
|
||||
REVIEWER_COUNT=3
|
||||
fi
|
||||
\`\`\`
|
||||
|
||||
Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers
|
||||
</step>
|
||||
|
||||
<step name="query_playbooks">
|
||||
**Phase 0: Playbook Query**
|
||||
|
||||
\`\`\`
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
📚 PHASE 0: PLAYBOOK QUERY
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
\`\`\`
|
||||
|
||||
**Extract story keywords:**
|
||||
\`\`\`bash
|
||||
STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ')
|
||||
echo "Story keywords: $STORY_KEYWORDS"
|
||||
\`\`\`
|
||||
|
||||
**Search for relevant playbooks:**
|
||||
Use Grep tool:
|
||||
- Pattern: extracted keywords
|
||||
- Path: \`docs/playbooks/implementation-playbooks/\`
|
||||
- Output mode: files_with_matches
|
||||
- Limit: 3 files
|
||||
|
||||
**Load matching playbooks:**
|
||||
For each playbook found:
|
||||
- Use Read tool
|
||||
- Extract sections: Common Gotchas, Code Patterns, Test Requirements
|
||||
|
||||
If no playbooks exist:
|
||||
\`\`\`
|
||||
ℹ️ No playbooks found - this will be the first story to create them
|
||||
\`\`\`
|
||||
|
||||
Store playbook content for Builder.
|
||||
</step>
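The playbook lookup in Phase 0 can be sketched in plain shell. This is a minimal sketch, not the workflow's actual mechanism (which uses the Grep tool): `find_playbooks` and its arguments are hypothetical names, playbooks are assumed to be Markdown files, and keywords are assumed to be free of regex metacharacters.

```shell
# Hypothetical helper: list up to <max_load> playbooks that mention any keyword.
# Usage: find_playbooks <playbook_dir> <max_load> <keyword>...
find_playbooks() {
  local dir="$1" max_load="$2"
  shift 2
  # Missing directory or no keywords: nothing to load.
  [ -d "$dir" ] && [ "$#" -gt 0 ] || return 0

  # Join the keywords into one alternation pattern, e.g. "auth|billing".
  local pattern
  pattern=$(IFS='|'; echo "$*")

  # -r recurse, -l file names only, -i ignore case, -E extended regex.
  grep -rliE --include='*.md' -- "$pattern" "$dir" | sort | head -n "$max_load"
}
```

A call such as `find_playbooks docs/playbooks/implementation-playbooks 3 auth billing` would mirror the `max_load: 3` limit from the config. An empty result corresponds to the "No playbooks found" branch above.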
|
||||
|
||||
<step name="spawn_builder">
|
||||
**Phase 1: Builder Agent (Steps 1-4)**
|
||||
**Phase 1: Builder Agent**
|
||||
|
||||
\`\`\`
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
|
|
@@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing.
|
|||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
\`\`\`
|
||||
|
||||
Spawn Builder agent and save agent_id for later resume.
|
||||
|
||||
**CRITICAL: Save Builder's agent_id for later resume**
|
||||
Spawn Builder agent and **SAVE agent_id for resume later**:
|
||||
|
||||
\`\`\`
|
||||
BUILDER_AGENT_ID={{agent_id_from_task_result}}
|
||||
echo "Builder agent: $BUILDER_AGENT_ID"
|
||||
BUILDER_TASK = Task({
|
||||
subagent_type: "general-purpose",
|
||||
description: "Implement story {{story_key}}",
|
||||
prompt: \`
|
||||
You are the BUILDER agent for story {{story_key}}.
|
||||
|
||||
<execution_context>
|
||||
@patterns/tdd.md
|
||||
@patterns/agent-completion.md
|
||||
</execution_context>
|
||||
|
||||
<context>
|
||||
Story: [inline story file content]
|
||||
|
||||
{{IF playbooks loaded}}
|
||||
Relevant Playbooks (review before implementing):
|
||||
[inline playbook content]
|
||||
|
||||
Pay special attention to:
|
||||
- Common Gotchas in these playbooks
|
||||
- Code Patterns to follow
|
||||
- Test Requirements to satisfy
|
||||
{{ENDIF}}
|
||||
</context>
|
||||
|
||||
<objective>
|
||||
Implement the story requirements:
|
||||
1. Review story tasks and acceptance criteria
|
||||
2. **Review playbooks** for gotchas and patterns (if provided)
|
||||
3. Analyze what exists vs needed (gap analysis)
|
||||
4. **Write tests FIRST** (TDD - tests before implementation)
|
||||
5. Implement production code to pass tests
|
||||
</objective>
|
||||
|
||||
<constraints>
|
||||
- DO NOT validate your own work
|
||||
- DO NOT review your code
|
||||
- DO NOT update story checkboxes
|
||||
- DO NOT commit changes yet
|
||||
</constraints>
|
||||
|
||||
<success_criteria>
|
||||
- [ ] Reviewed playbooks for guidance
|
||||
- [ ] Tests written for all requirements
|
||||
- [ ] Production code implements tests
|
||||
- [ ] Tests pass
|
||||
- [ ] Return structured completion artifact
|
||||
</success_criteria>
|
||||
|
||||
<completion_format>
|
||||
Return structured JSON artifact:
|
||||
{
|
||||
"agent": "builder",
|
||||
"story_key": "{{story_key}}",
|
||||
"status": "SUCCESS" | "FAILED",
|
||||
"files_created": ["path/to/file.tsx", ...],
|
||||
"files_modified": ["path/to/file.tsx", ...],
|
||||
"tests_added": {
|
||||
"total": 12,
|
||||
"passing": 12
|
||||
},
|
||||
"tasks_addressed": ["task description from story", ...]
|
||||
}
|
||||
|
||||
Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json
|
||||
</completion_format>
|
||||
\`
|
||||
})
|
||||
|
||||
BUILDER_AGENT_ID = {{extract agent_id from Task result}}
|
||||
\`\`\`
|
||||
|
||||
Wait for completion. Parse structured output. Verify files exist.
|
||||
**CRITICAL: Store Builder agent ID:**
|
||||
\`\`\`bash
|
||||
echo "Builder agent ID: $BUILDER_AGENT_ID"
|
||||
echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
|
||||
\`\`\`
|
||||
|
||||
**Wait for completion. Verify artifact exists:**
|
||||
\`\`\`bash
|
||||
BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
|
||||
[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
|
||||
\`\`\`
|
||||
|
||||
**Verify files exist:**
|
||||
\`\`\`bash
|
||||
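# Check every path listed in files_created and files_modified
jq -r '((.files_created // []) + (.files_modified // []))[]' "docs/sprint-artifacts/completions/{{story_key}}-builder.json" | while IFS= read -r file; do
  [ -f "$file" ] || echo "❌ MISSING: $file"
done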
|
||||
\`\`\`
|
||||
|
||||
If files missing or status FAILED: halt pipeline.
|
||||
</step>
|
||||
|
||||
<step name="spawn_verification_parallel">
|
||||
**Phase 2: Parallel Verification (Inspector + Reviewers)**
|
||||
**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
|
||||
|
||||
\`\`\`
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
🔍 PHASE 2: PARALLEL VERIFICATION
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
|
||||
Total agents: {{2 + REVIEWER_COUNT}}
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
\`\`\`
|
||||
|
||||
**CRITICAL: Spawn ALL verification agents in ONE message (parallel execution)**
|
||||
**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
|
||||
|
||||
Send single message with multiple Task calls:
|
||||
1. Inspector Agent
|
||||
2. Test Quality Agent
|
||||
3. Security Reviewer
|
||||
4. Logic/Performance Reviewer (if standard/complex)
|
||||
5. Architect/Integration Reviewer
|
||||
6. Code Quality Reviewer (if complex)
|
||||
|
||||
---
|
||||
|
||||
## Inspector Agent Prompt:
|
||||
|
||||
Determine reviewer count based on complexity:
|
||||
\`\`\`
|
||||
if complexity == "micro": REVIEWER_COUNT = 1
|
||||
if complexity == "standard": REVIEWER_COUNT = 2
|
||||
if complexity == "complex": REVIEWER_COUNT = 3
|
||||
Task({
|
||||
subagent_type: "general-purpose",
|
||||
description: "Validate story {{story_key}} implementation",
|
||||
prompt: \`
|
||||
You are the INSPECTOR agent for story {{story_key}}.
|
||||
|
||||
<execution_context>
|
||||
@patterns/verification.md
|
||||
@patterns/agent-completion.md
|
||||
</execution_context>
|
||||
|
||||
<context>
|
||||
Story: [inline story file content]
|
||||
</context>
|
||||
|
||||
<objective>
|
||||
Independently verify implementation WITH CODE CITATIONS:
|
||||
|
||||
1. Read story file - understand ALL tasks
|
||||
2. Read each file Builder created/modified
|
||||
3. **Map EACH task to specific code with file:line citations**
|
||||
4. Run verification checks:
|
||||
- Type-check (0 errors required)
|
||||
- Lint (0 warnings required)
|
||||
- Tests (all passing required)
|
||||
- Build (success required)
|
||||
</objective>
|
||||
|
||||
<critical_requirement>
|
||||
**EVERY task must have evidence.**
|
||||
|
||||
For each task, provide:
|
||||
- file:line where it's implemented
|
||||
- Brief quote of relevant code
|
||||
- Verdict: IMPLEMENTED or NOT_IMPLEMENTED
|
||||
|
||||
Example:
|
||||
Task: "Display occupant agreement status"
|
||||
Evidence: src/features/agreement/StatusBadge.tsx:45-67
|
||||
Code: "const StatusBadge = ({ status }) => ..."
|
||||
Verdict: IMPLEMENTED
|
||||
</critical_requirement>
|
||||
|
||||
<constraints>
|
||||
- You have NO KNOWLEDGE of what Builder did
|
||||
- Run all checks yourself - don't trust claims
|
||||
- **Every task needs file:line citation**
|
||||
- If code doesn't exist: mark NOT_IMPLEMENTED with reason
|
||||
</constraints>
|
||||
|
||||
<success_criteria>
|
||||
- [ ] ALL tasks mapped to code locations
|
||||
- [ ] Type check: 0 errors
|
||||
- [ ] Lint: 0 warnings
|
||||
- [ ] Tests: all passing
|
||||
- [ ] Build: success
|
||||
- [ ] Return structured evidence
|
||||
</success_criteria>
|
||||
|
||||
<completion_format>
|
||||
{
|
||||
"agent": "inspector",
|
||||
"story_key": "{{story_key}}",
|
||||
"verdict": "PASS" | "FAIL",
|
||||
"task_verification": [
|
||||
{
|
||||
"task": "Create agreement view component",
|
||||
"implemented": true,
|
||||
"evidence": [
|
||||
{
|
||||
"file": "src/features/agreement/AgreementView.tsx",
|
||||
"lines": "15-67",
|
||||
"code_snippet": "export const AgreementView = ({ agreementId }) => {...}"
|
||||
},
|
||||
{
|
||||
"file": "src/features/agreement/AgreementView.test.tsx",
|
||||
"lines": "8-45",
|
||||
"code_snippet": "describe('AgreementView', () => {...})"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"task": "Add status badge",
|
||||
"implemented": false,
|
||||
"evidence": [],
|
||||
"reason": "No StatusBadge component found in src/features/agreement/"
|
||||
}
|
||||
],
|
||||
"checks": {
|
||||
"type_check": {"passed": true, "errors": 0},
|
||||
"lint": {"passed": true, "warnings": 0},
|
||||
"tests": {"passed": true, "total": 12, "passing": 12},
|
||||
"build": {"passed": true}
|
||||
}
|
||||
}
|
||||
|
||||
Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json
|
||||
</completion_format>
|
||||
\`
|
||||
})
|
||||
\`\`\`
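Because the Inspector's citations are meant to serve as proof, the orchestrator could sanity-check them before relying on them. The following is a rough sketch under stated assumptions: `citation_is_valid` is a hypothetical helper, and it only confirms that the cited file exists and that the line range falls inside it, not that the code actually implements the task.

```shell
# Hypothetical check: does "file" + "start-end" point at lines that exist?
# Usage: citation_is_valid <file> <lines>   e.g. citation_is_valid src/App.tsx 15-67
citation_is_valid() {
  local file="$1" range="$2" start end total
  [ -f "$file" ] || return 1

  # Accept "N" or "N-M" only; reject empty, non-numeric, or malformed ranges.
  case "$range" in
    ''|*[!0-9-]*|-*|*-|*-*-*) return 1 ;;
  esac
  start=${range%%-*}
  end=${range##*-}

  # wc -l counts newline characters, so an unterminated last line is not counted.
  total=$(( $(wc -l < "$file") ))
  [ "$start" -ge 1 ] && [ "$start" -le "$end" ] && [ "$end" -le "$total" ]
}
```

Applied to the `file` and `lines` fields of each `evidence` entry, a failed check would be a reason to treat that task as unverified rather than checking it off.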
|
||||
|
||||
Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings.
|
||||
---
|
||||
|
||||
Aggregate all findings from Inspector + Reviewers.
|
||||
## Test Quality Agent Prompt:
|
||||
|
||||
\`\`\`
|
||||
Task({
|
||||
subagent_type: "general-purpose",
|
||||
description: "Review test quality for {{story_key}}",
|
||||
prompt: \`
|
||||
You are the TEST QUALITY agent for story {{story_key}}.
|
||||
|
||||
<context>
|
||||
Story: [inline story file content]
|
||||
Builder completion: [inline builder artifact]
|
||||
</context>
|
||||
|
||||
<objective>
|
||||
Review test files for quality and completeness:
|
||||
|
||||
1. Find all test files created/modified by Builder
|
||||
2. For each test file, verify:
|
||||
- **Happy path**: Primary functionality tested ✓
|
||||
- **Edge cases**: null, empty, invalid inputs ✓
|
||||
- **Error conditions**: Failures handled properly ✓
|
||||
- **Assertions**: Meaningful checks (not just "doesn't crash")
|
||||
- **Test names**: Descriptive and clear
|
||||
- **Deterministic**: No random data, no timing dependencies
|
||||
3. Check that tests actually validate the feature
|
||||
|
||||
**Focus on:** What's missing? What edge cases weren't considered?
|
||||
</objective>
|
||||
|
||||
<success_criteria>
|
||||
- [ ] All test files reviewed
|
||||
- [ ] Edge cases identified (covered or missing)
|
||||
- [ ] Error conditions verified
|
||||
- [ ] Assertions are meaningful
|
||||
- [ ] Tests are deterministic
|
||||
- [ ] Return quality assessment
|
||||
</success_criteria>
|
||||
|
||||
<completion_format>
|
||||
{
|
||||
"agent": "test_quality",
|
||||
"story_key": "{{story_key}}",
|
||||
"verdict": "PASS" | "NEEDS_IMPROVEMENT",
|
||||
"test_files_reviewed": ["path/to/test.tsx", ...],
|
||||
"issues": [
|
||||
{
|
||||
"severity": "HIGH",
|
||||
"file": "path/to/test.tsx:45",
|
||||
"issue": "Missing edge case: empty input array",
|
||||
"recommendation": "Add test: expect(fn([])).toThrow(...)"
|
||||
},
|
||||
{
|
||||
"severity": "MEDIUM",
|
||||
"file": "path/to/test.tsx:67",
|
||||
"issue": "Test uses Math.random() - could be flaky",
|
||||
"recommendation": "Use fixed test data"
|
||||
}
|
||||
],
|
||||
"coverage_analysis": {
|
||||
"edge_cases_covered": true | false,
|
||||
"error_conditions_tested": true | false,
|
||||
"meaningful_assertions": true | false,
|
||||
"tests_are_deterministic": true | false
|
||||
},
|
||||
"summary": {
|
||||
"high_issues": 1,
|
||||
"medium_issues": 2,
|
||||
"low_issues": 0
|
||||
}
|
||||
}
|
||||
|
||||
Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json
|
||||
</completion_format>
|
||||
\`
|
||||
})
|
||||
\`\`\`
|
||||
|
||||
---
|
||||
|
||||
(Continue with Security, Logic, Architect, Quality reviewers as before...)
|
||||
|
||||
**Wait for ALL agents to complete.**
|
||||
|
||||
Collect completion artifacts:
|
||||
- \`inspector.json\`
|
||||
- \`test-quality.json\`
|
||||
- \`reviewer-security.json\`
|
||||
- \`reviewer-logic.json\` (if spawned)
|
||||
- \`reviewer-architect.json\`
|
||||
- \`reviewer-quality.json\` (if spawned)
|
||||
|
||||
Parse all findings and aggregate by severity.
|
||||
</step>
|
||||
|
||||
<step name="coverage_gate">
|
||||
**Phase 2.5: Coverage Gate (Automated)**
|
||||
|
||||
\`\`\`
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
📊 PHASE 2.5: COVERAGE GATE
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
\`\`\`
|
||||
|
||||
Run coverage check:
|
||||
\`\`\`bash
|
||||
# Run tests with coverage
|
||||
npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
|
||||
|
||||
# Extract coverage percentage (adjust grep pattern for your test framework)
|
||||
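COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1)
COVERAGE=${COVERAGE:-0}  # default to 0 when nothing matched (a trailing "|| echo" would only see head's exit status)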
|
||||
|
||||
echo "Coverage: ${COVERAGE}%"
|
||||
echo "Threshold: {{coverage_threshold}}%"
|
||||
|
||||
# Compare coverage
|
||||
if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
|
||||
echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
|
||||
echo "Builder must add more tests before proceeding"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "✅ Coverage gate passed: ${COVERAGE}%"
|
||||
\`\`\`
|
||||
|
||||
If the coverage gate fails (non-zero exit): do not advance to reconciliation; add the coverage gap to the issues list for Builder to fix in Phase 3.
|
||||
</step>
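The coverage gate can also be written without `bc`, with an explicit failure when no percentage can be parsed. This is a minimal sketch, assuming a Jest-style text table whose `All files` row lists statement coverage first. `extract_coverage` and `coverage_meets` are hypothetical names, and the grep pattern would need adjusting if line coverage (as named in `quality_gates`) is the intended metric.

```shell
# Hypothetical helper: pull the first percentage from a Jest-style "All files" row.
# Prints nothing if the row is missing, so callers can detect a parse failure.
extract_coverage() {
  grep -E '^All files' "$1" | head -n 1 | grep -oE '[0-9]+(\.[0-9]+)?' | head -n 1
}

# Succeeds only when coverage is present and numerically >= threshold.
# awk does the comparison, so decimals work without bc.
coverage_meets() {
  local coverage="$1" threshold="$2"
  [ -n "$coverage" ] || return 1
  awk -v c="$coverage" -v t="$threshold" 'BEGIN { exit !(c + 0 >= t + 0) }'
}
```

Treating an unparsable report as a failure, rather than as 0% or as a pass, keeps a broken test run from slipping through the gate unnoticed.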
|
||||
|
||||
<step name="resume_builder_with_findings">
|
||||
|
|
@@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
|
|||
|
||||
\`\`\`
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
🔧 PHASE 5: RECONCILIATION (Orchestrator)
|
||||
📊 PHASE 5: RECONCILIATION (Orchestrator)
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
\`\`\`
|
||||
|
||||
**YOU (orchestrator) do this directly. No agent spawn.**
|
||||
|
||||
1. Get what was built (git log, git diff)
|
||||
2. Read story file
|
||||
3. Check off completed tasks (Edit tool)
|
||||
4. Fill Dev Agent Record with pipeline details
|
||||
5. Verify updates (grep task checkboxes)
|
||||
6. Update sprint-status.yaml to "done"
|
||||
**5.1: Load completion artifacts**
|
||||
\`\`\`bash
|
||||
BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
|
||||
INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
|
||||
\`\`\`
|
||||
|
||||
Use Read tool on all artifacts.
|
||||
|
||||
**5.2: Read story file**
|
||||
Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\`
|
||||
|
||||
**5.3: Check off completed tasks using Inspector evidence**
|
||||
|
||||
For each task in \`inspector.task_verification\`:
|
||||
- If \`implemented: true\` and has evidence:
|
||||
- Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\`
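The check-off in 5.3 is done with the Edit tool, but the exact transformation can be illustrated in shell. This is only a sketch: `check_off_task` is a hypothetical helper, and it assumes each task occupies a single line that matches the Inspector's `task` string exactly.

```shell
# Hypothetical helper: flip "- [ ] <task>" to "- [x] <task>" for one exact task line.
# Uses exact string comparison (no regex), so punctuation in task text is safe.
check_off_task() {
  local story_file="$1" task="$2" tmp rc
  tmp=$(mktemp) || return 1
  # Pass strings through the environment so awk does not interpret backslashes.
  WANT="- [ ] $task" DONE="- [x] $task" awk '
    $0 == ENVIRON["WANT"] { print ENVIRON["DONE"]; next }
    { print }
  ' "$story_file" > "$tmp" && cat "$tmp" > "$story_file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Only tasks with `implemented: true` and non-empty `evidence` would be passed to it. Tasks without evidence stay unchecked.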
|
||||
|
||||
**5.4: Fill Dev Agent Record with evidence**
|
||||
|
||||
Use Edit tool:
|
||||
\`\`\`markdown
|
||||
### Dev Agent Record
|
||||
**Implementation Date:** {{timestamp}}
|
||||
**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0)
|
||||
**Git Commit:** {{git_commit}}
|
||||
|
||||
**Pipeline Phases:**
|
||||
- Phase 0: Playbook Query ({{playbooks_loaded}} loaded)
|
||||
- Phase 1: Builder (initial implementation)
|
||||
- Phase 2: Parallel Verification
|
||||
- Inspector: {{verdict}} with code citations
|
||||
- Test Quality: {{verdict}}
|
||||
- {{REVIEWER_COUNT}} Reviewers: {{issues_found}}
|
||||
- Phase 2.5: Coverage Gate ({{coverage}}%)
|
||||
- Phase 3: Builder (resumed, fixed {{fixes_count}} issues)
|
||||
- Phase 4: Inspector re-check ({{verdict}})
|
||||
|
||||
**Files Created:** {{count}}
|
||||
**Files Modified:** {{count}}
|
||||
**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}%)
|
||||
**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM
|
||||
|
||||
**Task Evidence:** (Inspector code citations)
|
||||
{{for each task with evidence}}
|
||||
- [x] {{task}}
|
||||
- {{evidence[0].file}}:{{evidence[0].lines}}
|
||||
{{endfor}}
|
||||
\`\`\`
|
||||
|
||||
**5.5: Verify updates**
|
||||
\`\`\`bash
|
||||
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
|
||||
[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; }
|
||||
echo "✅ Reconciled: $CHECKED tasks with evidence"
|
||||
\`\`\`
|
||||
</step>
|
||||
|
||||
<step name="final_verification">
|
||||
**Final Quality Gate**
|
||||
|
||||
Verify:
|
||||
1. Git commit exists
|
||||
2. Story tasks checked (count > 0)
|
||||
3. Dev Agent Record filled
|
||||
4. Sprint status updated
|
||||
\`\`\`bash
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "🔍 FINAL VERIFICATION"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
|
||||
If verification fails: fix using Edit, then re-verify.
|
||||
# 1. Git commit exists
|
||||
git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; }
|
||||
echo "✅ Git commit found"
|
||||
|
||||
# 2. Story tasks checked with evidence
|
||||
CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md)
|
||||
[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; }
|
||||
echo "✅ $CHECKED tasks checked with code citations"
|
||||
|
||||
# 3. Dev Agent Record filled
|
||||
grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; }
|
||||
echo "✅ Dev Agent Record filled"
|
||||
|
||||
# 4. Coverage met threshold
|
||||
FINAL_COVERAGE=$(jq -r '.tests.coverage' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json)
|
||||
if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then
|
||||
echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold"
|
||||
exit 1
|
||||
fi
|
||||
echo "✅ Coverage: ${FINAL_COVERAGE}%"
|
||||
|
||||
echo ""
|
||||
echo "✅ STORY COMPLETE"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
\`\`\`
|
||||
|
||||
**Update sprint-status.yaml:**
|
||||
Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\`
|
||||
</step>
|
||||
|
||||
<step name="playbook_reflection">
|
||||
**Phase 6: Playbook Reflection**
|
||||
|
||||
\`\`\`
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
💡 PHASE 6: PLAYBOOK REFLECTION
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
\`\`\`
|
||||
|
||||
Spawn Reflection Agent:
|
||||
|
||||
\`\`\`
|
||||
Task({
|
||||
subagent_type: "general-purpose",
|
||||
description: "Extract learnings from {{story_key}}",
|
||||
prompt: \`
|
||||
You are the REFLECTION agent for story {{story_key}}.
|
||||
|
||||
<context>
|
||||
Story: [inline story file]
|
||||
Builder initial: [inline builder.json]
|
||||
All review findings: [inline all reviewer artifacts]
|
||||
Builder fixes: [inline builder-fixes.json]
|
||||
Test quality issues: [inline test-quality.json]
|
||||
</context>
|
||||
|
||||
<objective>
|
||||
Identify what future agents should know:
|
||||
|
||||
1. **What issues were found?** (from reviewers)
|
||||
2. **What did Builder miss initially?** (gaps, edge cases, security)
|
||||
3. **What playbook knowledge would have prevented these?**
|
||||
4. **Which module/feature area does this apply to?**
|
||||
5. **Should we update existing playbook or create new?**
|
||||
|
||||
Questions:
|
||||
- What gotchas should future builders know?
|
||||
- What code patterns should be standard?
|
||||
- What test requirements are essential?
|
||||
- What similar stories exist?
|
||||
</objective>
|
||||
|
||||
<success_criteria>
|
||||
- [ ] Analyzed review findings
|
||||
- [ ] Identified preventable issues
|
||||
- [ ] Determined which playbook(s) to update
|
||||
- [ ] Return structured proposal
|
||||
</success_criteria>
|
||||
|
||||
<completion_format>
|
||||
{
|
||||
"agent": "reflection",
|
||||
"story_key": "{{story_key}}",
|
||||
"learnings": [
|
||||
{
|
||||
"issue": "SQL injection in query builder",
|
||||
"root_cause": "Builder used string concatenation (didn't know pattern)",
|
||||
"prevention": "Playbook should document: always use parameterized queries",
|
||||
"applies_to": "database queries, API endpoints with user input"
|
||||
},
|
||||
{
|
||||
"issue": "Missing edge case tests for empty arrays",
|
||||
"root_cause": "Test Quality Agent found gap",
|
||||
"prevention": "Playbook should require: test null/empty/invalid for all inputs",
|
||||
"applies_to": "all data processing functions"
|
||||
}
|
||||
],
|
||||
"playbook_proposal": {
|
||||
"action": "update_existing" | "create_new",
|
||||
"playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
|
||||
"module": "api/database",
|
||||
"updates": {
|
||||
"common_gotchas": [
|
||||
"Never concatenate user input into SQL - use parameterized queries",
|
||||
"Test edge cases: null, undefined, [], '', invalid input"
|
||||
],
|
||||
"code_patterns": [
|
||||
"db.query(sql, [param1, param2]) ✓",
|
||||
"sql + userInput ✗"
|
||||
],
|
||||
"test_requirements": [
|
||||
"Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
|
||||
"Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
|
||||
],
|
||||
"related_stories": ["{{story_key}}"]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
|
||||
</completion_format>
|
||||
\`
|
||||
})
|
||||
\`\`\`
|
||||
|
||||
**Wait for completion.**
|
||||
|
||||
**Review playbook proposal:**
|
||||
\`\`\`bash
|
||||
REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
|
||||
ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
|
||||
PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
|
||||
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "📝 Playbook Update Proposal"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo "Action: $ACTION"
|
||||
echo "Playbook: $PLAYBOOK"
|
||||
echo ""
|
||||
jq -r '.learnings[] | "- \(.issue)\n Prevention: \(.prevention)"' "$REFLECTION"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
\`\`\`
|
||||
|
||||
If \`auto_apply_updates: true\` in config:
|
||||
- Read playbook (or create from template if new)
|
||||
- Use Edit tool to add learnings to sections
|
||||
- Commit playbook update
|
||||
|
||||
If \`auto_apply_updates: false\` (default):
|
||||
- Display proposal for manual review
|
||||
- User can apply later with \`/update-playbooks {{story_key}}\`
|
||||
</step>
|
||||
|
||||
</process>
|
||||
|
||||
<failure_handling>
|
||||
**Builder fails:** Don't spawn verification. Report failure and halt.
|
||||
**Inspector fails (Phase 2):** Still run Reviewers in parallel, collect all findings together.
|
||||
**Inspector fails (Phase 4):** Resume Builder again with new issues (iterative fix loop).
|
||||
**Builder resume fails:** Report unfixed issues. Manual intervention needed.
|
||||
**Reconciliation fails:** Fix using Edit tool. Re-verify checkboxes.
|
||||
**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
|
||||
**Inspector fails (Phase 2):** Still collect other reviewer findings.
|
||||
**Test Quality fails:** Add issues to Builder fix list.
|
||||
**Coverage below threshold:** Add to Builder fix list.
|
||||
**Reviewers find CRITICAL:** Builder MUST fix when resumed.
|
||||
**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
|
||||
**Builder resume fails:** Report unfixed issues. Manual intervention.
|
||||
**Reconciliation fails:** Fix with Edit tool, re-verify.
|
||||
</failure_handling>
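The Phase 4 retry rule (resume Builder again, at most 3 iterations) can be sketched as a bounded loop. This is an illustration under assumptions: `resume_builder` and `run_inspector_recheck` are hypothetical stand-ins for the real agent invocations, and "iteration" is taken to mean one Builder resume followed by one Inspector recheck.

```shell
# Hypothetical stand-ins: in the real pipeline these would be agent invocations.
# Redefine them before calling run_fix_loop.
resume_builder()        { :; }          # resume Builder with the latest findings
run_inspector_recheck() { return 0; }   # exit 0 = PASS, non-zero = FAIL

# Alternate Builder fixes and Inspector rechecks, giving up after max_iterations.
run_fix_loop() {
  local max_iterations="${1:-3}" i=1
  while [ "$i" -le "$max_iterations" ]; do
    resume_builder "$i"
    if run_inspector_recheck "$i"; then
      echo "Inspector recheck passed on iteration $i"
      return 0
    fi
    i=$((i + 1))
  done
  echo "Inspector still failing after $max_iterations iterations; manual intervention needed" >&2
  return 1
}
```

The hard cap keeps a story that cannot converge from consuming tokens indefinitely. Once the cap is reached, the non-zero return corresponds to the "Builder resume fails" case above.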
|
||||
|
||||
<complexity_routing>
|
||||
| Complexity | Pipeline | Reviewers | Total Phase 2 Agents |
|
||||
|------------|----------|-----------|---------------------|
|
||||
| micro | Builder → [Inspector + 2 Reviewers] → Resume Builder → Inspector recheck | 2 (security, architect) | 3 agents |
|
||||
| standard | Builder → [Inspector + 3 Reviewers] → Resume Builder → Inspector recheck | 3 (security, logic, architect) | 4 agents |
|
||||
| complex | Builder → [Inspector + 4 Reviewers] → Resume Builder → Inspector recheck | 4 (security, logic, architect, quality) | 5 agents |
|
||||
| Complexity | Phase 2 Agents | Total | Security |
|
||||
|------------|----------------|-------|----------|
|
||||
| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security Reviewer + Architect |
|
||||
| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect |
|
||||
| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality |
|
||||
|
||||
**Key Improvements (v3.2.0):**
|
||||
- All verification agents spawn in parallel (single message, faster execution)
|
||||
- Builder resume in Phase 3 saves 50-70% tokens vs spawning fresh Fixer
|
||||
- **NEW:** Architect/Integration Reviewer catches runtime issues (404s, pattern violations, missing migrations)
|
||||
|
||||
**Reviewer Specializations:**
|
||||
- **Security:** Auth, injection, secrets, cross-tenant access
|
||||
- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions
|
||||
- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+)
|
||||
- **Code Quality:** Maintainability, naming, duplication (complex only)
|
||||
**All verification agents spawn in parallel (single message)**
|
||||
</complexity_routing>
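The v4.0 routing table reduces to a small mapping: the reviewer count depends on complexity, and Phase 2 always adds the Inspector and Test Quality agents. A minimal sketch, using hypothetical helper names `reviewer_count` and `phase2_agent_count`:

```shell
# Hypothetical helper: map a complexity level to its reviewer count.
reviewer_count() {
  case "$1" in
    micro)    echo 2 ;;
    standard) echo 3 ;;
    complex)  echo 4 ;;
    *)        echo "unknown complexity: $1" >&2; return 1 ;;
  esac
}

# Phase 2 always adds Inspector and Test Quality on top of the reviewers.
phase2_agent_count() {
  local reviewers
  reviewers=$(reviewer_count "$1") || return 1
  echo $((reviewers + 2))
}
```

For example, `phase2_agent_count standard` prints `5`, matching the "5 agents" row in the table.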
|
||||
|
||||
<success_criteria>
|
||||
- [ ] Builder spawned and agent_id saved
|
||||
- [ ] All verification agents completed in parallel
|
||||
- [ ] Builder resumed with consolidated findings
|
||||
- [ ] Inspector recheck passed
|
||||
- [ ] Git commit exists for story
|
||||
- [ ] Story file has checked tasks (count > 0)
|
||||
- [ ] Dev Agent Record filled with all phases
|
||||
- [ ] Sprint status updated to "done"
|
||||
- [ ] Phase 0: Playbooks loaded (if available)
|
||||
- [ ] Phase 1: Builder spawned, agent_id saved
|
||||
- [ ] Phase 2: All verification agents completed in parallel
|
||||
- [ ] Phase 2.5: Coverage gate passed
|
||||
- [ ] Phase 3: Builder resumed with consolidated findings
|
||||
- [ ] Phase 4: Inspector recheck passed
|
||||
- [ ] Phase 5: Orchestrator reconciled with Inspector evidence
|
||||
- [ ] Phase 6: Playbook reflection completed
|
||||
- [ ] Git commit exists
|
||||
- [ ] Story tasks checked with code citations
|
||||
- [ ] Dev Agent Record filled
|
||||
- [ ] Coverage ≥ {{coverage_threshold}}%
|
||||
- [ ] Sprint status: done
|
||||
</success_criteria>
|
||||
|
||||
<improvements_v4>
|
||||
1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings
|
||||
2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task
|
||||
3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates
|
||||
4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero
|
||||
5. ✅ Test Quality Agent (v4.0) + Coverage Gate (v4.0) - validates test quality and enforces threshold
|
||||
6. ✅ Playbook query (v4.0) before Builder + reflection (v4.0) after - continuous learning
|
||||
</improvements_v4>
|
||||
|
|
|
|||
|
|
@@ -1,7 +1,7 @@
|
|||
name: story-full-pipeline
|
||||
description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)"
|
||||
author: "BMAD Method + GSD"
|
||||
version: "3.2.0" # Added architect-integration-reviewer for runtime verification
|
||||
description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes"
|
||||
author: "BMAD Method"
|
||||
version: "4.0.0" # Added playbook learning, test quality, coverage gates, Inspector code citations
|
||||
|
||||
# Execution mode
|
||||
execution_mode: "multi_agent" # multi_agent | single_agent (fallback)
|
||||
|
|
@@ -37,13 +37,23 @@ agents:
|
|||
timeout: 3600 # 1 hour
|
||||
|
||||
inspector:
|
||||
description: "Validation agent - independent verification"
|
||||
description: "Validation agent - independent verification with code citations"
|
||||
steps: [5, 6]
|
||||
subagent_type: "general-purpose"
|
||||
prompt_file: "{agents_path}/inspector.md"
|
||||
fresh_context: true # No knowledge of builder agent
|
||||
trust_level: "medium" # No conflict of interest
|
||||
timeout: 1800 # 30 minutes
|
||||
require_code_citations: true # v4.0: Must provide file:line evidence for all tasks
|
||||
|
||||
test_quality:
|
||||
description: "Test quality validation - verifies test coverage and quality"
|
||||
steps: [5.5]
|
||||
subagent_type: "general-purpose"
|
||||
prompt_file: "{agents_path}/test-quality.md"
|
||||
fresh_context: true
|
||||
trust_level: "medium"
|
||||
timeout: 1200 # 20 minutes
|
||||
|
||||
reviewer:
|
||||
description: "Adversarial code review - finds problems"
|
||||
|
|
@@ -73,15 +83,40 @@ agents:
     trust_level: "medium" # Incentive to minimize work
     timeout: 2400 # 40 minutes

+  reflection:
+    description: "Playbook learning - extracts patterns for future agents"
+    steps: [10]
+    subagent_type: "general-purpose"
+    prompt_file: "{agents_path}/reflection.md"
+    timeout: 900 # 15 minutes
+
   # Reconciliation: orchestrator does this directly (see workflow.md Phase 5)

+# Playbook configuration (v4.0)
+playbooks:
+  enabled: true # Set to false in project config to disable
+  directory: "docs/playbooks/implementation-playbooks"
+  bootstrap_mode: true # Auto-initialize if missing
+  max_load: 3
+  auto_apply_updates: false # Require manual review of playbook updates
+  discovery:
+    enabled: true # Scan git/docs to populate initial playbooks
+    sources: ["git_history", "docs", "existing_code"]
+
+# Quality gates (v4.0)
+quality_gates:
+  coverage_threshold: 80 # % line coverage required
+  task_verification: "all_with_evidence" # Inspector must provide file:line citations
+  critical_issues: "must_fix"
+  high_issues: "must_fix"
+
 # Complexity level (determines which steps to execute)
 complexity_level: "standard" # micro | standard | complex

 # Complexity routing
 complexity_routing:
   micro:
-    skip_agents: ["reviewer"] # Skip code review for micro stories
+    skip_agents: [] # Full pipeline (v4.0: micro gets security scan)
     description: "Lightweight path for low-risk stories"
     examples: ["UI tweaks", "text changes", "simple CRUD"]

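
The `quality_gates` block in the configuration above could be enforced with a check along the following lines. This is a sketch under stated assumptions: the diff does not show how coverage is measured or how findings are reported, so `GateInput` and `evaluateGates` are hypothetical names, and the sketch assumes `"must_fix"` means that any open issue of that severity blocks the pipeline. The coverage comparison follows the success criterion "Coverage ≥ {{coverage_threshold}}%".

```typescript
// Keys mirror the quality_gates block; the camelCase names are this sketch's own.
interface QualityGates {
  coverageThreshold: number; // coverage_threshold: 80 (% line coverage)
  criticalIssues: string; // critical_issues: "must_fix"
  highIssues: string; // high_issues: "must_fix"
}

// Hypothetical summary of what the verification agents reported.
interface GateInput {
  lineCoveragePct: number;
  openCritical: number;
  openHigh: number;
}

// Returns human-readable failures; an empty array means every gate passed.
function evaluateGates(gates: QualityGates, input: GateInput): string[] {
  const failures: string[] = [];
  if (input.lineCoveragePct < gates.coverageThreshold) {
    failures.push(
      "coverage " + input.lineCoveragePct + "% is below " + gates.coverageThreshold + "%"
    );
  }
  if (gates.criticalIssues === "must_fix" && input.openCritical > 0) {
    failures.push(input.openCritical + " critical issue(s) still open");
  }
  if (gates.highIssues === "must_fix" && input.openHigh > 0) {
    failures.push(input.openHigh + " high issue(s) still open");
  }
  return failures;
}

// Values copied from the diff above.
const gates: QualityGates = {
  coverageThreshold: 80,
  criticalIssues: "must_fix",
  highIssues: "must_fix",
};
```

Returning a list of failures rather than a boolean would let the orchestrator hand the Builder a concrete fix list when it is resumed in Phase 3.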
@@ -0,0 +1,85 @@
+# {{Module/Feature Area}} - Implementation Playbook
+
+> **Purpose:** Guide future agents implementing features in {{module_name}}
+> **Created:** {{date}}
+> **Last Updated:** {{date}}
+
+## Common Gotchas
+
+**What mistakes to avoid:**
+
+- Add specific gotchas here as they're discovered
+- Example: "Never concatenate user input into SQL queries"
+- Example: "Always validate file paths before operations"
+
+## Code Patterns
+
+**Standard approaches that work:**
+
+### Pattern: {{Pattern Name}}
+
+✓ **Good:**
+```
+// Example of correct pattern
+db.query(sql, [param1, param2])
+```
+
+✗ **Bad:**
+```
+// Example of incorrect pattern
+sql + userInput
+```
+
+### Pattern: {{Another Pattern}}
+
+✓ **Good:**
+```
+// Another example
+if (!data) return null;
+```
+
+✗ **Bad:**
+```
+// Don't do this
+data.map(...) // crashes if data is null
+```
+
+## Test Requirements
+
+**Essential tests for this module:**
+
+- **Happy path:** Verify primary functionality
+- **Edge cases:** Test null, undefined, empty arrays, invalid inputs
+- **Error conditions:** Verify errors are handled properly
+- **Security:** Test for injection attacks, auth bypasses, etc.
+
+### Example Test Pattern
+
+```typescript
+describe('FeatureName', () => {
+  it('handles happy path', () => {
+    expect(fn(validInput)).toEqual(expected)
+  })
+
+  it('handles edge cases', () => {
+    expect(() => fn(null)).toThrow() // wrap the call so the matcher can catch the throw
+    expect(fn([])).toEqual([])
+  })
+
+  it('validates security', () => {
+    expect(() => fn("' OR 1=1--")).toThrow()
+  })
+})
+```
+
+## Related Stories
+
+Stories that used these patterns:
+
+- {{story_key}} - {{brief description}}
+
+## Notes
+
+- Keep this simple and actionable
+- Add new learnings as they emerge
+- Focus on preventable mistakes
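
The workflow configuration caps playbook loading at `playbooks.max_load: 3`, and the commit message describes Phase 0 as querying playbooks before implementation. How relevance is scored is not specified in this commit. The sketch below assumes a simple keyword-overlap score; `Playbook`, its `keywords` field, and `selectPlaybooks` are hypothetical and are shown only to illustrate how a load cap could be applied to playbooks written from the template above.

```typescript
// Hypothetical in-memory view of a playbook; the template does not define keywords.
interface Playbook {
  title: string;
  keywords: string[];
}

// Pick at most maxLoad playbooks, highest keyword overlap first.
// Playbooks with no overlap are never loaded.
function selectPlaybooks(
  storyKeywords: string[],
  playbooks: Playbook[],
  maxLoad: number = 3 // default mirrors playbooks.max_load in the config
): Playbook[] {
  const wanted = storyKeywords.map(function (k) {
    return k.toLowerCase();
  });
  return playbooks
    .map(function (p) {
      const score = p.keywords.filter(function (k) {
        return wanted.indexOf(k.toLowerCase()) !== -1;
      }).length;
      return { playbook: p, score: score };
    })
    .filter(function (s) {
      return s.score > 0;
    })
    .sort(function (a, b) {
      return b.score - a.score;
    })
    .slice(0, maxLoad)
    .map(function (s) {
      return s.playbook;
    });
}
```

Capping the number of loaded playbooks keeps the Builder's starting context small; a real implementation might rank by module path or recency instead of keywords.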