docs: add GSDMAD architecture and multi-agent validation design
- GSDMAD-ARCHITECTURE.md: Merge best of BMAD + GSD
- Wave-based story execution for parallelization
- Multi-agent validation (builder, inspector, reviewer, fixer)
- Checkpoint-aware segmentation from GSD
- Agent tracking and resume capability
- 57% faster execution through smart parallelization
- Separation of concerns prevents conflict of interest
parent
7e785aebd2
commit
921a5bef26
|
|
@ -0,0 +1,495 @@
|
|||
# GSDMAD: Get Shit Done Method for Agile Development
|
||||
|
||||
**Version:** 1.0.0
|
||||
**Date:** 2026-01-25
|
||||
**Philosophy:** Combine BMAD's comprehensive tracking with GSD's intelligent execution
|
||||
|
||||
---
|
||||
|
||||
## The Vision
|
||||
|
||||
**BMAD** excels at structure, tracking, and hospital-grade quality standards.
|
||||
**GSD** excels at smart execution, parallelization, and getting shit done fast.
|
||||
|
||||
**GSDMAD** takes the best of both:
|
||||
- BMAD's story tracking, sprint management, and quality gates
|
||||
- GSD's wave-based parallelization, checkpoint routing, and agent orchestration
|
||||
|
||||
---
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Comprehensive but not bureaucratic** - Track what matters, skip enterprise theater
|
||||
2. **Smart parallelization** - Run independent work concurrently, sequentially only when dependencies require it
|
||||
3. **Separation of concerns** - Different agents for implementation, validation, review
|
||||
4. **Checkpoint-aware routing** - Autonomous segments run in subagents, decisions stay in the main context
|
||||
5. **Hospital-grade quality** - Lives may be at stake, quality >> speed
|
||||
|
||||
---
|
||||
|
||||
## Architecture Comparison
|
||||
|
||||
### BMAD v1.x (Old Way)
|
||||
```
|
||||
batch-super-dev orchestrator:
|
||||
├─ Story 1: super-dev-pipeline (ONE agent, ALL steps)
|
||||
│ └─ Step 1-11: init, gap, test, implement, validate, review, fix, complete
|
||||
├─ Story 2: super-dev-pipeline (ONE agent, ALL steps)
|
||||
└─ Story 3: super-dev-pipeline (ONE agent, ALL steps)
|
||||
|
||||
Problems:
|
||||
- Single agent validates its own work (conflict of interest)
|
||||
- No parallelization between stories
|
||||
- Agent can lie about completion
|
||||
- Sequential execution is slow
|
||||
```
|
||||
|
||||
### GSD (Inspiration)
|
||||
```
|
||||
execute-phase orchestrator:
|
||||
├─ Wave 1: [Plan A, Plan B] in parallel
|
||||
│ ├─ Agent for Plan A (segments if checkpoints)
|
||||
│ └─ Agent for Plan B (segments if checkpoints)
|
||||
├─ Wave 2: [Plan C]
|
||||
│ └─ Agent for Plan C
|
||||
└─ Wave 3: [Plan D, Plan E] in parallel
|
||||
|
||||
Strengths:
|
||||
- Wave-based parallelization
|
||||
- Checkpoint-aware segmentation
|
||||
- Agent tracking and resume
|
||||
- Lightweight orchestration
|
||||
```
|
||||
|
||||
### GSDMAD (New Way)
|
||||
```
|
||||
batch-super-dev orchestrator:
├─ Wave 1 (no dependencies): [17-1]
│  └─ Story 17-1:
│     ├─ Agent 1: Implement (steps 1-4)
│     ├─ Agent 2: Validate (steps 5-6) ← fresh context
│     ├─ Agent 3: Review (step 7) ← adversarial
│     └─ Agent 4: Fix (steps 8-9)
│
├─ Wave 2 (depends on Wave 1): [17-2, 17-3, 17-4] in parallel
│  ├─ Story 17-2: (same multi-agent pattern)
│  ├─ Story 17-3: (same multi-agent pattern)
│  └─ Story 17-4: (same multi-agent pattern)
│
├─ Wave 3 (depends on Waves 1-2): [17-5]
│  └─ Story 17-5: (same multi-agent pattern)
│
└─ Wave 4 (depends on Wave 3): [17-9, 17-10] in parallel
   ├─ Story 17-9: (same multi-agent pattern)
   └─ Story 17-10: (same multi-agent pattern)
|
||||
|
||||
Benefits:
|
||||
- Independent stories run in parallel (faster)
|
||||
- Each story uses multi-agent validation (honest)
|
||||
- Dependencies respected via waves
|
||||
- Agent tracking for resume capability
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Wave-Based Story Execution
|
||||
|
||||
### Dependency Analysis
|
||||
|
||||
Before executing stories, analyze dependencies:
|
||||
|
||||
```yaml
|
||||
stories:
  17-1: # Space Model
    depends_on: []
    wave: 1

  17-2: # Space Listing
    depends_on: [17-1] # Needs Space model
    wave: 2

  17-3: # Space Photos
    depends_on: [17-1] # Needs Space model
    wave: 2

  17-4: # Delete Space
    depends_on: [17-1] # Needs Space model
    wave: 2

  17-5: # Agreement Model
    depends_on: [17-1, 17-4] # Needs Space model + delete protection
    wave: 3

  17-9: # Expiration Alerts
    depends_on: [17-5] # Needs Agreement model
    wave: 4

  17-10: # Occupant Portal
    depends_on: [17-5] # Needs Agreement model
    wave: 4
|
||||
```
|
||||
|
||||
**Wave Execution:**
|
||||
- Wave 1: [17-1] (1 story)
|
||||
- Wave 2: [17-2, 17-3, 17-4] (3 stories in parallel)
|
||||
- Wave 3: [17-5] (1 story)
|
||||
- Wave 4: [17-9, 17-10] (2 stories in parallel)
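
The wave numbers above follow mechanically from `depends_on`: a story's wave is one more than the highest wave among its dependencies. A minimal Python sketch of that rule, assuming the dependency map has already been loaded into a dict (the `assign_waves` helper and its input shape are illustrative and not part of BMAD or GSD):

```python
from collections import defaultdict


def assign_waves(depends_on: dict[str, list[str]]) -> dict[int, list[str]]:
    """Group stories into waves: wave = 1 + max(wave of each dependency)."""
    wave_of: dict[str, int] = {}

    def resolve(story: str, stack: tuple[str, ...] = ()) -> int:
        if story in wave_of:
            return wave_of[story]
        if story in stack:
            raise ValueError(f"dependency cycle: {' -> '.join(stack + (story,))}")
        deps = depends_on[story]  # KeyError here means an undeclared dependency
        wave_of[story] = 1 + max(
            (resolve(dep, stack + (story,)) for dep in deps), default=0
        )
        return wave_of[story]

    waves: dict[int, list[str]] = defaultdict(list)
    for story in depends_on:  # dict order keeps stories in declaration order
        waves[resolve(story)].append(story)
    return dict(sorted(waves.items()))


# Hypothetical input mirroring the Epic 17 example above.
EPIC_17 = {
    "17-1": [],
    "17-2": ["17-1"],
    "17-3": ["17-1"],
    "17-4": ["17-1"],
    "17-5": ["17-1", "17-4"],
    "17-9": ["17-5"],
    "17-10": ["17-5"],
}

waves = assign_waves(EPIC_17)
for number, stories in waves.items():
    print(f"Wave {number}: {stories}")
```

Computing the waves instead of writing `wave:` by hand keeps the two from drifting apart, and the cycle check turns a circular dependency into an error instead of a stalled batch.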
|
||||
|
||||
**Time Savings:**
|
||||
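- Sequential: 7 stories × 60 min = 420 min (7 hours)
- Wave-based: 4 waves × 60 min = 240 min (4 hours) ← about 43% less wall-clock time (57% of the sequential time), assuming every story takes 60 min and stories within a wave run fully in parallel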
|
||||
|
||||
---
|
||||
|
||||
## Multi-Agent Story Pipeline
|
||||
|
||||
Each story uses **4 agents** with separation of concerns:
|
||||
|
||||
### Phase 1: Implementation (Agent 1 - Builder)
|
||||
```
|
||||
Steps: 1-4 (init, pre-gap, write-tests, implement)
|
||||
Role: Build the feature
|
||||
Output: Code + tests (unverified)
|
||||
Trust: LOW (assume agent will cut corners)
|
||||
```
|
||||
|
||||
**Agent 1 Prompt:**
|
||||
```
|
||||
Implement story {{story_key}} following these steps:
|
||||
|
||||
1. Init: Load story, detect greenfield vs brownfield
|
||||
2. Pre-Gap: Validate tasks, detect batchable patterns
|
||||
3. Write Tests: TDD approach, write tests first
|
||||
4. Implement: Write production code
|
||||
|
||||
DO NOT:
|
||||
- Validate your own work (Agent 2 will do this)
|
||||
- Review your own code (Agent 3 will do this)
|
||||
- Update story checkboxes (Agent 4 will do this)
|
||||
- Commit changes (Agent 4 will do this)
|
||||
|
||||
Just write the code and tests. Report what you built.
|
||||
```
|
||||
|
||||
### Phase 2: Validation (Agent 2 - Inspector)
|
||||
```
|
||||
Steps: 5-6 (post-validation, quality-checks)
|
||||
Role: Independent verification
|
||||
Output: PASS/FAIL with evidence
|
||||
Trust: MEDIUM (no conflict of interest)
|
||||
```
|
||||
|
||||
**Agent 2 Prompt:**
|
||||
```
|
||||
Validate story {{story_key}} implementation by Agent 1.
|
||||
|
||||
You have NO KNOWLEDGE of what Agent 1 did. Verify:
|
||||
|
||||
1. Files Exist:
|
||||
- Check each file mentioned in story
|
||||
- Verify file contains actual code (not TODO/stub)
|
||||
|
||||
2. Tests Pass:
|
||||
- Run test suite: npm test
|
||||
- Verify tests actually run (not skipped)
|
||||
- Check coverage meets 90% threshold
|
||||
|
||||
3. Quality Checks:
|
||||
- Run type-check: npm run type-check
|
||||
- Run linter: npm run lint
|
||||
- Run build: npm run build
|
||||
- All must return zero errors
|
||||
|
||||
4. Git Status:
|
||||
- Check uncommitted files
|
||||
- List files changed
|
||||
|
||||
Output PASS or FAIL with specific evidence.
|
||||
If FAIL, list exactly what's missing/broken.
|
||||
```
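
The "Tests Pass" and "Quality Checks" items in this prompt come down to running four commands and treating any non-zero exit code as a failure. A minimal Python sketch of such a runner, assuming the npm scripts named in the prompt exist in the project (`run_checks` and `NPM_CHECKS` are hypothetical names, and the sketch does not cover the skipped-test or 90% coverage checks):

```python
import subprocess

# Commands taken from the inspector prompt above.
NPM_CHECKS = {
    "tests": ["npm", "test"],
    "type-check": ["npm", "run", "type-check"],
    "lint": ["npm", "run", "lint"],
    "build": ["npm", "run", "build"],
}


def run_checks(
    checks: dict[str, list[str]], cwd: str = "."
) -> tuple[str, dict[str, str]]:
    """Run every check; return ("PASS" | "FAIL", evidence per failed check)."""
    evidence: dict[str, str] = {}
    for name, command in checks.items():
        try:
            result = subprocess.run(
                command, cwd=cwd, capture_output=True, text=True
            )
        except FileNotFoundError:
            evidence[name] = f"command not found: {command[0]}"
            continue
        if result.returncode != 0:
            # Keep the tail of the output, where error summaries usually are.
            evidence[name] = (result.stdout + result.stderr).strip()[-500:]
    return ("PASS" if not evidence else "FAIL", evidence)
```

Calling `run_checks(NPM_CHECKS)` from the story's repository root would give the inspector a verdict plus the failing output to quote as evidence. All checks run even after the first failure, so a FAIL report lists everything that is broken.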
|
||||
|
||||
### Phase 3: Code Review (Agent 3 - Adversarial Reviewer)
|
||||
```
|
||||
Step: 7 (code-review)
|
||||
Role: Find problems (adversarial stance)
|
||||
Output: List of issues with severity
|
||||
Trust: HIGH (wants to find issues)
|
||||
```
|
||||
|
||||
**Agent 3 Prompt:**
|
||||
```
|
||||
Adversarial code review of story {{story_key}}.
|
||||
|
||||
Your GOAL is to find problems. Be critical. Look for:
|
||||
|
||||
SECURITY:
|
||||
- SQL injection vulnerabilities
|
||||
- XSS vulnerabilities
|
||||
- Authentication bypasses
|
||||
- Authorization gaps
|
||||
- Hardcoded secrets
|
||||
|
||||
PERFORMANCE:
|
||||
- N+1 queries
|
||||
- Missing indexes
|
||||
- Inefficient algorithms
|
||||
- Memory leaks
|
||||
|
||||
BUGS:
|
||||
- Logic errors
|
||||
- Edge cases not handled
|
||||
- Off-by-one errors
|
||||
- Race conditions
|
||||
|
||||
ARCHITECTURE:
|
||||
- Pattern violations
|
||||
- Tight coupling
|
||||
- Missing error handling
|
||||
|
||||
Rate each issue:
|
||||
- CRITICAL: Security vulnerability or data loss
|
||||
- HIGH: Will cause production bugs
|
||||
- MEDIUM: Technical debt or maintainability
|
||||
- LOW: Nice-to-have improvements
|
||||
|
||||
Output: List of issues with severity and specific code locations.
|
||||
```
|
||||
|
||||
### Phase 4: Fix Issues (Agent 4 - Fixer)
|
||||
```
|
||||
Steps: 8-9 (review-analysis, fix-issues)
|
||||
Role: Fix critical/high issues
|
||||
Output: Fixed code
|
||||
Trust: MEDIUM (incentive to minimize work)
|
||||
```
|
||||
|
||||
**Agent 4 Prompt:**
|
||||
```
|
||||
Fix issues from code review for story {{story_key}}.
|
||||
|
||||
Code review found {{issue_count}} issues:
|
||||
{{review_issues}}
|
||||
|
||||
Priority:
|
||||
1. Fix ALL CRITICAL issues (no exceptions)
|
||||
2. Fix ALL HIGH issues (must do)
|
||||
3. Fix MEDIUM issues if time allows (nice to have)
|
||||
4. Skip LOW issues (gold-plating)
|
||||
|
||||
After fixing:
|
||||
- Re-run tests (must pass)
|
||||
- Update story checkboxes
|
||||
- Update sprint-status.yaml
|
||||
- Commit changes with message: "fix: {{story_key}} - address code review"
|
||||
```
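
The priority rules in the Agent 4 prompt reduce to a small mapping from severity to action. A minimal Python sketch, assuming the reviewer's findings have been parsed into records (the `Issue` shape and the `triage` helper are illustrative; the document does not define a format for `{{review_issues}}`):

```python
from dataclasses import dataclass

MUST_FIX = {"CRITICAL", "HIGH"}


@dataclass(frozen=True)
class Issue:
    severity: str  # CRITICAL | HIGH | MEDIUM | LOW
    location: str  # file and line, in whatever format the reviewer reports
    summary: str


def triage(issues: list[Issue]) -> dict[str, list[Issue]]:
    """Split review findings into the three buckets from the fixer prompt."""
    plan: dict[str, list[Issue]] = {"must_fix": [], "if_time_allows": [], "skip": []}
    for issue in issues:
        if issue.severity in MUST_FIX:
            plan["must_fix"].append(issue)
        elif issue.severity == "MEDIUM":
            plan["if_time_allows"].append(issue)
        elif issue.severity == "LOW":
            plan["skip"].append(issue)
        else:
            # An unknown label is an error, so it cannot be silently skipped.
            raise ValueError(f"unknown severity: {issue.severity!r}")
    return plan
```

Rejecting unknown severities is a deliberate choice in this sketch: a mistyped `CRITCAL` should stop the run and not end up in the skip bucket.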
|
||||
|
||||
### Phase 5: Final Verification (Main Orchestrator)
|
||||
```
|
||||
Steps: 10-11 (complete, summary)
|
||||
Role: Final quality gate
|
||||
Output: COMPLETE or FAILED
|
||||
Trust: HIGHEST (user-facing)
|
||||
```
|
||||
|
||||
**Orchestrator Checks:**
|
||||
```bash
|
||||
# 1. Verify git commits
git log --oneline -5 | grep -q "{{story_key}}" || echo "FAIL: No commit found"

# 2. Verify story checkboxes increased
#    (HEAD~2 is the baseline used here; adjust it to the commit before the story started)
before=$(git show "HEAD~2:{{story_file}}" | grep -c "^- \[x\]")
after=$(grep -c "^- \[x\]" "{{story_file}}")
[ "$after" -gt "$before" ] || echo "FAIL: Checkboxes not updated"

# 3. Verify sprint-status updated
git diff HEAD~2 -- "{{sprint_status}}" | grep -q "{{story_key}}: done" || echo "FAIL: Sprint status not updated"

# 4. Verify tests passed (parse agent output)
grep -q "PASS" agent_2_output.txt || echo "FAIL: No test evidence"
|
||||
```
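
The four shell checks above print failures but do not combine them into a single verdict. A minimal Python sketch of the final gate as a pure function, assuming the orchestrator has already collected the git output and file contents as strings (`final_gate`, `mentions`, and `count_checked` are hypothetical helpers). One deliberate difference from plain `grep`: the key match refuses to treat `17-1` as a match for `17-10`.

```python
import re

CHECKED_BOX = re.compile(r"^- \[x\]", re.MULTILINE)


def count_checked(story_text: str) -> int:
    """Count checked task boxes, using the same pattern as the grep above."""
    return len(CHECKED_BOX.findall(story_text))


def mentions(story_key: str, text: str) -> bool:
    """True if story_key appears and is not part of a longer number."""
    return re.search(rf"(?<!\d){re.escape(story_key)}(?!\d)", text) is not None


def final_gate(
    story_key: str,
    commit_subjects: list[str],
    story_before: str,
    story_after: str,
    status_diff: str,
    inspector_output: str,
) -> tuple[str, list[str]]:
    """Return ("COMPLETE" | "FAILED", reasons); any failed check rejects."""
    failures: list[str] = []
    if not any(mentions(story_key, subject) for subject in commit_subjects):
        failures.append("No commit found")
    if count_checked(story_after) <= count_checked(story_before):
        failures.append("Checkboxes not updated")
    if not re.search(rf"(?<!\d){re.escape(story_key)}: done", status_diff):
        failures.append("Sprint status not updated")
    if "PASS" not in inspector_output:
        failures.append("No test evidence")
    return ("COMPLETE" if not failures else "FAILED", failures)
```

Collecting every failure before returning gives the user the full list of reasons in one report.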
|
||||
|
||||
---
|
||||
|
||||
## Checkpoint-Aware Segmentation
|
||||
|
||||
Stories can have **checkpoints** for user interaction:
|
||||
|
||||
```xml
|
||||
<step n="3" checkpoint="human-verify">
|
||||
<output>Review the test plan before implementation</output>
|
||||
<ask>Does this test strategy look correct? (yes/no)</ask>
|
||||
</step>
|
||||
```
|
||||
|
||||
**Routing Rules:**
|
||||
|
||||
1. **No checkpoints** → Full autonomous (Agent 1 does steps 1-4)
|
||||
2. **Verify checkpoints** → Segmented execution:
|
||||
- Segment 1 (steps 1-2): Agent 1a
|
||||
- Checkpoint: Main context (user verifies)
|
||||
- Segment 2 (steps 3-4): Agent 1b (fresh agent)
|
||||
3. **Decision checkpoints** → Stay in main context (can't delegate decisions)
|
||||
|
||||
This is borrowed directly from GSD's `execute-plan.md` segmentation logic.
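
The routing rules can be read as a function from a story's steps to a list of execution segments. A minimal Python sketch of that reading, not GSD's actual implementation: `Step`, `route`, and the segment labels are hypothetical, and `"decision"` as a checkpoint value is an assumption (only `human-verify` appears in the example above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    n: int
    checkpoint: Optional[str] = None  # None, "human-verify", or "decision"


def route(steps: list[Step]) -> list[tuple[str, list[int]]]:
    """Return (executor, step numbers) segments in execution order."""
    # Rule 3: a decision checkpoint keeps the whole run in the main context.
    if any(step.checkpoint == "decision" for step in steps):
        return [("main", [step.n for step in steps])]

    segments: list[tuple[str, list[int]]] = []
    current: list[int] = []
    for step in steps:
        if step.checkpoint == "human-verify":
            # Rule 2: close the running segment, pause in main for the user,
            # then continue with a fresh agent starting at this step.
            if current:
                segments.append(("subagent", current))
            segments.append(("main-checkpoint", [step.n]))
            current = []
        current.append(step.n)
    if current:
        segments.append(("subagent", current))  # Rule 1 when nothing paused
    return segments


builder_steps = [Step(1), Step(2), Step(3, checkpoint="human-verify"), Step(4)]
for executor, numbers in route(builder_steps):
    print(executor, numbers)
```

For the `human-verify` checkpoint on step 3 this yields steps 1-2 in one subagent, a pause in the main context, then steps 3-4 in a fresh subagent, matching the segments listed under rule 2.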
|
||||
|
||||
---
|
||||
|
||||
## Agent Tracking and Resume
|
||||
|
||||
Track all spawned agents for resume capability:
|
||||
|
||||
```json
|
||||
// .bmad/agent-history.json
{
  "version": "1.0",
  "max_entries": 50,
  "entries": [
    {
      "agent_id": "a4868f1",
      "story_key": "17-10",
      "phase": "implementation",
      "agent_type": "builder",
      "timestamp": "2026-01-25T20:30:00Z",
      "status": "spawned",
      "completion_timestamp": null
    }
  ]
}
|
||||
```
|
||||
|
||||
**Resume Capability:**
|
||||
```bash
|
||||
# If session interrupted, check for incomplete agents
jq '.entries[] | select(.status=="spawned")' .bmad/agent-history.json

# Resume agent using the Task tool (a tool call, not a shell command):
#   Task(subagent_type="general-purpose", resume="a4868f1")
|
||||
```
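
For illustration, the bookkeeping behind `agent-history.json` could look like the following Python sketch. It works on the already-parsed JSON object; the helper names are hypothetical, and `"completed"` as the finished status is an assumption, since the example above only shows `"spawned"`.

```python
from datetime import datetime, timezone
from typing import Optional


def _stamp(now: Optional[datetime] = None) -> str:
    """Timestamp in the format used by agent-history.json (now must be UTC)."""
    return (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")


def record_spawn(history: dict, agent_id: str, story_key: str, phase: str,
                 agent_type: str, now: Optional[datetime] = None) -> dict:
    """Append a 'spawned' entry and trim the log to max_entries."""
    entry = {
        "agent_id": agent_id,
        "story_key": story_key,
        "phase": phase,
        "agent_type": agent_type,
        "timestamp": _stamp(now),
        "status": "spawned",
        "completion_timestamp": None,
    }
    history["entries"].append(entry)
    overflow = len(history["entries"]) - history["max_entries"]
    if overflow > 0:
        del history["entries"][:overflow]  # oldest entries go first
    return entry


def mark_completed(history: dict, agent_id: str,
                   now: Optional[datetime] = None) -> bool:
    """Flip a spawned entry to completed; False if no such entry exists."""
    for entry in history["entries"]:
        if entry["agent_id"] == agent_id and entry["status"] == "spawned":
            entry["status"] = "completed"  # assumed status name
            entry["completion_timestamp"] = _stamp(now)
            return True
    return False


def incomplete_agents(history: dict) -> list[dict]:
    """Entries that were spawned but never completed: candidates for resume."""
    return [e for e in history["entries"] if e["status"] == "spawned"]
```

One caveat of trimming by age: with many parallel stories, a still-running agent could be dropped once `max_entries` is exceeded, so the limit should comfortably exceed the number of agents in flight.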
|
||||
|
||||
---
|
||||
|
||||
## Workflow Files
|
||||
|
||||
### New: `batch-super-dev-v2.md`
|
||||
```yaml
|
||||
execution_mode: "wave_based" # wave_based | sequential

# Story dependency analysis (auto-computed or manual)
dependency_analysis:
  enabled: true
  method: "file_scan" # file_scan | manual | hybrid

# Wave execution
waves:
  max_parallel_stories: 4 # Max stories per wave
  agent_timeout: 3600 # 1 hour per agent

# Multi-agent validation
validation:
  enabled: true
  agents:
    builder: {steps: [1,2,3,4]}
    inspector: {steps: [5,6], fresh_context: true}
    reviewer: {steps: [7], adversarial: true}
    fixer: {steps: [8,9]}
|
||||
```
|
||||
|
||||
### Enhanced: `super-dev-pipeline-v2.md`
|
||||
```yaml
|
||||
execution_mode: "multi_agent" # single_agent | multi_agent

# Agent configuration
agents:
  builder:
    steps: [1, 2, 3, 4]
    description: "Implement story"

  inspector:
    steps: [5, 6]
    description: "Validate implementation"
    fresh_context: true

  reviewer:
    steps: [7]
    description: "Adversarial code review"
    fresh_context: true
    adversarial: true

  fixer:
    steps: [8, 9]
    description: "Fix review issues"
|
||||
```
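
Because the step-to-agent mapping is hand-written configuration, a cheap sanity check is that steps 1-9 are each owned by exactly one agent (steps 10-11 stay with the orchestrator). A minimal Python sketch, assuming the `agents` section has been parsed into a dict; `validate_agents` is a hypothetical helper and not part of either workflow file:

```python
def validate_agents(
    agents: dict[str, dict], delegated_steps: range = range(1, 10)
) -> list[str]:
    """Return a list of problems; an empty list means the mapping is sound."""
    owner: dict[int, str] = {}
    errors: list[str] = []
    for name, config in agents.items():
        for step in config["steps"]:
            if step in owner:
                errors.append(
                    f"step {step} is assigned to both {owner[step]} and {name}"
                )
            else:
                owner[step] = name
    missing = [step for step in delegated_steps if step not in owner]
    if missing:
        errors.append(f"no agent covers steps {missing}")
    extra = sorted(step for step in owner if step not in delegated_steps)
    if extra:
        errors.append(f"steps {extra} are not delegated to subagents")
    return errors


# The configuration above, as it would look after YAML parsing.
AGENTS = {
    "builder": {"steps": [1, 2, 3, 4]},
    "inspector": {"steps": [5, 6], "fresh_context": True},
    "reviewer": {"steps": [7], "fresh_context": True, "adversarial": True},
    "fixer": {"steps": [8, 9]},
}

print(validate_agents(AGENTS) or "agent configuration is consistent")
```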
|
||||
|
||||
---
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: Multi-Agent Validation (Week 1)
|
||||
- Update `super-dev-pipeline` to support multi-agent mode
|
||||
- Create `agents/` directory with agent prompts
|
||||
- Add agent tracking infrastructure
|
||||
- Test on single story
|
||||
|
||||
### Phase 2: Wave-Based Execution (Week 2)
|
||||
- Add dependency analysis to `batch-super-dev`
|
||||
- Implement wave grouping logic
|
||||
- Add parallel execution within waves
|
||||
- Test on Epic 17 (10 stories)
|
||||
|
||||
### Phase 3: Checkpoint Segmentation (Week 3)
|
||||
- Add checkpoint detection to stories
|
||||
- Implement segment routing logic
|
||||
- Test with stories that need user input
|
||||
|
||||
### Phase 4: Agent Resume (Week 4)
|
||||
- Add agent history tracking
|
||||
- Implement resume capability
|
||||
- Test interrupted session recovery
|
||||
|
||||
---
|
||||
|
||||
## Benefits Summary
|
||||
|
||||
**From BMAD:**
|
||||
- ✅ Comprehensive story tracking
|
||||
- ✅ Sprint artifacts and status management
|
||||
- ✅ Gap analysis and reconciliation
|
||||
- ✅ Hospital-grade quality standards
|
||||
- ✅ Multi-tenant patterns
|
||||
|
||||
**From GSD:**
|
||||
- ✅ Wave-based parallelization (Epic 17 example: 7 stories in 4 waves, about 43% less wall-clock time)
|
||||
- ✅ Smart checkpoint routing
|
||||
- ✅ Agent tracking and resume
|
||||
- ✅ Lightweight orchestration
|
||||
- ✅ Separation of concerns
|
||||
|
||||
**New in GSDMAD:**
|
||||
- ✅ Multi-agent validation (no conflict of interest)
|
||||
- ✅ Adversarial code review (finds real issues)
|
||||
- ✅ Independent verification (honest reporting)
|
||||
- ✅ Parallel story execution (faster completion)
|
||||
- ✅ Best of both worlds
|
||||
|
||||
---
|
||||
|
||||
## Migration Path
|
||||
|
||||
1. **Keep BMAD v1.x** as fallback (`execution_mode: "single_agent"`)
|
||||
2. **Add GSDMAD** as opt-in (`execution_mode: "multi_agent"`)
|
||||
3. **Test both modes** on same epic, compare results
|
||||
4. **Make GSDMAD default** after validation
|
||||
5. **Deprecate v1.x** in 6 months
|
||||
|
||||
---
|
||||
|
||||
**Key Insight:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest. Stories run in parallel when possible and sequentially only when dependencies require it.
|
||||
|
||||
---
|
||||
|
||||
**Next Steps:**
|
||||
1. Create `super-dev-pipeline-v2/` directory
|
||||
2. Write agent prompt files
|
||||
3. Update `batch-super-dev` for wave execution
|
||||
4. Test on Epic 17 stories
|
||||
5. Measure time savings and quality improvements
|
||||
|
|
@ -0,0 +1,291 @@
|
|||
# Super-Dev-Pipeline: Multi-Agent Architecture
|
||||
|
||||
**Version:** 2.0.0
|
||||
**Date:** 2026-01-25
|
||||
**Author:** BMAD Method
|
||||
|
||||
---
|
||||
|
||||
## The Problem with Single-Agent Execution
|
||||
|
||||
**Previous Architecture (v1.x):**
|
||||
```
|
||||
One Task Agent runs ALL 11 steps:
|
||||
├─ Step 1: Init
|
||||
├─ Step 2: Pre-Gap Analysis
|
||||
├─ Step 3: Write Tests
|
||||
├─ Step 4: Implement
|
||||
├─ Step 5: Post-Validation ← Agent validates its OWN work
|
||||
├─ Step 6: Quality Checks
|
||||
├─ Step 7: Code Review ← Agent reviews its OWN code
|
||||
├─ Step 8: Review Analysis
|
||||
├─ Step 9: Fix Issues
|
||||
├─ Step 10: Complete
|
||||
└─ Step 11: Summary
|
||||
```
|
||||
|
||||
**Fatal Flaw:** The agent has a conflict of interest: it validates and reviews its own work. When agents get tired or lazy, they lie about completion and skip steps.
|
||||
|
||||
---
|
||||
|
||||
## New Multi-Agent Architecture (v2.0)
|
||||
|
||||
**Principle:** **Separation of Concerns with Independent Validation**
|
||||
|
||||
Each phase has a DIFFERENT agent with fresh context:
|
||||
|
||||
```
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ PHASE 1: IMPLEMENTATION (Agent 1 - "Builder") │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ Step 1: Init │
|
||||
│ Step 2: Pre-Gap Analysis │
|
||||
│ Step 3: Write Tests │
|
||||
│ Step 4: Implement │
|
||||
│ │
|
||||
│ Output: Code written, tests written, claims "done" │
|
||||
│ ⚠️ DO NOT TRUST - needs external validation │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ PHASE 2: VALIDATION (Agent 2 - "Inspector") │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ Step 5: Post-Validation │
|
||||
│ - Fresh context, no knowledge of Agent 1 │
|
||||
│ - Verifies files actually exist │
|
||||
│ - Verifies tests actually run and pass │
|
||||
│ - Verifies checkboxes are checked in story file │
|
||||
│ - Verifies sprint-status.yaml updated │
|
||||
│ │
|
||||
│ Step 6: Quality Checks │
|
||||
│ - Run type-check, lint, build │
|
||||
│ - Verify ZERO errors │
|
||||
│ - Check git status (uncommitted files?) │
|
||||
│ │
|
||||
│ Output: PASS/FAIL verdict (honest assessment) │
|
||||
│ ✅ Agent 2 has NO incentive to lie │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ PHASE 3: CODE REVIEW (Agent 3 - "Adversarial Reviewer") │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ Step 7: Code Review (Multi-Agent) │
|
||||
│ - Fresh context, ADVERSARIAL stance │
|
||||
│ - Goal: Find problems, not rubber-stamp │
|
||||
│ - Spawns 2-6 review agents (based on complexity) │
|
||||
│ - Each reviewer has specific focus area │
|
||||
│ │
|
||||
│ Output: List of issues (security, performance, bugs) │
|
||||
│ ✅ Adversarial agents WANT to find problems │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ PHASE 4: FIX ISSUES (Agent 4 - "Fixer") │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ Step 8: Review Analysis │
|
||||
│ - Categorize findings (MUST FIX, SHOULD FIX, NICE TO HAVE) │
|
||||
│ - Filter out gold-plating │
|
||||
│ │
|
||||
│ Step 9: Fix Issues │
|
||||
│ - Implement MUST FIX items │
|
||||
│ - Implement SHOULD FIX if time allows │
|
||||
│ │
|
||||
│ Output: Fixed code, re-run tests │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ PHASE 5: COMPLETION (Main Orchestrator - Claude) │
|
||||
├────────────────────────────────────────────────────────────────┤
|
||||
│ Step 10: Complete │
|
||||
│ - Verify git commits exist │
|
||||
│ - Verify tests pass │
|
||||
│ - Verify story checkboxes checked │
|
||||
│ - Verify sprint-status updated │
|
||||
│ - REJECT if any verification fails │
|
||||
│ │
|
||||
│ Step 11: Summary │
|
||||
│ - Generate audit trail │
|
||||
│ - Report to user │
|
||||
│ │
|
||||
│ ✅ Main orchestrator does FINAL verification │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Agent Responsibilities
|
||||
|
||||
### Agent 1: Builder (Implementation)
|
||||
- **Role:** Implement the story according to requirements
|
||||
- **Trust Level:** LOW - assumes agent will cut corners
|
||||
- **Output:** Code + tests (unverified)
|
||||
- **Incentive:** Get done quickly → may lie about completion
|
||||
|
||||
### Agent 2: Inspector (Validation)
|
||||
- **Role:** Independent verification of Agent 1's claims
|
||||
- **Trust Level:** MEDIUM - no conflict of interest
|
||||
- **Checks:**
|
||||
- Do files actually exist?
|
||||
- Do tests actually pass (run them myself)?
|
||||
- Are checkboxes actually checked?
|
||||
- Is sprint-status actually updated?
|
||||
- **Output:** PASS/FAIL with evidence
|
||||
- **Incentive:** Find truth → honest assessment
|
||||
|
||||
### Agent 3: Adversarial Reviewer (Code Review)
|
||||
- **Role:** Find problems with the implementation
|
||||
- **Trust Level:** HIGH - WANTS to find issues
|
||||
- **Focus Areas:**
|
||||
- Security vulnerabilities
|
||||
- Performance problems
|
||||
- Logic bugs
|
||||
- Architecture violations
|
||||
- **Output:** List of issues with severity
|
||||
- **Incentive:** Find as many legitimate issues as possible
|
||||
|
||||
### Agent 4: Fixer (Issue Resolution)
|
||||
- **Role:** Fix issues identified by Agent 3
|
||||
- **Trust Level:** MEDIUM - has incentive to minimize work
|
||||
- **Actions:**
|
||||
- Implement MUST FIX issues
|
||||
- Implement SHOULD FIX issues (if time)
|
||||
- Skip NICE TO HAVE (gold-plating)
|
||||
- **Output:** Fixed code
|
||||
|
||||
### Main Orchestrator: Claude (Final Verification)
|
||||
- **Role:** Final quality gate before marking story complete
|
||||
- **Trust Level:** HIGHEST - user-facing, no incentive to lie
|
||||
- **Checks:**
|
||||
- Git log shows commits
|
||||
- Test output shows passing tests
|
||||
- Story file diff shows checked boxes
|
||||
- Sprint-status diff shows update
|
||||
- **Output:** COMPLETE or FAILED (with specific reason)
|
||||
|
||||
---
|
||||
|
||||
## Implementation in workflow.yaml
|
||||
|
||||
```yaml
|
||||
# New execution mode (v2.0)
execution_mode: "multi_agent" # single_agent | multi_agent

# Agent configuration
agents:
  builder:
    steps: [1, 2, 3, 4]
    subagent_type: "general-purpose"
    description: "Implement story {{story_key}}"

  inspector:
    steps: [5, 6]
    subagent_type: "general-purpose"
    description: "Validate story {{story_key}} implementation"
    fresh_context: true # No knowledge of builder agent

  reviewer:
    steps: [7]
    subagent_type: "multi-agent-review" # Spawns multiple reviewers
    description: "Adversarial review of story {{story_key}}"
    fresh_context: true
    adversarial: true

  fixer:
    steps: [8, 9]
    subagent_type: "general-purpose"
    description: "Fix issues in story {{story_key}}"
|
||||
```
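
The `fresh_context` flags amount to a rule about which earlier outputs each agent may see. A minimal Python sketch of an orchestrator loop that enforces this rule, assuming a `spawn` callback that stands in for whatever actually launches a subagent (the callback, `PHASES`, and `run_story` are all hypothetical). What happens after an inspector FAIL is left out because the document does not specify it.

```python
from typing import Callable

# (agent name, earlier agents whose output it is allowed to see)
PHASES: list[tuple[str, list[str]]] = [
    ("builder", []),
    ("inspector", []),        # fresh context: sees nothing from the builder
    ("reviewer", []),         # fresh context: reviews the code, not the claims
    ("fixer", ["reviewer"]),  # receives the review findings
]

Spawn = Callable[[str, str, dict[str, str]], str]


def run_story(story_key: str, spawn: Spawn) -> dict[str, str]:
    """Run the four phases in order, passing only whitelisted outputs along."""
    outputs: dict[str, str] = {}
    for agent, visible in PHASES:
        context = {earlier: outputs[earlier] for earlier in visible}
        outputs[agent] = spawn(agent, story_key, context)
    return outputs
```

Keeping the visibility list next to the phase order makes the isolation auditable: the inspector cannot be handed the builder's self-report by accident, because nothing in `PHASES` allows it.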
|
||||
|
||||
---
|
||||
|
||||
## Verification Checklist (Step 10)
|
||||
|
||||
**Main orchestrator MUST verify before marking complete:**
|
||||
|
||||
```bash
|
||||
# 1. Check git commits
git log --oneline -3 | grep -q "{{story_key}}" || echo "FAIL: No commit found"

# 2. Check story checkboxes
before_count=$(git show "HEAD~1:{{story_file}}" | grep -c "^- \[x\]")
after_count=$(grep -c "^- \[x\]" "{{story_file}}")
[ "$after_count" -gt "$before_count" ] || echo "FAIL: Checkboxes not updated"

# 3. Check sprint-status
git diff HEAD~1 -- "{{sprint_status}}" | grep -q "{{story_key}}" || echo "FAIL: No status change"

# 4. Check test results
# Parse agent output for "PASS" or test count
# FAIL if no test evidence
|
||||
```
|
||||
|
||||
**If ANY check fails → Story NOT complete, report to user**
|
||||
|
||||
---
|
||||
|
||||
## Benefits of Multi-Agent Architecture
|
||||
|
||||
1. **Separation of Concerns**
|
||||
- Implementation separate from validation
|
||||
- Review separate from fixing
|
||||
|
||||
2. **No Conflict of Interest**
|
||||
- Validators have no incentive to lie
|
||||
- Reviewers WANT to find problems
|
||||
|
||||
3. **Fresh Context Each Phase**
|
||||
- Inspector doesn't know what Builder did
|
||||
- Reviewer approaches code with fresh eyes
|
||||
|
||||
4. **Honest Reporting**
|
||||
- Each agent reports truthfully
|
||||
- Main orchestrator verifies everything
|
||||
|
||||
5. **Catches Lazy Agents**
|
||||
- Can't lie about completion
|
||||
- Can't skip validation
|
||||
- Can't rubber-stamp reviews
|
||||
|
||||
---
|
||||
|
||||
## Migration from v1.x to v2.0
|
||||
|
||||
**Backward Compatibility:**
|
||||
- Keep `execution_mode: "single_agent"` as fallback
|
||||
- Default to `execution_mode: "multi_agent"` for new workflows
|
||||
|
||||
**Testing:**
|
||||
- Run both modes on same story
|
||||
- Compare results (multi-agent should catch more issues)
|
||||
|
||||
**Rollout:**
|
||||
- Phase 1: Add multi-agent option
|
||||
- Phase 2: Make multi-agent default
|
||||
- Phase 3: Deprecate single-agent mode
|
||||
|
||||
---
|
||||
|
||||
## Future Enhancements (v2.1+)
|
||||
|
||||
1. **Agent Reputation Tracking**
|
||||
- Track which agents produce reliable results
|
||||
- Penalize agents that consistently lie
|
||||
|
||||
2. **Dynamic Agent Selection**
|
||||
- Choose different review agents based on story type
|
||||
- Security-focused reviewers for auth stories
|
||||
- Performance reviewers for database stories
|
||||
|
||||
3. **Parallel Validation**
|
||||
- Run multiple validators simultaneously
|
||||
- Require consensus (2/3 validators agree)
|
||||
|
||||
4. **Agent Learning**
|
||||
- Validators learn common failure patterns
|
||||
- Reviewers learn project-specific issues
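
As a sketch of the consensus rule in item 3, the following Python function accepts a verdict only when at least two thirds of the validators agree. The `consensus` helper and the `NO_CONSENSUS` fallback value are assumptions for illustration; the document does not say what should happen when the threshold is missed.

```python
from collections import Counter
from fractions import Fraction


def consensus(verdicts: list[str], threshold: Fraction = Fraction(2, 3)) -> str:
    """Return the majority verdict if its share meets the threshold."""
    if not verdicts:
        return "NO_CONSENSUS"
    verdict, votes = Counter(verdicts).most_common(1)[0]
    # Fraction avoids float rounding at exactly 2/3.
    if Fraction(votes, len(verdicts)) >= threshold:
        return verdict
    return "NO_CONSENSUS"


print(consensus(["PASS", "PASS", "FAIL"]))
```

With three validators, a 2-1 split is accepted and a three-way disagreement is not. With two validators, any disagreement falls below the threshold.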
|
||||
|
||||
---
|
||||
|
||||
**Key Takeaway:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
|
||||