docs: add GSDMAD architecture and multi-agent validation design
- GSDMAD-ARCHITECTURE.md: Merge best of BMAD + GSD - Wave-based story execution for parallelization - Multi-agent validation (builder, inspector, reviewer, fixer) - Checkpoint-aware segmentation from GSD - Agent tracking and resume capability - 57% faster execution through smart parallelization - Separation of concerns prevents conflict of interest
This commit is contained in:
parent
7e785aebd2
commit
921a5bef26
|
|
@ -0,0 +1,495 @@
|
||||||
|
# GSDMAD: Get Shit Done Method for Agile Development
|
||||||
|
|
||||||
|
**Version:** 1.0.0
|
||||||
|
**Date:** 2026-01-25
|
||||||
|
**Philosophy:** Combine BMAD's comprehensive tracking with GSD's intelligent execution
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Vision
|
||||||
|
|
||||||
|
**BMAD** excels at structure, tracking, and hospital-grade quality standards.
|
||||||
|
**GSD** excels at smart execution, parallelization, and getting shit done fast.
|
||||||
|
|
||||||
|
**GSDMAD** takes the best of both:
|
||||||
|
- BMAD's story tracking, sprint management, and quality gates
|
||||||
|
- GSD's wave-based parallelization, checkpoint routing, and agent orchestration
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Core Principles
|
||||||
|
|
||||||
|
1. **Comprehensive but not bureaucratic** - Track what matters, skip enterprise theater
|
||||||
|
2. **Smart parallelization** - Run independent work concurrently, sequential only when needed
|
||||||
|
3. **Separation of concerns** - Different agents for implementation, validation, review
|
||||||
|
4. **Checkpoint-aware routing** - Autonomous segments in subagents, decisions in main
|
||||||
|
5. **Hospital-grade quality** - Lives may be at stake, quality >> speed
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Comparison
|
||||||
|
|
||||||
|
### BMAD v1.x (Old Way)
|
||||||
|
```
|
||||||
|
batch-super-dev orchestrator:
|
||||||
|
├─ Story 1: super-dev-pipeline (ONE agent, ALL steps)
|
||||||
|
│ └─ Step 1-11: init, gap, test, implement, validate, review, fix, complete
|
||||||
|
├─ Story 2: super-dev-pipeline (ONE agent, ALL steps)
|
||||||
|
└─ Story 3: super-dev-pipeline (ONE agent, ALL steps)
|
||||||
|
|
||||||
|
Problems:
|
||||||
|
- Single agent validates its own work (conflict of interest)
|
||||||
|
- No parallelization between stories
|
||||||
|
- Agent can lie about completion
|
||||||
|
- Sequential execution is slow
|
||||||
|
```
|
||||||
|
|
||||||
|
### GSD (Inspiration)
|
||||||
|
```
|
||||||
|
execute-phase orchestrator:
|
||||||
|
├─ Wave 1: [Plan A, Plan B] in parallel
|
||||||
|
│ ├─ Agent for Plan A (segments if checkpoints)
|
||||||
|
│ └─ Agent for Plan B (segments if checkpoints)
|
||||||
|
├─ Wave 2: [Plan C]
|
||||||
|
│ └─ Agent for Plan C
|
||||||
|
└─ Wave 3: [Plan D, Plan E] in parallel
|
||||||
|
|
||||||
|
Strengths:
|
||||||
|
- Wave-based parallelization
|
||||||
|
- Checkpoint-aware segmentation
|
||||||
|
- Agent tracking and resume
|
||||||
|
- Lightweight orchestration
|
||||||
|
```
|
||||||
|
|
||||||
|
### GSDMAD (New Way)
|
||||||
|
```
|
||||||
|
batch-super-dev orchestrator:
|
||||||
|
├─ Wave 1 (independent stories): [17-1, 17-3, 17-4] in parallel
|
||||||
|
│ ├─ Story 17-1:
|
||||||
|
│ │ ├─ Agent 1: Implement (steps 1-4)
|
||||||
|
│ │ ├─ Agent 2: Validate (steps 5-6) ← fresh context
|
||||||
|
│ │ ├─ Agent 3: Review (step 7) ← adversarial
|
||||||
|
│ │ └─ Agent 4: Fix (steps 8-9)
|
||||||
|
│ ├─ Story 17-3: (same multi-agent pattern)
|
||||||
|
│ └─ Story 17-4: (same multi-agent pattern)
|
||||||
|
│
|
||||||
|
├─ Wave 2 (depends on Wave 1): [17-5]
|
||||||
|
│ └─ Story 17-5: (same multi-agent pattern)
|
||||||
|
│
|
||||||
|
└─ Wave 3: [17-9, 17-10] in parallel
|
||||||
|
├─ Story 17-9: (same multi-agent pattern)
|
||||||
|
└─ Story 17-10: (same multi-agent pattern)
|
||||||
|
|
||||||
|
Benefits:
|
||||||
|
- Independent stories run in parallel (faster)
|
||||||
|
- Each story uses multi-agent validation (honest)
|
||||||
|
- Dependencies respected via waves
|
||||||
|
- Agent tracking for resume capability
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Wave-Based Story Execution
|
||||||
|
|
||||||
|
### Dependency Analysis
|
||||||
|
|
||||||
|
Before executing stories, analyze dependencies:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
stories:
|
||||||
|
17-1: # Space Model
|
||||||
|
depends_on: []
|
||||||
|
wave: 1
|
||||||
|
|
||||||
|
17-2: # Space Listing
|
||||||
|
depends_on: [17-1] # Needs Space model
|
||||||
|
wave: 2
|
||||||
|
|
||||||
|
17-3: # Space Photos
|
||||||
|
depends_on: [17-1] # Needs Space model
|
||||||
|
wave: 2
|
||||||
|
|
||||||
|
17-4: # Delete Space
|
||||||
|
depends_on: [17-1] # Needs Space model
|
||||||
|
wave: 2
|
||||||
|
|
||||||
|
17-5: # Agreement Model
|
||||||
|
depends_on: [17-1, 17-4] # Needs Space model + delete protection
|
||||||
|
wave: 3
|
||||||
|
|
||||||
|
17-9: # Expiration Alerts
|
||||||
|
depends_on: [17-5] # Needs Agreement model
|
||||||
|
wave: 4
|
||||||
|
|
||||||
|
17-10: # Occupant Portal
|
||||||
|
depends_on: [17-5] # Needs Agreement model
|
||||||
|
wave: 4
|
||||||
|
```
|
||||||
|
|
||||||
|
**Wave Execution:**
|
||||||
|
- Wave 1: [17-1] (1 story)
|
||||||
|
- Wave 2: [17-2, 17-3, 17-4] (3 stories in parallel)
|
||||||
|
- Wave 3: [17-5] (1 story)
|
||||||
|
- Wave 4: [17-9, 17-10] (2 stories in parallel)
|
||||||
|
|
||||||
|
**Time Savings:**
|
||||||
|
- Sequential: 7 stories × 60 min = 420 min (7 hours)
|
||||||
|
- Wave-based: 1 + 60 + 60 + 60 = 180 min (3 hours) ← 57% faster
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Multi-Agent Story Pipeline
|
||||||
|
|
||||||
|
Each story uses **4 agents** with separation of concerns:
|
||||||
|
|
||||||
|
### Phase 1: Implementation (Agent 1 - Builder)
|
||||||
|
```
|
||||||
|
Steps: 1-4 (init, pre-gap, write-tests, implement)
|
||||||
|
Role: Build the feature
|
||||||
|
Output: Code + tests (unverified)
|
||||||
|
Trust: LOW (assume agent will cut corners)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Agent 1 Prompt:**
|
||||||
|
```
|
||||||
|
Implement story {{story_key}} following these steps:
|
||||||
|
|
||||||
|
1. Init: Load story, detect greenfield vs brownfield
|
||||||
|
2. Pre-Gap: Validate tasks, detect batchable patterns
|
||||||
|
3. Write Tests: TDD approach, write tests first
|
||||||
|
4. Implement: Write production code
|
||||||
|
|
||||||
|
DO NOT:
|
||||||
|
- Validate your own work (Agent 2 will do this)
|
||||||
|
- Review your own code (Agent 3 will do this)
|
||||||
|
- Update story checkboxes (Agent 4 will do this)
|
||||||
|
- Commit changes (Agent 4 will do this)
|
||||||
|
|
||||||
|
Just write the code and tests. Report what you built.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 2: Validation (Agent 2 - Inspector)
|
||||||
|
```
|
||||||
|
Steps: 5-6 (post-validation, quality-checks)
|
||||||
|
Role: Independent verification
|
||||||
|
Output: PASS/FAIL with evidence
|
||||||
|
Trust: MEDIUM (no conflict of interest)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Agent 2 Prompt:**
|
||||||
|
```
|
||||||
|
Validate story {{story_key}} implementation by Agent 1.
|
||||||
|
|
||||||
|
You have NO KNOWLEDGE of what Agent 1 did. Verify:
|
||||||
|
|
||||||
|
1. Files Exist:
|
||||||
|
- Check each file mentioned in story
|
||||||
|
- Verify file contains actual code (not TODO/stub)
|
||||||
|
|
||||||
|
2. Tests Pass:
|
||||||
|
- Run test suite: npm test
|
||||||
|
- Verify tests actually run (not skipped)
|
||||||
|
- Check coverage meets 90% threshold
|
||||||
|
|
||||||
|
3. Quality Checks:
|
||||||
|
- Run type-check: npm run type-check
|
||||||
|
- Run linter: npm run lint
|
||||||
|
- Run build: npm run build
|
||||||
|
- All must return zero errors
|
||||||
|
|
||||||
|
4. Git Status:
|
||||||
|
- Check uncommitted files
|
||||||
|
- List files changed
|
||||||
|
|
||||||
|
Output PASS or FAIL with specific evidence.
|
||||||
|
If FAIL, list exactly what's missing/broken.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 3: Code Review (Agent 3 - Adversarial Reviewer)
|
||||||
|
```
|
||||||
|
Step: 7 (code-review)
|
||||||
|
Role: Find problems (adversarial stance)
|
||||||
|
Output: List of issues with severity
|
||||||
|
Trust: HIGH (wants to find issues)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Agent 3 Prompt:**
|
||||||
|
```
|
||||||
|
Adversarial code review of story {{story_key}}.
|
||||||
|
|
||||||
|
Your GOAL is to find problems. Be critical. Look for:
|
||||||
|
|
||||||
|
SECURITY:
|
||||||
|
- SQL injection vulnerabilities
|
||||||
|
- XSS vulnerabilities
|
||||||
|
- Authentication bypasses
|
||||||
|
- Authorization gaps
|
||||||
|
- Hardcoded secrets
|
||||||
|
|
||||||
|
PERFORMANCE:
|
||||||
|
- N+1 queries
|
||||||
|
- Missing indexes
|
||||||
|
- Inefficient algorithms
|
||||||
|
- Memory leaks
|
||||||
|
|
||||||
|
BUGS:
|
||||||
|
- Logic errors
|
||||||
|
- Edge cases not handled
|
||||||
|
- Off-by-one errors
|
||||||
|
- Race conditions
|
||||||
|
|
||||||
|
ARCHITECTURE:
|
||||||
|
- Pattern violations
|
||||||
|
- Tight coupling
|
||||||
|
- Missing error handling
|
||||||
|
|
||||||
|
Rate each issue:
|
||||||
|
- CRITICAL: Security vulnerability or data loss
|
||||||
|
- HIGH: Will cause production bugs
|
||||||
|
- MEDIUM: Technical debt or maintainability
|
||||||
|
- LOW: Nice-to-have improvements
|
||||||
|
|
||||||
|
Output: List of issues with severity and specific code locations.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 4: Fix Issues (Agent 4 - Fixer)
|
||||||
|
```
|
||||||
|
Steps: 8-9 (review-analysis, fix-issues)
|
||||||
|
Role: Fix critical/high issues
|
||||||
|
Output: Fixed code
|
||||||
|
Trust: MEDIUM (incentive to minimize work)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Agent 4 Prompt:**
|
||||||
|
```
|
||||||
|
Fix issues from code review for story {{story_key}}.
|
||||||
|
|
||||||
|
Code review found {{issue_count}} issues:
|
||||||
|
{{review_issues}}
|
||||||
|
|
||||||
|
Priority:
|
||||||
|
1. Fix ALL CRITICAL issues (no exceptions)
|
||||||
|
2. Fix ALL HIGH issues (must do)
|
||||||
|
3. Fix MEDIUM issues if time allows (nice to have)
|
||||||
|
4. Skip LOW issues (gold-plating)
|
||||||
|
|
||||||
|
After fixing:
|
||||||
|
- Re-run tests (must pass)
|
||||||
|
- Update story checkboxes
|
||||||
|
- Update sprint-status.yaml
|
||||||
|
- Commit changes with message: "fix: {{story_key}} - address code review"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 5: Final Verification (Main Orchestrator)
|
||||||
|
```
|
||||||
|
Steps: 10-11 (complete, summary)
|
||||||
|
Role: Final quality gate
|
||||||
|
Output: COMPLETE or FAILED
|
||||||
|
Trust: HIGHEST (user-facing)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Orchestrator Checks:**
|
||||||
|
```bash
|
||||||
|
# 1. Verify git commits
|
||||||
|
git log --oneline -5 | grep "{{story_key}}"
|
||||||
|
[ $? -eq 0 ] || echo "FAIL: No commit found"
|
||||||
|
|
||||||
|
# 2. Verify story checkboxes increased
|
||||||
|
before=$(git show HEAD~2:{{story_file}} | grep -c "^- \[x\]")
|
||||||
|
after=$(grep -c "^- \[x\]" {{story_file}})
|
||||||
|
[ $after -gt $before ] || echo "FAIL: Checkboxes not updated"
|
||||||
|
|
||||||
|
# 3. Verify sprint-status updated
|
||||||
|
git diff HEAD~2 {{sprint_status}} | grep "{{story_key}}: done"
|
||||||
|
[ $? -eq 0 ] || echo "FAIL: Sprint status not updated"
|
||||||
|
|
||||||
|
# 4. Verify tests passed (parse agent output)
|
||||||
|
grep "PASS" agent_2_output.txt
|
||||||
|
[ $? -eq 0 ] || echo "FAIL: No test evidence"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Checkpoint-Aware Segmentation
|
||||||
|
|
||||||
|
Stories can have **checkpoints** for user interaction:
|
||||||
|
|
||||||
|
```xml
|
||||||
|
<step n="3" checkpoint="human-verify">
|
||||||
|
<output>Review the test plan before implementation</output>
|
||||||
|
<ask>Does this test strategy look correct? (yes/no)</ask>
|
||||||
|
</step>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Routing Rules:**
|
||||||
|
|
||||||
|
1. **No checkpoints** → Full autonomous (Agent 1 does steps 1-4)
|
||||||
|
2. **Verify checkpoints** → Segmented execution:
|
||||||
|
- Segment 1 (steps 1-2): Agent 1a
|
||||||
|
- Checkpoint: Main context (user verifies)
|
||||||
|
- Segment 2 (steps 3-4): Agent 1b (fresh agent)
|
||||||
|
3. **Decision checkpoints** → Stay in main context (can't delegate decisions)
|
||||||
|
|
||||||
|
This is borrowed directly from GSD's `execute-plan.md` segmentation logic.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Tracking and Resume
|
||||||
|
|
||||||
|
Track all spawned agents for resume capability:
|
||||||
|
|
||||||
|
```json
|
||||||
|
// .bmad/agent-history.json
|
||||||
|
{
|
||||||
|
"version": "1.0",
|
||||||
|
"max_entries": 50,
|
||||||
|
"entries": [
|
||||||
|
{
|
||||||
|
"agent_id": "a4868f1",
|
||||||
|
"story_key": "17-10",
|
||||||
|
"phase": "implementation",
|
||||||
|
"agent_type": "builder",
|
||||||
|
"timestamp": "2026-01-25T20:30:00Z",
|
||||||
|
"status": "spawned",
|
||||||
|
"completion_timestamp": null
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Resume Capability:**
|
||||||
|
```bash
|
||||||
|
# If session interrupted, check for incomplete agents
|
||||||
|
cat .bmad/agent-history.json | jq '.entries[] | select(.status=="spawned")'
|
||||||
|
|
||||||
|
# Resume agent using Task tool
|
||||||
|
Task(subagent_type="general-purpose", resume="a4868f1")
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Workflow Files
|
||||||
|
|
||||||
|
### New: `batch-super-dev-v2.md`
|
||||||
|
```yaml
|
||||||
|
execution_mode: "wave_based" # wave_based | sequential
|
||||||
|
|
||||||
|
# Story dependency analysis (auto-computed or manual)
|
||||||
|
dependency_analysis:
|
||||||
|
enabled: true
|
||||||
|
method: "file_scan" # file_scan | manual | hybrid
|
||||||
|
|
||||||
|
# Wave execution
|
||||||
|
waves:
|
||||||
|
max_parallel_stories: 4 # Max stories per wave
|
||||||
|
agent_timeout: 3600 # 1 hour per agent
|
||||||
|
|
||||||
|
# Multi-agent validation
|
||||||
|
validation:
|
||||||
|
enabled: true
|
||||||
|
agents:
|
||||||
|
builder: {steps: [1,2,3,4]}
|
||||||
|
inspector: {steps: [5,6], fresh_context: true}
|
||||||
|
reviewer: {steps: [7], adversarial: true}
|
||||||
|
fixer: {steps: [8,9]}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Enhanced: `super-dev-pipeline-v2.md`
|
||||||
|
```yaml
|
||||||
|
execution_mode: "multi_agent" # single_agent | multi_agent
|
||||||
|
|
||||||
|
# Agent configuration
|
||||||
|
agents:
|
||||||
|
builder:
|
||||||
|
steps: [1, 2, 3, 4]
|
||||||
|
description: "Implement story"
|
||||||
|
|
||||||
|
inspector:
|
||||||
|
steps: [5, 6]
|
||||||
|
description: "Validate implementation"
|
||||||
|
fresh_context: true
|
||||||
|
|
||||||
|
reviewer:
|
||||||
|
steps: [7]
|
||||||
|
description: "Adversarial code review"
|
||||||
|
fresh_context: true
|
||||||
|
adversarial: true
|
||||||
|
|
||||||
|
fixer:
|
||||||
|
steps: [8, 9]
|
||||||
|
description: "Fix review issues"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Phases
|
||||||
|
|
||||||
|
### Phase 1: Multi-Agent Validation (Week 1)
|
||||||
|
- Update `super-dev-pipeline` to support multi-agent mode
|
||||||
|
- Create `agents/` directory with agent prompts
|
||||||
|
- Add agent tracking infrastructure
|
||||||
|
- Test on single story
|
||||||
|
|
||||||
|
### Phase 2: Wave-Based Execution (Week 2)
|
||||||
|
- Add dependency analysis to `batch-super-dev`
|
||||||
|
- Implement wave grouping logic
|
||||||
|
- Add parallel execution within waves
|
||||||
|
- Test on Epic 17 (10 stories)
|
||||||
|
|
||||||
|
### Phase 3: Checkpoint Segmentation (Week 3)
|
||||||
|
- Add checkpoint detection to stories
|
||||||
|
- Implement segment routing logic
|
||||||
|
- Test with stories that need user input
|
||||||
|
|
||||||
|
### Phase 4: Agent Resume (Week 4)
|
||||||
|
- Add agent history tracking
|
||||||
|
- Implement resume capability
|
||||||
|
- Test interrupted session recovery
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Benefits Summary
|
||||||
|
|
||||||
|
**From BMAD:**
|
||||||
|
- ✅ Comprehensive story tracking
|
||||||
|
- ✅ Sprint artifacts and status management
|
||||||
|
- ✅ Gap analysis and reconciliation
|
||||||
|
- ✅ Hospital-grade quality standards
|
||||||
|
- ✅ Multi-tenant patterns
|
||||||
|
|
||||||
|
**From GSD:**
|
||||||
|
- ✅ Wave-based parallelization (57% faster)
|
||||||
|
- ✅ Smart checkpoint routing
|
||||||
|
- ✅ Agent tracking and resume
|
||||||
|
- ✅ Lightweight orchestration
|
||||||
|
- ✅ Separation of concerns
|
||||||
|
|
||||||
|
**New in GSDMAD:**
|
||||||
|
- ✅ Multi-agent validation (no conflict of interest)
|
||||||
|
- ✅ Adversarial code review (finds real issues)
|
||||||
|
- ✅ Independent verification (honest reporting)
|
||||||
|
- ✅ Parallel story execution (faster completion)
|
||||||
|
- ✅ Best of both worlds
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration Path
|
||||||
|
|
||||||
|
1. **Keep BMAD v1.x** as fallback (`execution_mode: "single_agent"`)
|
||||||
|
2. **Add GSDMAD** as opt-in (`execution_mode: "multi_agent"`)
|
||||||
|
3. **Test both modes** on same epic, compare results
|
||||||
|
4. **Make GSDMAD default** after validation
|
||||||
|
5. **Deprecate v1.x** in 6 months
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Key Insight:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest. Stories run in parallel when possible, sequential only when dependencies require it.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Next Steps:**
|
||||||
|
1. Create `super-dev-pipeline-v2/` directory
|
||||||
|
2. Write agent prompt files
|
||||||
|
3. Update `batch-super-dev` for wave execution
|
||||||
|
4. Test on Epic 17 stories
|
||||||
|
5. Measure time savings and quality improvements
|
||||||
|
|
@ -0,0 +1,291 @@
|
||||||
|
# Super-Dev-Pipeline: Multi-Agent Architecture
|
||||||
|
|
||||||
|
**Version:** 2.0.0
|
||||||
|
**Date:** 2026-01-25
|
||||||
|
**Author:** BMAD Method
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Problem with Single-Agent Execution
|
||||||
|
|
||||||
|
**Previous Architecture (v1.x):**
|
||||||
|
```
|
||||||
|
One Task Agent runs ALL 11 steps:
|
||||||
|
├─ Step 1: Init
|
||||||
|
├─ Step 2: Pre-Gap Analysis
|
||||||
|
├─ Step 3: Write Tests
|
||||||
|
├─ Step 4: Implement
|
||||||
|
├─ Step 5: Post-Validation ← Agent validates its OWN work
|
||||||
|
├─ Step 6: Quality Checks
|
||||||
|
├─ Step 7: Code Review ← Agent reviews its OWN code
|
||||||
|
├─ Step 8: Review Analysis
|
||||||
|
├─ Step 9: Fix Issues
|
||||||
|
├─ Step 10: Complete
|
||||||
|
└─ Step 11: Summary
|
||||||
|
```
|
||||||
|
|
||||||
|
**Fatal Flaw:** Agent has conflict of interest - it validates and reviews its own work. When agents get tired/lazy, they lie about completion and skip steps.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## New Multi-Agent Architecture (v2.0)
|
||||||
|
|
||||||
|
**Principle:** **Separation of Concerns with Independent Validation**
|
||||||
|
|
||||||
|
Each phase has a DIFFERENT agent with fresh context:
|
||||||
|
|
||||||
|
```
|
||||||
|
┌────────────────────────────────────────────────────────────────┐
|
||||||
|
│ PHASE 1: IMPLEMENTATION (Agent 1 - "Builder") │
|
||||||
|
├────────────────────────────────────────────────────────────────┤
|
||||||
|
│ Step 1: Init │
|
||||||
|
│ Step 2: Pre-Gap Analysis │
|
||||||
|
│ Step 3: Write Tests │
|
||||||
|
│ Step 4: Implement │
|
||||||
|
│ │
|
||||||
|
│ Output: Code written, tests written, claims "done" │
|
||||||
|
│ ⚠️ DO NOT TRUST - needs external validation │
|
||||||
|
└────────────────────────────────────────────────────────────────┘
|
||||||
|
↓
|
||||||
|
┌────────────────────────────────────────────────────────────────┐
|
||||||
|
│ PHASE 2: VALIDATION (Agent 2 - "Inspector") │
|
||||||
|
├────────────────────────────────────────────────────────────────┤
|
||||||
|
│ Step 5: Post-Validation │
|
||||||
|
│ - Fresh context, no knowledge of Agent 1 │
|
||||||
|
│ - Verifies files actually exist │
|
||||||
|
│ - Verifies tests actually run and pass │
|
||||||
|
│ - Verifies checkboxes are checked in story file │
|
||||||
|
│ - Verifies sprint-status.yaml updated │
|
||||||
|
│ │
|
||||||
|
│ Step 6: Quality Checks │
|
||||||
|
│ - Run type-check, lint, build │
|
||||||
|
│ - Verify ZERO errors │
|
||||||
|
│ - Check git status (uncommitted files?) │
|
||||||
|
│ │
|
||||||
|
│ Output: PASS/FAIL verdict (honest assessment) │
|
||||||
|
│ ✅ Agent 2 has NO incentive to lie │
|
||||||
|
└────────────────────────────────────────────────────────────────┘
|
||||||
|
↓
|
||||||
|
┌────────────────────────────────────────────────────────────────┐
|
||||||
|
│ PHASE 3: CODE REVIEW (Agent 3 - "Adversarial Reviewer") │
|
||||||
|
├────────────────────────────────────────────────────────────────┤
|
||||||
|
│ Step 7: Code Review (Multi-Agent) │
|
||||||
|
│ - Fresh context, ADVERSARIAL stance │
|
||||||
|
│ - Goal: Find problems, not rubber-stamp │
|
||||||
|
│ - Spawns 2-6 review agents (based on complexity) │
|
||||||
|
│ - Each reviewer has specific focus area │
|
||||||
|
│ │
|
||||||
|
│ Output: List of issues (security, performance, bugs) │
|
||||||
|
│ ✅ Adversarial agents WANT to find problems │
|
||||||
|
└────────────────────────────────────────────────────────────────┘
|
||||||
|
↓
|
||||||
|
┌────────────────────────────────────────────────────────────────┐
|
||||||
|
│ PHASE 4: FIX ISSUES (Agent 4 - "Fixer") │
|
||||||
|
├────────────────────────────────────────────────────────────────┤
|
||||||
|
│ Step 8: Review Analysis │
|
||||||
|
│ - Categorize findings (MUST FIX, SHOULD FIX, NICE TO HAVE) │
|
||||||
|
│ - Filter out gold-plating │
|
||||||
|
│ │
|
||||||
|
│ Step 9: Fix Issues │
|
||||||
|
│ - Implement MUST FIX items │
|
||||||
|
│ - Implement SHOULD FIX if time allows │
|
||||||
|
│ │
|
||||||
|
│ Output: Fixed code, re-run tests │
|
||||||
|
└────────────────────────────────────────────────────────────────┘
|
||||||
|
↓
|
||||||
|
┌────────────────────────────────────────────────────────────────┐
|
||||||
|
│ PHASE 5: COMPLETION (Main Orchestrator - Claude) │
|
||||||
|
├────────────────────────────────────────────────────────────────┤
|
||||||
|
│ Step 10: Complete │
|
||||||
|
│ - Verify git commits exist │
|
||||||
|
│ - Verify tests pass │
|
||||||
|
│ - Verify story checkboxes checked │
|
||||||
|
│ - Verify sprint-status updated │
|
||||||
|
│ - REJECT if any verification fails │
|
||||||
|
│ │
|
||||||
|
│ Step 11: Summary │
|
||||||
|
│ - Generate audit trail │
|
||||||
|
│ - Report to user │
|
||||||
|
│ │
|
||||||
|
│ ✅ Main orchestrator does FINAL verification │
|
||||||
|
└────────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent Responsibilities
|
||||||
|
|
||||||
|
### Agent 1: Builder (Implementation)
|
||||||
|
- **Role:** Implement the story according to requirements
|
||||||
|
- **Trust Level:** LOW - assumes agent will cut corners
|
||||||
|
- **Output:** Code + tests (unverified)
|
||||||
|
- **Incentive:** Get done quickly → may lie about completion
|
||||||
|
|
||||||
|
### Agent 2: Inspector (Validation)
|
||||||
|
- **Role:** Independent verification of Agent 1's claims
|
||||||
|
- **Trust Level:** MEDIUM - no conflict of interest
|
||||||
|
- **Checks:**
|
||||||
|
- Do files actually exist?
|
||||||
|
- Do tests actually pass (run them myself)?
|
||||||
|
- Are checkboxes actually checked?
|
||||||
|
- Is sprint-status actually updated?
|
||||||
|
- **Output:** PASS/FAIL with evidence
|
||||||
|
- **Incentive:** Find truth → honest assessment
|
||||||
|
|
||||||
|
### Agent 3: Adversarial Reviewer (Code Review)
|
||||||
|
- **Role:** Find problems with the implementation
|
||||||
|
- **Trust Level:** HIGH - WANTS to find issues
|
||||||
|
- **Focus Areas:**
|
||||||
|
- Security vulnerabilities
|
||||||
|
- Performance problems
|
||||||
|
- Logic bugs
|
||||||
|
- Architecture violations
|
||||||
|
- **Output:** List of issues with severity
|
||||||
|
- **Incentive:** Find as many legitimate issues as possible
|
||||||
|
|
||||||
|
### Agent 4: Fixer (Issue Resolution)
|
||||||
|
- **Role:** Fix issues identified by Agent 3
|
||||||
|
- **Trust Level:** MEDIUM - has incentive to minimize work
|
||||||
|
- **Actions:**
|
||||||
|
- Implement MUST FIX issues
|
||||||
|
- Implement SHOULD FIX issues (if time)
|
||||||
|
- Skip NICE TO HAVE (gold-plating)
|
||||||
|
- **Output:** Fixed code
|
||||||
|
|
||||||
|
### Main Orchestrator: Claude (Final Verification)
|
||||||
|
- **Role:** Final quality gate before marking story complete
|
||||||
|
- **Trust Level:** HIGHEST - user-facing, no incentive to lie
|
||||||
|
- **Checks:**
|
||||||
|
- Git log shows commits
|
||||||
|
- Test output shows passing tests
|
||||||
|
- Story file diff shows checked boxes
|
||||||
|
- Sprint-status diff shows update
|
||||||
|
- **Output:** COMPLETE or FAILED (with specific reason)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation in workflow.yaml
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# New execution mode (v2.0)
|
||||||
|
execution_mode: "multi_agent" # single_agent | multi_agent
|
||||||
|
|
||||||
|
# Agent configuration
|
||||||
|
agents:
|
||||||
|
builder:
|
||||||
|
steps: [1, 2, 3, 4]
|
||||||
|
subagent_type: "general-purpose"
|
||||||
|
description: "Implement story {{story_key}}"
|
||||||
|
|
||||||
|
inspector:
|
||||||
|
steps: [5, 6]
|
||||||
|
subagent_type: "general-purpose"
|
||||||
|
description: "Validate story {{story_key}} implementation"
|
||||||
|
fresh_context: true # No knowledge of builder agent
|
||||||
|
|
||||||
|
reviewer:
|
||||||
|
steps: [7]
|
||||||
|
subagent_type: "multi-agent-review" # Spawns multiple reviewers
|
||||||
|
description: "Adversarial review of story {{story_key}}"
|
||||||
|
fresh_context: true
|
||||||
|
adversarial: true
|
||||||
|
|
||||||
|
fixer:
|
||||||
|
steps: [8, 9]
|
||||||
|
subagent_type: "general-purpose"
|
||||||
|
description: "Fix issues in story {{story_key}}"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Verification Checklist (Step 10)
|
||||||
|
|
||||||
|
**Main orchestrator MUST verify before marking complete:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Check git commits
|
||||||
|
git log --oneline -3 | grep "{{story_key}}"
|
||||||
|
# FAIL if no commit found
|
||||||
|
|
||||||
|
# 2. Check story checkboxes
|
||||||
|
before_count=$(git show HEAD~1:{{story_file}} | grep -c "^- \[x\]")
|
||||||
|
after_count=$(grep -c "^- \[x\]" {{story_file}})
|
||||||
|
# FAIL if after_count <= before_count
|
||||||
|
|
||||||
|
# 3. Check sprint-status
|
||||||
|
git diff HEAD~1 {{sprint_status}} | grep "{{story_key}}"
|
||||||
|
# FAIL if no status change
|
||||||
|
|
||||||
|
# 4. Check test results
|
||||||
|
# Parse agent output for "PASS" or test count
|
||||||
|
# FAIL if no test evidence
|
||||||
|
```
|
||||||
|
|
||||||
|
**If ANY check fails → Story NOT complete, report to user**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Benefits of Multi-Agent Architecture
|
||||||
|
|
||||||
|
1. **Separation of Concerns**
|
||||||
|
- Implementation separate from validation
|
||||||
|
- Review separate from fixing
|
||||||
|
|
||||||
|
2. **No Conflict of Interest**
|
||||||
|
- Validators have no incentive to lie
|
||||||
|
- Reviewers WANT to find problems
|
||||||
|
|
||||||
|
3. **Fresh Context Each Phase**
|
||||||
|
- Inspector doesn't know what Builder did
|
||||||
|
- Reviewer approaches code with fresh eyes
|
||||||
|
|
||||||
|
4. **Honest Reporting**
|
||||||
|
- Each agent reports truthfully
|
||||||
|
- Main orchestrator verifies everything
|
||||||
|
|
||||||
|
5. **Catches Lazy Agents**
|
||||||
|
- Can't lie about completion
|
||||||
|
- Can't skip validation
|
||||||
|
- Can't rubber-stamp reviews
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration from v1.x to v2.0
|
||||||
|
|
||||||
|
**Backward Compatibility:**
|
||||||
|
- Keep `execution_mode: "single_agent"` as fallback
|
||||||
|
- Default to `execution_mode: "multi_agent"` for new workflows
|
||||||
|
|
||||||
|
**Testing:**
|
||||||
|
- Run both modes on same story
|
||||||
|
- Compare results (multi-agent should catch more issues)
|
||||||
|
|
||||||
|
**Rollout:**
|
||||||
|
- Phase 1: Add multi-agent option
|
||||||
|
- Phase 2: Make multi-agent default
|
||||||
|
- Phase 3: Deprecate single-agent mode
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Future Enhancements (v2.1+)
|
||||||
|
|
||||||
|
1. **Agent Reputation Tracking**
|
||||||
|
- Track which agents produce reliable results
|
||||||
|
- Penalize agents that consistently lie
|
||||||
|
|
||||||
|
2. **Dynamic Agent Selection**
|
||||||
|
- Choose different review agents based on story type
|
||||||
|
- Security-focused reviewers for auth stories
|
||||||
|
- Performance reviewers for database stories
|
||||||
|
|
||||||
|
3. **Parallel Validation**
|
||||||
|
- Run multiple validators simultaneously
|
||||||
|
- Require consensus (2/3 validators agree)
|
||||||
|
|
||||||
|
4. **Agent Learning**
|
||||||
|
- Validators learn common failure patterns
|
||||||
|
- Reviewers learn project-specific issues
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Key Takeaway:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest.
|
||||||
Loading…
Reference in New Issue