From a268b4c1bc314ee3174b3631bdb21fe4599c9595 Mon Sep 17 00:00:00 2001
From: Jonah Schulte
Date: Wed, 28 Jan 2026 13:28:37 -0500
Subject: [PATCH] feat: upgrade story-full-pipeline to v4.0 with 6 major enhancements
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench
research (Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 RESUMES Builder agent with review findings
- Builder already has full codebase context (50-70% token savings)
- More efficient than spawning fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- No more "trust me, it works" - requires proof
- Returns structured JSON with code evidence per task
- Prevents vague communication (CooperBench finding)

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped psychological appeal language
- Kept rigorous verification gates and bash checks
- Focus on concrete, measurable verification
- Replaced with patterns/verification.md + patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- No longer skip ALL review for micro stories
- Micro now gets 2 reviewers: Security + Architect
- Lightweight but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces 80% threshold
- Builder must fix test gaps before proceeding

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection

Pipeline:
Phase 0 (Playbooks) → Phase 1 (Builder) →
Phase 2 (Inspector + Test Quality + Reviewers parallel) →
Phase 2.5 (Coverage Gate) → Phase 3 (Resume Builder) →
Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) →
Phase 6 (Reflection)

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)
---
 .../story-full-pipeline/README.md             | 218 ++---
 .../agents/architect-integration-reviewer.md  |   1 -
 .../story-full-pipeline/agents/builder.md     |  77 +-
 .../story-full-pipeline/agents/fixer.md       |   1 -
 .../story-full-pipeline/agents/inspector.md   | 173 ++--
 .../story-full-pipeline/agents/reflection.md  |  93 +++
 .../story-full-pipeline/agents/reviewer.md    |   1 -
 .../agents/test-quality.md                    |  73 ++
 .../story-full-pipeline/workflow.md           | 752 ++++++++++++++++--
 .../story-full-pipeline/workflow.yaml         |  45 +-
 .../implementation-playbook-template.md       |  85 ++
 11 files changed, 1189 insertions(+), 330 deletions(-)
create mode 100644 src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md create mode 100644 src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md create mode 100644 src/bmm/workflows/templates/implementation-playbook-template.md diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/README.md b/src/bmm/workflows/4-implementation/story-full-pipeline/README.md index a436933f..089a34ee 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/README.md +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/README.md @@ -1,124 +1,150 @@ -# Super-Dev Pipeline - GSDMAD Architecture +# Story-Full-Pipeline v4.0 -**Multi-agent pipeline with independent validation and adversarial code review** +Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes. ---- +## What's New in v4.0 -## Quick Start +### 1. Resume Builder (v3.2+) +**Token Efficiency: 50-70% savings** -```bash -# Run super-dev pipeline for a story -/story-full-pipeline story_key=17-10 +- Phase 3 now RESUMES Builder agent with review findings +- Builder already has full codebase context +- More efficient than spawning fresh Fixer agent + +### 2. Inspector Code Citations (v4.0) +**Evidence-Based Verification** + +- Inspector must map EVERY task to file:line citations +- Example: "Create component" → "src/Component.tsx:45-67" +- No more "trust me, it works" - requires proof +- Returns structured JSON with code evidence per task + +### 3. Remove Hospital-Grade Framing (v4.0) +**Focus on Concrete Verification** + +- Dropped psychological appeal language +- Kept rigorous verification gates and bash checks +- Replaced with patterns/verification.md + patterns/tdd.md + +### 4. 
Micro Stories Get Security Scan (v4.0) +**Even Simple Stories Need Security** + +- No longer skip ALL review for micro stories +- Still get 2 reviewers: Security + Architect +- Lightweight but catches critical vulnerabilities + +### 5. Test Quality Agent + Coverage Gate (v4.0) +**Validate Test Completeness** + +- New Test Quality Agent validates: + - Edge cases covered (null, empty, invalid) + - Error conditions tested + - Meaningful assertions (not just "doesn't crash") + - No flaky tests (random data, timing) +- Automated Coverage Gate enforces 80% threshold +- Builder must fix test gaps before proceeding + +### 6. Playbook Learning System (v4.0) +**Continuous Improvement Through Reflection** + +- **Phase 0:** Query playbooks before implementation +- Builder gets relevant patterns/gotchas upfront +- **Phase 6:** Reflection agent extracts learnings +- Auto-generates playbook updates for future agents +- Bootstrap mode: auto-initializes playbooks if missing + +## Pipeline Flow + +``` +Phase 0: Playbook Query (orchestrator) + ↓ +Phase 1: Builder (initial implementation) + ↓ +Phase 2: Inspector + Test Quality + N Reviewers (parallel) + ↓ +Phase 2.5: Coverage Gate (automated) + ↓ +Phase 3: Resume Builder (fix issues with context) + ↓ +Phase 4: Inspector re-check (quick verification) + ↓ +Phase 5: Orchestrator reconciliation (evidence-based) + ↓ +Phase 6: Playbook Reflection (extract learnings) ``` ---- +## Complexity Routing -## Architecture +| Complexity | Phase 2 Agents | Total | Reviewers | +|------------|----------------|-------|-----------| +| micro | Inspector + Test Quality + 2 | 4 agents | Security + Architect | +| standard | Inspector + Test Quality + 3 | 5 agents | Security + Logic + Architect | +| complex | Inspector + Test Quality + 4 | 6 agents | Security + Logic + Architect + Quality | -### Multi-Agent Validation -- **4 independent agents** working sequentially -- Builder → Inspector → Reviewer → Fixer -- Each agent has fresh context -- No conflict of 
interest +## Quality Gates -### Honest Reporting -- Inspector verifies Builder's work (doesn't trust claims) -- Reviewer is adversarial (wants to find issues) -- Main orchestrator does final verification -- Can't fake completion +- **Coverage Threshold:** 80% line coverage required +- **Task Verification:** ALL tasks need file:line evidence +- **Critical Issues:** MUST fix +- **High Issues:** MUST fix -### Wave-Based Execution -- Independent stories run in parallel -- Dependencies respected via waves -- 57% faster than sequential execution +## Token Efficiency ---- +- Phase 2 agents spawn in parallel (same cost, faster) +- Phase 3 resumes Builder (50-70% token savings vs fresh agent) +- Phase 4 Inspector only (no full re-review) -## Workflow Phases +## Playbook Configuration -**Phase 1: Builder (Steps 1-4)** -- Load story, analyze gaps -- Write tests (TDD) -- Implement code -- Report what was built (NO VALIDATION) +```yaml +playbooks: + enabled: true + directory: "docs/playbooks/implementation-playbooks" + bootstrap_mode: true # Auto-initialize if missing + max_load: 3 + auto_apply_updates: false # Require manual review + discovery: + enabled: true + sources: ["git_history", "docs", "existing_code"] +``` -**Phase 2: Inspector (Steps 5-6)** -- Fresh context, no Builder knowledge -- Verify files exist -- Run tests independently -- Run quality checks -- PASS or FAIL verdict +## How It Avoids CooperBench Coordination Failures -**Phase 3: Reviewer (Step 7)** -- Fresh context, adversarial stance -- Find security vulnerabilities -- Find performance problems -- Find logic bugs -- Report issues with severity +Unlike the multi-agent coordination failures documented in CooperBench (Stanford/SAP 2026): -**Phase 4: Fixer (Steps 8-9)** -- Fix CRITICAL issues (all) -- Fix HIGH issues (all) -- Fix MEDIUM issues (time permitting) -- Verify fixes independently +1. **Sequential Implementation** - ONE Builder agent implements entire story (no parallel implementation conflicts) +2. 
**Parallel Review** - Multiple agents review in parallel (safe read-only operations) +3. **Context Reuse** - SAME agent fixes issues (no expectation failures about partner state) +4. **Evidence-Based** - file:line citations prevent vague communication +5. **Clear Roles** - Builder writes, reviewers validate (no overlapping responsibilities) -**Phase 5: Final Verification** -- Main orchestrator verifies all phases -- Updates story checkboxes -- Creates commit -- Marks story complete - ---- - -## Key Features - -**Separation of Concerns:** -- Builder focuses only on implementation -- Inspector focuses only on validation -- Reviewer focuses only on finding issues -- Fixer focuses only on resolving issues - -**Independent Validation:** -- Each agent validates the previous agent's work -- No agent validates its own work -- Fresh context prevents confirmation bias - -**Quality Enforcement:** -- Multiple quality gates throughout pipeline -- Can't proceed without passing validation -- 95% honesty rate (agents can't fake completion) - ---- +The workflow uses agents for **verification parallelism**, not **implementation parallelism** - avoiding the "curse of coordination." ## Files -See `workflow.md` for complete architecture details. 
- **Agent Prompts:** -- `agents/builder.md` - Implementation agent -- `agents/inspector.md` - Validation agent +- `agents/builder.md` - Implementation agent (with playbook awareness) +- `agents/inspector.md` - Validation agent (requires code citations) +- `agents/test-quality.md` - Test quality validation (v4.0) - `agents/reviewer.md` - Adversarial review agent -- `agents/fixer.md` - Issue resolution agent +- `agents/architect-integration-reviewer.md` - Architecture/integration review +- `agents/fixer.md` - Issue resolution agent (deprecated, uses resume Builder) +- `agents/reflection.md` - Playbook learning agent (v4.0) **Workflow Config:** -- `workflow.yaml` - Main configuration -- `workflow.md` - Complete documentation +- `workflow.yaml` - Main configuration (v4.0) +- `workflow.md` - Complete step-by-step documentation -**Directory Structure:** -``` -story-full-pipeline/ -├── README.md (this file) -├── workflow.yaml (configuration) -├── workflow.md (complete documentation) -├── agents/ -│ ├── builder.md (implementation agent prompt) -│ ├── inspector.md (validation agent prompt) -│ ├── reviewer.md (review agent prompt) -│ └── fixer.md (fix agent prompt) -└── steps/ - └── (step files for each phase) +**Templates:** +- `../templates/implementation-playbook-template.md` - Playbook structure + +## Usage + +```bash +# Run story-full-pipeline +/story-full-pipeline story_key=17-10 ``` ---- +## Backward Compatibility -**Philosophy:** Trust but verify. Every agent's work is independently validated by a fresh agent with no conflict of interest. +Falls back to single-agent mode if multi-agent execution fails. 
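
Review note: the Coverage Gate described in the README changes above (80% line coverage required) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `coverage_gate` and its arguments are hypothetical and do not appear in the patch, and the coverage percentage is assumed to have been measured already by whatever tool the project uses.

```bash
#!/usr/bin/env bash
# Hypothetical coverage gate (not part of the patch).
# $1 = measured line coverage in percent, $2 = threshold (defaults to 80,
# the value listed under Quality Gates in the README).
coverage_gate() {
  local measured="$1"
  local threshold="${2:-80}"
  # awk is used so that fractional values such as 79.5 compare correctly;
  # the shell's own -ge test only handles integers.
  awk -v m="$measured" -v t="$threshold" 'BEGIN { exit !(m + 0 >= t + 0) }'
}

if coverage_gate "84.2"; then
  echo "Coverage gate: PASS"
else
  echo "Coverage gate: FAIL (Builder must fix test gaps before proceeding)"
fi
```

Returning an exit status rather than printing a verdict keeps the check usable in an `if`, in the same style as the bash checks elsewhere in the workflow.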
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md index 17e099dd..f1cd8442 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/architect-integration-reviewer.md @@ -5,7 +5,6 @@ **Trust Level:** HIGH (wants to find integration issues) -@patterns/hospital-grade.md @patterns/agent-completion.md diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md index bcbc8cf5..2131dad6 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/builder.md @@ -5,7 +5,6 @@ **Trust Level:** LOW (assume will cut corners) -@patterns/hospital-grade.md @patterns/tdd.md @patterns/agent-completion.md @@ -17,11 +16,12 @@ You are the **BUILDER** agent. Your job is to implement the story requirements by writing production code and tests. **DO:** +- **Review playbooks** for gotchas and patterns (if provided) - Load and understand the story requirements - Analyze what exists vs what's needed - Write tests first (TDD approach) - Implement production code to make tests pass -- Follow project patterns and conventions +- Follow project patterns and playbook guidance **DO NOT:** - Validate your own work (Inspector agent will do this) @@ -35,7 +35,8 @@ You are the **BUILDER** agent. 
Your job is to implement the story requirements b ## Steps to Execute ### Step 1: Initialize -Load story file and cache context: +Load story file and playbooks (if provided): +- **Review playbooks first** (if provided in context) - note gotchas and patterns - Read story file: `{{story_file}}` - Parse all sections (Business Context, Acceptance Criteria, Tasks, etc.) - Determine greenfield vs brownfield @@ -88,54 +89,36 @@ When complete, provide: --- -## Hospital-Grade Standards +## Completion Format (v4.0) -⚕️ **Quality >> Speed** +**Return structured JSON artifact:** -- Take time to do it right -- Don't skip error handling -- Don't leave TODO comments -- Don't use `any` types - ---- - -## When Complete, Return This Format - -```markdown -## AGENT COMPLETE - -**Agent:** builder -**Story:** {{story_key}} -**Status:** SUCCESS | FAILED - -### Files Created -- path/to/new/file1.ts -- path/to/new/file2.ts - -### Files Modified -- path/to/existing/file.ts - -### Tests Added -- X test files -- Y test cases total - -### Implementation Summary -Brief description of what was built and key decisions made. 
- -### Known Gaps -- Any functionality not implemented -- Any edge cases not handled -- NONE if all tasks complete - -### Ready For -Inspector validation (next phase) +```json +{ + "agent": "builder", + "story_key": "{{story_key}}", + "status": "SUCCESS", + "files_created": ["path/to/file.tsx", "path/to/file.test.tsx"], + "files_modified": ["path/to/existing.tsx"], + "tests_added": { + "total": 12, + "passing": 12 + }, + "tasks_addressed": [ + "Create agreement view component", + "Add status badge", + "Implement occupant selection" + ], + "playbooks_reviewed": ["database-patterns.md", "api-security.md"] +} ``` -**Why this format?** The orchestrator parses this output to: -- Verify claimed files actually exist -- Track what was built for reconciliation -- Route to next phase appropriately +**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-builder.json` --- -**Remember:** You are the BUILDER. Build it well, but don't validate or review your own work. Other agents will do that with fresh eyes. 
+**Remember:** + +- **Review playbooks first** if provided - they contain gotchas and patterns learned from previous stories +- Build it well with TDD, but don't validate or review your own work +- Other agents will verify with fresh eyes and provide file:line evidence diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md index 968572fd..165c9821 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/fixer.md @@ -5,7 +5,6 @@ **Trust Level:** MEDIUM (incentive to minimize work) -@patterns/hospital-grade.md @patterns/agent-completion.md diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md index 968afb93..141ce651 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/inspector.md @@ -1,12 +1,11 @@ -# Inspector Agent - Validation Phase +# Inspector Agent - Validation Phase with Code Citations -**Role:** Independent verification of Builder's work +**Role:** Independent verification of Builder's work **WITH EVIDENCE** **Steps:** 5-6 (post-validation, quality-checks) **Trust Level:** MEDIUM (no conflict of interest) @patterns/verification.md -@patterns/hospital-grade.md @patterns/agent-completion.md @@ -14,48 +13,54 @@ ## Your Mission -You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed. +You are the **INSPECTOR** agent. Your job is to verify that the Builder actually did what they claimed **and provide file:line evidence for every task**. **KEY PRINCIPLE: You have NO KNOWLEDGE of what the Builder did. 
You are starting fresh.** +**CRITICAL REQUIREMENT v4.0: EVERY task must have code citations.** + **DO:** +- Map EACH task to specific code with file:line citations - Verify files actually exist - Run tests yourself (don't trust claims) - Run quality checks (type-check, lint, build) -- Give honest PASS/FAIL verdict +- Provide evidence for EVERY task **DO NOT:** -- Take the Builder's word for anything -- Skip verification steps +- Skip any task verification +- Give vague "looks good" without citations - Assume tests pass without running them -- Give PASS verdict if ANY check fails +- Give PASS verdict if ANY check fails or task lacks evidence --- ## Steps to Execute -### Step 5: Post-Validation +### Step 5: Task Verification with Code Citations -**Verify Implementation Against Story:** +**Map EVERY task to specific code locations:** -1. **Check Files Exist:** - ```bash - # For each file mentioned in story tasks - ls -la {{file_path}} - # FAIL if file missing or empty - ``` +1. **Read story file** - understand ALL tasks -2. **Verify File Contents:** - - Open each file - - Check it has actual code (not just TODO/stub) - - Verify it matches story requirements +2. **For EACH task, provide:** + - **file:line** where it's implemented + - **Brief quote** of relevant code + - **Verdict:** IMPLEMENTED or NOT_IMPLEMENTED -3. **Check Tests Exist:** - ```bash - # Find test files - find . -name "*.test.ts" -o -name "__tests__" - # FAIL if no tests found for new code - ``` +**Example Evidence Format:** + +``` +Task: "Display occupant agreement status" +Evidence: src/features/agreement/StatusBadge.tsx:45-67 +Code: "const StatusBadge = ({ status }) => ..." +Verdict: IMPLEMENTED +``` + +3. **If task NOT implemented:** + - Explain why (file missing, code incomplete, etc.) + - Provide file:line where it should be + +**CRITICAL:** If you can't cite file:line, mark as NOT_IMPLEMENTED. ### Step 6: Quality Checks @@ -96,36 +101,49 @@ You are the **INSPECTOR** agent. 
Your job is to verify that the Builder actually --- -## Output Requirements +## Completion Format (v4.0) -**Provide Evidence-Based Verdict:** +**Return structured JSON with code citations:** -### If PASS: -```markdown -✅ VALIDATION PASSED - -Evidence: -- Files verified: [list files checked] -- Type check: PASS (0 errors) -- Linter: PASS (0 warnings) -- Build: PASS -- Tests: 45/45 passing (95% coverage) -- Git: 12 files modified, 3 new files - -Ready for code review. +```json +{ + "agent": "inspector", + "story_key": "{{story_key}}", + "verdict": "PASS", + "task_verification": [ + { + "task": "Create agreement view component", + "implemented": true, + "evidence": [ + { + "file": "src/features/agreement/AgreementView.tsx", + "lines": "15-67", + "code_snippet": "export const AgreementView = ({ agreementId }) => {...}" + }, + { + "file": "src/features/agreement/AgreementView.test.tsx", + "lines": "8-45", + "code_snippet": "describe('AgreementView', () => {...})" + } + ] + }, + { + "task": "Add status badge", + "implemented": false, + "evidence": [], + "reason": "No StatusBadge component found in src/features/agreement/" + } + ], + "checks": { + "type_check": {"passed": true, "errors": 0}, + "lint": {"passed": true, "warnings": 0}, + "tests": {"passed": true, "total": 12, "passing": 12}, + "build": {"passed": true} + } +} ``` -### If FAIL: -```markdown -❌ VALIDATION FAILED - -Failures: -1. File missing: app/api/occupant/agreement/route.ts -2. Type check: 3 errors in lib/api/auth.ts -3. Tests: 2 failing (api/occupant tests) - -Cannot proceed to code review until these are fixed. -``` +**Save to:** `docs/sprint-artifacts/completions/{{story_key}}-inspector.json` --- @@ -133,58 +151,15 @@ Cannot proceed to code review until these are fixed. 
**Before giving PASS verdict, confirm:** -- [ ] All story files exist and have content +- [ ] EVERY task has file:line citation or NOT_IMPLEMENTED reason - [ ] Type check returns 0 errors -- [ ] Linter returns 0 errors/warnings +- [ ] Linter returns 0 warnings - [ ] Build succeeds - [ ] Tests run and pass (not skipped) -- [ ] Test coverage >= 90% -- [ ] Git status is clean or has expected changes +- [ ] All implemented tasks have code evidence **If ANY checkbox is unchecked → FAIL verdict** --- -## Hospital-Grade Standards - -⚕️ **Be Thorough** - -- Don't skip checks -- Run tests yourself (don't trust claims) -- Verify every file exists -- Give specific evidence - ---- - -## When Complete, Return This Format - -```markdown -## AGENT COMPLETE - -**Agent:** inspector -**Story:** {{story_key}} -**Status:** PASS | FAIL - -### Evidence -- **Type Check:** PASS (0 errors) | FAIL (X errors) -- **Lint:** PASS (0 warnings) | FAIL (X warnings) -- **Build:** PASS | FAIL -- **Tests:** X passing, Y failing, Z% coverage - -### Files Verified -- path/to/file1.ts ✓ -- path/to/file2.ts ✓ -- path/to/missing.ts ✗ (NOT FOUND) - -### Failures (if FAIL status) -1. Specific failure with file:line reference -2. Another specific failure - -### Ready For -- If PASS: Reviewer (next phase) -- If FAIL: Builder needs to fix before proceeding -``` - ---- - -**Remember:** You are the INSPECTOR. Your job is to find the truth, not rubber-stamp the Builder's work. If something is wrong, say so with evidence. +**Remember:** You are the INSPECTOR. Your job is to find the truth with evidence, not rubber-stamp the Builder's work. If something is wrong, say so with file:line citations. 
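
Review note: the file:line evidence rule in the Inspector changes above lends itself to a cheap mechanical pre-check. The sketch below is an assumption about how an orchestrator might do that, not something the patch defines: `verify_citation` is a hypothetical helper that only confirms the cited file exists and is long enough to contain the cited range. It does not judge whether the cited code implements the task.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch).
# Accepts a citation such as "src/Component.tsx:45-67"; a bare line number
# ("file.ts:45") is treated as a one-line range.
# Returns 0 if the range fits in the file, 1 if the file is missing or too
# short, 2 if the citation is malformed.
verify_citation() {
  local citation="$1"
  local file="${citation%:*}"    # text before the last colon
  local range="${citation##*:}"  # text after the last colon
  local start="${range%-*}"
  local end="${range#*-}"

  case "$citation" in *:*) ;; *) return 2 ;; esac
  case "$start" in ''|*[!0-9]*) return 2 ;; esac
  case "$end" in ''|*[!0-9]*) return 2 ;; esac
  [ "$start" -ge 1 ] && [ "$start" -le "$end" ] || return 2
  [ -f "$file" ] || return 1

  # wc -l counts newline characters, so a final line without a trailing
  # newline is not counted.
  local total
  total=$(( $(wc -l < "$file") ))
  [ "$end" -le "$total" ]
}

if verify_citation "src/Component.tsx:45-67"; then
  echo "citation resolves"
else
  echo "citation does not resolve: treat task as NOT_IMPLEMENTED"
fi
```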
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md
new file mode 100644
index 00000000..ebb816d7
--- /dev/null
+++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reflection.md
@@ -0,0 +1,93 @@
+# Reflection Agent - Playbook Learning
+
+You are the **REFLECTION** agent for story {{story_key}}.
+
+## Context
+
+- **Story:** {{story_file}}
+- **Builder initial:** {{builder_artifact}}
+- **All review findings:** {{all_reviewer_artifacts}}
+- **Builder fixes:** {{builder_fixes_artifact}}
+- **Test quality issues:** {{test_quality_artifact}}
+
+## Objective
+
+Identify what future agents should know:
+
+1. **What issues were found?** (from reviewers)
+2. **What did Builder miss initially?** (gaps, edge cases, security)
+3. **What playbook knowledge would have prevented these?**
+4. **Which module/feature area does this apply to?**
+5. **Should we update existing playbook or create new?**
+
+### Key Questions
+
+- What gotchas should future builders know?
+- What code patterns should be standard?
+- What test requirements are essential?
+- What similar stories exist?
+
+## Success Criteria
+
+- [ ] Analyzed review findings
+- [ ] Identified preventable issues
+- [ ] Determined which playbook(s) to update
+- [ ] Return structured proposal
+
+## Completion Format
+
+Return structured JSON artifact:
+
+```json
+{
+  "agent": "reflection",
+  "story_key": "{{story_key}}",
+  "learnings": [
+    {
+      "issue": "SQL injection in query builder",
+      "root_cause": "Builder used string concatenation (didn't know pattern)",
+      "prevention": "Playbook should document: always use parameterized queries",
+      "applies_to": "database queries, API endpoints with user input"
+    },
+    {
+      "issue": "Missing edge case tests for empty arrays",
+      "root_cause": "Test Quality Agent found gap",
+      "prevention": "Playbook should require: test null/empty/invalid for all inputs",
+      "applies_to": "all data processing functions"
+    }
+  ],
+  "playbook_proposal": {
+    "action": "update_existing",
+    "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
+    "module": "api/database",
+    "updates": {
+      "common_gotchas": [
+        "Never concatenate user input into SQL - use parameterized queries",
+        "Test edge cases: null, undefined, [], '', invalid input"
+      ],
+      "code_patterns": [
+        "db.query(sql, [param1, param2]) ✓",
+        "sql + userInput ✗"
+      ],
+      "test_requirements": [
+        "Test SQL injection attempts: expect(query(\"' OR 1=1--\")).toThrow()",
+        "Test empty inputs: expect(fn([])).toHandle() or .toThrow()"
+      ],
+      "related_stories": ["{{story_key}}"]
+    }
+  }
+}
+```
+
+Save to: `docs/sprint-artifacts/completions/{{story_key}}-reflection.json`
+
+## Playbook Structure
+
+When proposing playbook updates, structure them with these sections:
+
+1. **Common Gotchas** - What mistakes to avoid
+2. **Code Patterns** - Standard approaches (with ✓ and ✗ examples)
+3. **Test Requirements** - What tests are essential
+4. **Related Stories** - Which stories used these patterns
+
+Keep it simple and actionable for future agents.
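
Review note: the four playbook sections named in the Reflection agent prompt above could be checked mechanically before a proposed update is accepted. The following is a minimal sketch under assumptions: `check_playbook_sections` is a hypothetical helper, and it assumes each section appears as a markdown heading line (any level) whose text is exactly the section name. The actual layout of the playbook template is not visible in this part of the patch.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch).
# Prints one line per missing section and returns non-zero if any of the
# four sections listed under "Playbook Structure" is absent.
check_playbook_sections() {
  local playbook="$1"
  local missing=0
  local section
  for section in "Common Gotchas" "Code Patterns" "Test Requirements" "Related Stories"; do
    # Matches lines such as "## Common Gotchas"; the heading level is assumed.
    if ! grep -Eq "^#+ +${section}[[:space:]]*\$" "$playbook"; then
      echo "missing section: $section"
      missing=1
    fi
  done
  return "$missing"
}
```

It would be called with a playbook path, for example `check_playbook_sections docs/playbooks/implementation-playbooks/database-api-patterns.md`, using the path shown in the example proposal.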
diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md index 2a711e05..f857bfaa 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/reviewer.md @@ -6,7 +6,6 @@ @patterns/security-checklist.md -@patterns/hospital-grade.md @patterns/agent-completion.md diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md new file mode 100644 index 00000000..172ff9f6 --- /dev/null +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/agents/test-quality.md @@ -0,0 +1,73 @@ +# Test Quality Agent + +You are the **TEST QUALITY** agent for story {{story_key}}. + +## Context + +- **Story:** {{story_file}} +- **Builder completion:** {{builder_completion_artifact}} + +## Objective + +Review test files for quality and completeness: + +1. Find all test files created/modified by Builder +2. For each test file, verify: + - **Happy path**: Primary functionality tested ✓ + - **Edge cases**: null, empty, invalid inputs ✓ + - **Error conditions**: Failures handled properly ✓ + - **Assertions**: Meaningful checks (not just "doesn't crash") + - **Test names**: Descriptive and clear + - **Deterministic**: No random data, no timing dependencies +3. Check that tests actually validate the feature + +**Focus on:** What's missing? What edge cases weren't considered? 
+ +## Success Criteria + +- [ ] All test files reviewed +- [ ] Edge cases identified (covered or missing) +- [ ] Error conditions verified +- [ ] Assertions are meaningful +- [ ] Tests are deterministic +- [ ] Return quality assessment + +## Completion Format + +Return structured JSON artifact: + +```json +{ + "agent": "test_quality", + "story_key": "{{story_key}}", + "verdict": "PASS" | "NEEDS_IMPROVEMENT", + "test_files_reviewed": ["path/to/test.tsx", ...], + "issues": [ + { + "severity": "HIGH", + "file": "path/to/test.tsx:45", + "issue": "Missing edge case: empty input array", + "recommendation": "Add test: expect(fn([])).toThrow(...)" + }, + { + "severity": "MEDIUM", + "file": "path/to/test.tsx:67", + "issue": "Test uses Math.random() - could be flaky", + "recommendation": "Use fixed test data" + } + ], + "coverage_analysis": { + "edge_cases_covered": true, + "error_conditions_tested": true, + "meaningful_assertions": true, + "tests_are_deterministic": true + }, + "summary": { + "high_issues": 0, + "medium_issues": 0, + "low_issues": 0 + } +} +``` + +Save to: `docs/sprint-artifacts/completions/{{story_key}}-test-quality.json` diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md index 3f603f0e..49a1ad91 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.md @@ -1,74 +1,142 @@ -# Super-Dev-Pipeline v3.1 - Token-Efficient Multi-Agent Pipeline +# Story-Full-Pipeline v4.0 - Enhanced Multi-Agent Pipeline Implement a story using parallel verification agents with Builder context reuse. -Each agent has single responsibility. Builder fixes issues in its own context (50-70% token savings). -Orchestrator handles bookkeeping (story file updates, verification). +Enhanced with playbook learning, code citation evidence, test quality validation, and automated coverage gates. 
+Builder fixes issues in its own context (50-70% token savings). -**Token-Efficient Multi-Agent Pipeline** +**Quality Through Discipline, Continuous Learning** -- Builder implements (creative, context preserved) -- Inspector + Reviewers validate in parallel (verification, fresh context) -- Builder fixes issues (creative, reuses context - 50-70% token savings) -- Inspector re-checks (verification, quick check) -- Orchestrator reconciles story file (mechanical) +- Playbook Query: Load relevant patterns before starting +- Builder: Implements with playbook knowledge (context preserved) +- Inspector + Test Quality + Reviewers: Validate in parallel with proof +- Coverage Gate: Automated threshold enforcement +- Builder: Fixes issues in same context (50-70% token savings) +- Inspector: Quick recheck +- Orchestrator: Reconciles mechanically +- Reflection: Updates playbooks for future agents -**Key Innovation:** Resume Builder instead of spawning fresh Fixer. -Builder already knows the codebase - just needs to fix specific issues. - -Trust but verify. Fresh context for verification. Reuse context for fixes. +Trust but verify. Fresh context for verification. Evidence-based validation. Self-improving system. 
name: story-full-pipeline -version: 3.2.0 +version: 4.0.0 execution_mode: multi_agent phases: + phase_0: Playbook Query (orchestrator) phase_1: Builder (saves agent_id) - phase_2: [Inspector + N Reviewers] in parallel (N = 2/3/4 based on complexity) + phase_2: [Inspector + Test Quality + N Reviewers] in parallel + phase_2.5: Coverage Gate (automated) phase_3: Resume Builder with all findings (reuses context) phase_4: Inspector re-check (quick verification) phase_5: Orchestrator reconciliation + phase_6: Playbook Reflection reviewer_counts: - micro: 2 reviewers (security, architect/integration) v3.2.0+ - standard: 3 reviewers (security, logic/performance, architect/integration) v3.2.0+ - complex: 4 reviewers (security, logic, architect/integration, code quality) v3.2.0+ + micro: 2 reviewers (security, architect/integration) + standard: 3 reviewers (security, logic/performance, architect/integration) + complex: 4 reviewers (security, logic, architect/integration, code quality) + +quality_gates: + coverage_threshold: 80 # % line coverage required + task_verification: "all_with_evidence" # Inspector must cite file:line + critical_issues: "must_fix" + high_issues: "must_fix" token_efficiency: - Phase 2 agents spawn in parallel (same cost, faster) - - Phase 3 resumes Builder (50-70% token savings vs fresh Fixer) + - Phase 3 resumes Builder (50-70% token savings vs fresh agent) - Phase 4 Inspector only (no full re-review) + +playbooks: + enabled: true + directory: "docs/playbooks/implementation-playbooks" + max_load: 3 + auto_apply_updates: false -@patterns/hospital-grade.md +@patterns/verification.md +@patterns/tdd.md @patterns/agent-completion.md -Load and validate the story file. +**Load and parse story file** \`\`\`bash STORY_FILE="docs/sprint-artifacts/{{story_key}}.md" [ -f "$STORY_FILE" ] || { echo "ERROR: Story file not found"; exit 1; } \`\`\` -Use Read tool on the story file. Parse: -- Complexity level (micro/standard/complex) +Use Read tool. 
Extract: - Task count - Acceptance criteria count +- Keywords for risk scoring -Determine which agents to spawn based on complexity routing. +**Determine complexity:** +\`\`\`bash +TASK_COUNT=$(grep -c "^- \[ \]" "$STORY_FILE") +RISK_KEYWORDS=$(grep -ciE "auth|security|payment|encryption|migration|database" "$STORY_FILE") + +if [ "$TASK_COUNT" -le 3 ] && [ "$RISK_KEYWORDS" -eq 0 ]; then + COMPLEXITY="micro" + REVIEWER_COUNT=2 +elif [ "$TASK_COUNT" -ge 16 ] || [ "$RISK_KEYWORDS" -gt 0 ]; then + COMPLEXITY="complex" + REVIEWER_COUNT=4 +else + COMPLEXITY="standard" + REVIEWER_COUNT=3 +fi +\`\`\` + +Determine agents to spawn: Inspector + Test Quality + $REVIEWER_COUNT Reviewers + + + +**Phase 0: Playbook Query** + +\`\`\` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +📚 PHASE 0: PLAYBOOK QUERY +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +\`\`\` + +**Extract story keywords:** +\`\`\`bash +STORY_KEYWORDS=$(grep -E "^## Story Title|^### Feature|^## Business Context" "$STORY_FILE" | sed 's/[#]//g' | tr '\n' ' ') +echo "Story keywords: $STORY_KEYWORDS" +\`\`\` + +**Search for relevant playbooks:** +Use Grep tool: +- Pattern: extracted keywords +- Path: \`docs/playbooks/implementation-playbooks/\` +- Output mode: files_with_matches +- Limit: 3 files + +**Load matching playbooks:** +For each playbook found: +- Use Read tool +- Extract sections: Common Gotchas, Code Patterns, Test Requirements + +If no playbooks exist: +\`\`\` +ℹ️ No playbooks found - this will be the first story to create them +\`\`\` + +Store playbook content for Builder. -**Phase 1: Builder Agent (Steps 1-4)** +**Phase 1: Builder Agent** \`\`\` ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ @@ -76,41 +144,359 @@ Determine which agents to spawn based on complexity routing. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \`\`\` -Spawn Builder agent and save agent_id for later resume. 
- -**CRITICAL: Save Builder's agent_id for later resume** +Spawn Builder agent and **SAVE agent_id for resume later**: \`\`\` -BUILDER_AGENT_ID={{agent_id_from_task_result}} -echo "Builder agent: $BUILDER_AGENT_ID" +BUILDER_TASK = Task({ + subagent_type: "general-purpose", + description: "Implement story {{story_key}}", + prompt: \` +You are the BUILDER agent for story {{story_key}}. + + +@patterns/tdd.md +@patterns/agent-completion.md + + + +Story: [inline story file content] + +{{IF playbooks loaded}} +Relevant Playbooks (review before implementing): +[inline playbook content] + +Pay special attention to: +- Common Gotchas in these playbooks +- Code Patterns to follow +- Test Requirements to satisfy +{{ENDIF}} + + + +Implement the story requirements: +1. Review story tasks and acceptance criteria +2. **Review playbooks** for gotchas and patterns (if provided) +3. Analyze what exists vs needed (gap analysis) +4. **Write tests FIRST** (TDD - tests before implementation) +5. Implement production code to pass tests + + + +- DO NOT validate your own work +- DO NOT review your code +- DO NOT update story checkboxes +- DO NOT commit changes yet + + + +- [ ] Reviewed playbooks for guidance +- [ ] Tests written for all requirements +- [ ] Production code implements tests +- [ ] Tests pass +- [ ] Return structured completion artifact + + + +Return structured JSON artifact: +{ + "agent": "builder", + "story_key": "{{story_key}}", + "status": "SUCCESS" | "FAILED", + "files_created": ["path/to/file.tsx", ...], + "files_modified": ["path/to/file.tsx", ...], + "tests_added": { + "total": 12, + "passing": 12 + }, + "tasks_addressed": ["task description from story", ...] +} + +Save to: docs/sprint-artifacts/completions/{{story_key}}-builder.json + +\` +}) + +BUILDER_AGENT_ID = {{extract agent_id from Task result}} \`\`\` -Wait for completion. Parse structured output. Verify files exist. 
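The Builder's completion artifact is what the rest of the pipeline relies on, so the halt decision can be made from it directly. Below is a minimal sketch of that check, assuming the artifact follows the JSON schema in the Builder prompt above. The function name `checkBuilderArtifact` and the injected `fileExists` callback are hypothetical and not part of the workflow.

```typescript
// Field names follow the Builder artifact schema in the prompt above.
// `fileExists` is a hypothetical injected check (for example a wrapper
// around fs.existsSync), kept as a parameter so the sketch has no imports.
interface BuilderArtifact {
  agent: string;
  story_key: string;
  status: "SUCCESS" | "FAILED";
  files_created: string[];
  files_modified: string[];
  tasks_addressed: string[];
}

// Returns a list of problems; an empty list means the pipeline may continue.
function checkBuilderArtifact(
  raw: string,
  fileExists: (path: string) => boolean,
): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["artifact is not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return ["artifact is not a JSON object"];
  }
  const artifact = parsed as Partial<BuilderArtifact>;
  const problems: string[] = [];
  if (artifact.status !== "SUCCESS") {
    problems.push("status is " + String(artifact.status));
  }
  // Every path the Builder claims to have touched must exist on disk.
  const files = [
    ...(artifact.files_created ?? []),
    ...(artifact.files_modified ?? []),
  ];
  for (const file of files) {
    if (!fileExists(file)) problems.push("missing file: " + file);
  }
  return problems;
}
```

Any non-empty result corresponds to the "files missing or status FAILED: halt pipeline" rule.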
+**CRITICAL: Store Builder agent ID:**
+\`\`\`bash
+echo "Builder agent ID: $BUILDER_AGENT_ID"
+echo "$BUILDER_AGENT_ID" > /tmp/builder-agent-id.txt
+\`\`\`
+
+**Wait for completion. Verify artifact exists:**
+\`\`\`bash
+BUILDER_COMPLETION="docs/sprint-artifacts/completions/{{story_key}}-builder.json"
+[ -f "$BUILDER_COMPLETION" ] || { echo "❌ No builder artifact"; exit 1; }
+\`\`\`
+
+**Verify files exist:**
+\`\`\`bash
+# Check each path listed under files_created and files_modified in the Builder artifact
+jq -r '(.files_created + .files_modified)[]?' "$BUILDER_COMPLETION" | while IFS= read -r file; do
+  [ -f "$file" ] || echo "❌ MISSING: $file"
+done
+\`\`\`
 
 If files missing or status FAILED: halt pipeline.
 
-**Phase 2: Parallel Verification (Inspector + Reviewers)**
+**Phase 2: Parallel Verification (Inspector + Test Quality + Reviewers)**
 
 \`\`\`
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 🔍 PHASE 2: PARALLEL VERIFICATION
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Spawning: Inspector + Test Quality + {{REVIEWER_COUNT}} Reviewers
+Total agents: {{2 + REVIEWER_COUNT}}
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 \`\`\`
 
-**CRITICAL: Spawn ALL verification agents in ONE message (parallel execution)**
+**CRITICAL: Spawn ALL agents in ONE message (parallel execution)**
+
+Send a single message with multiple Task calls:
+1. Inspector Agent
+2. Test Quality Agent
+3. Security Reviewer
+4. Logic/Performance Reviewer (if standard/complex)
+5. Architect/Integration Reviewer
+6. Code Quality Reviewer (if complex)
+
+---
+
+## Inspector Agent Prompt:
 
-Determine reviewer count based on complexity:
 \`\`\`
-if complexity == "micro": REVIEWER_COUNT = 1
-if complexity == "standard": REVIEWER_COUNT = 2
-if complexity == "complex": REVIEWER_COUNT = 3
+Task({
+  subagent_type: "general-purpose",
+  description: "Validate story {{story_key}} implementation",
+  prompt: \`
+You are the INSPECTOR agent for story {{story_key}}.
+
+
+@patterns/verification.md
+@patterns/agent-completion.md
+
+
+
+Story: [inline story file content]
+
+
+
+Independently verify implementation WITH CODE CITATIONS:
+
+1. 
Read story file - understand ALL tasks +2. Read each file Builder created/modified +3. **Map EACH task to specific code with file:line citations** +4. Run verification checks: + - Type-check (0 errors required) + - Lint (0 warnings required) + - Tests (all passing required) + - Build (success required) + + + +**EVERY task must have evidence.** + +For each task, provide: +- file:line where it's implemented +- Brief quote of relevant code +- Verdict: IMPLEMENTED or NOT_IMPLEMENTED + +Example: +Task: "Display occupant agreement status" +Evidence: src/features/agreement/StatusBadge.tsx:45-67 +Code: "const StatusBadge = ({ status }) => ..." +Verdict: IMPLEMENTED + + + +- You have NO KNOWLEDGE of what Builder did +- Run all checks yourself - don't trust claims +- **Every task needs file:line citation** +- If code doesn't exist: mark NOT IMPLEMENTED with reason + + + +- [ ] ALL tasks mapped to code locations +- [ ] Type check: 0 errors +- [ ] Lint: 0 warnings +- [ ] Tests: all passing +- [ ] Build: success +- [ ] Return structured evidence + + + +{ + "agent": "inspector", + "story_key": "{{story_key}}", + "verdict": "PASS" | "FAIL", + "task_verification": [ + { + "task": "Create agreement view component", + "implemented": true, + "evidence": [ + { + "file": "src/features/agreement/AgreementView.tsx", + "lines": "15-67", + "code_snippet": "export const AgreementView = ({ agreementId }) => {...}" + }, + { + "file": "src/features/agreement/AgreementView.test.tsx", + "lines": "8-45", + "code_snippet": "describe('AgreementView', () => {...})" + } + ] + }, + { + "task": "Add status badge", + "implemented": false, + "evidence": [], + "reason": "No StatusBadge component found in src/features/agreement/" + } + ], + "checks": { + "type_check": {"passed": true, "errors": 0}, + "lint": {"passed": true, "warnings": 0}, + "tests": {"passed": true, "total": 12, "passing": 12}, + "build": {"passed": true} + } +} + +Save to: docs/sprint-artifacts/completions/{{story_key}}-inspector.json + 
+\` +}) \`\`\` -Spawn Inspector + N Reviewers in single message. Wait for ALL agents to complete. Collect findings. +--- -Aggregate all findings from Inspector + Reviewers. +## Test Quality Agent Prompt: + +\`\`\` +Task({ + subagent_type: "general-purpose", + description: "Review test quality for {{story_key}}", + prompt: \` +You are the TEST QUALITY agent for story {{story_key}}. + + +Story: [inline story file content] +Builder completion: [inline builder artifact] + + + +Review test files for quality and completeness: + +1. Find all test files created/modified by Builder +2. For each test file, verify: + - **Happy path**: Primary functionality tested ✓ + - **Edge cases**: null, empty, invalid inputs ✓ + - **Error conditions**: Failures handled properly ✓ + - **Assertions**: Meaningful checks (not just "doesn't crash") + - **Test names**: Descriptive and clear + - **Deterministic**: No random data, no timing dependencies +3. Check that tests actually validate the feature + +**Focus on:** What's missing? What edge cases weren't considered? 
+ + + +- [ ] All test files reviewed +- [ ] Edge cases identified (covered or missing) +- [ ] Error conditions verified +- [ ] Assertions are meaningful +- [ ] Tests are deterministic +- [ ] Return quality assessment + + + +{ + "agent": "test_quality", + "story_key": "{{story_key}}", + "verdict": "PASS" | "NEEDS_IMPROVEMENT", + "test_files_reviewed": ["path/to/test.tsx", ...], + "issues": [ + { + "severity": "HIGH", + "file": "path/to/test.tsx:45", + "issue": "Missing edge case: empty input array", + "recommendation": "Add test: expect(fn([])).toThrow(...)" + }, + { + "severity": "MEDIUM", + "file": "path/to/test.tsx:67", + "issue": "Test uses Math.random() - could be flaky", + "recommendation": "Use fixed test data" + } + ], + "coverage_analysis": { + "edge_cases_covered": true | false, + "error_conditions_tested": true | false, + "meaningful_assertions": true | false, + "tests_are_deterministic": true | false + }, + "summary": { + "high_issues": 1, + "medium_issues": 2, + "low_issues": 0 + } +} + +Save to: docs/sprint-artifacts/completions/{{story_key}}-test-quality.json + +\` +}) +\`\`\` + +--- + +(Continue with Security, Logic, Architect, Quality reviewers as before...) + +**Wait for ALL agents to complete.** + +Collect completion artifacts: +- \`inspector.json\` +- \`test-quality.json\` +- \`reviewer-security.json\` +- \`reviewer-logic.json\` (if spawned) +- \`reviewer-architect.json\` +- \`reviewer-quality.json\` (if spawned) + +Parse all findings and aggregate by severity. 
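The aggregation step above can be sketched as a small grouping function. This is an illustration only: the `issues` entry shape is taken from the Test Quality artifact shown earlier, while the assumption that reviewer artifacts use the same layout, and the names `aggregateBySeverity` and `mustFix`, are hypothetical.

```typescript
// Hypothetical shapes: only the Test Quality `issues` entries appear in this
// workflow; reviewer artifacts are assumed to follow the same layout.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  severity: Severity;
  file: string;
  issue: string;
  recommendation?: string;
}

interface Artifact {
  agent: string;
  issues?: Finding[];
}

interface TaggedFinding extends Finding {
  agent: string;
}

type Grouped = Record<Severity, TaggedFinding[]>;

// Group every finding by severity, remembering which agent reported it.
function aggregateBySeverity(artifacts: Artifact[]): Grouped {
  const grouped: Grouped = { CRITICAL: [], HIGH: [], MEDIUM: [], LOW: [] };
  for (const artifact of artifacts) {
    for (const finding of artifact.issues ?? []) {
      grouped[finding.severity].push({ ...finding, agent: artifact.agent });
    }
  }
  return grouped;
}

// The quality gates mark CRITICAL and HIGH issues as "must_fix".
function mustFix(grouped: Grouped): TaggedFinding[] {
  return [...grouped.CRITICAL, ...grouped.HIGH];
}
```

The output of `mustFix` is a natural candidate for the consolidated list handed to the resumed Builder in Phase 3, with MEDIUM and LOW findings attached as lower priority.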
+
+
+**Phase 2.5: Coverage Gate (Automated)**
+
+\`\`\`
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+📊 PHASE 2.5: COVERAGE GATE
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+\`\`\`
+
+Run coverage check:
+\`\`\`bash
+# Run tests with coverage
+npm test -- --coverage --silent 2>&1 | tee coverage-output.txt
+
+# Extract coverage percentage (adjust grep pattern for your test framework)
+COVERAGE=$(grep -E "All files|Statements" coverage-output.txt | head -1 | grep -oE "[0-9]+\.[0-9]+|[0-9]+" | head -1)
+# Default to 0 when no percentage was parsed, so an empty value cannot slip past the gate
+COVERAGE="${COVERAGE:-0}"
+
+echo "Coverage: ${COVERAGE}%"
+echo "Threshold: {{coverage_threshold}}%"
+
+# Compare coverage
+if (( $(echo "$COVERAGE < {{coverage_threshold}}" | bc -l) )); then
+  echo "❌ Coverage ${COVERAGE}% below threshold {{coverage_threshold}}%"
+  echo "Builder must add more tests before proceeding"
+  exit 1
+fi
+
+echo "✅ Coverage gate passed: ${COVERAGE}%"
+\`\`\`
+
+If coverage fails: add to issues list for Builder to fix.
 
@@ -156,68 +542,274 @@ If PASS: Proceed to reconciliation.
 
 \`\`\`
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-🔧 PHASE 5: RECONCILIATION (Orchestrator)
+📊 PHASE 5: RECONCILIATION (Orchestrator)
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 \`\`\`
 
 **YOU (orchestrator) do this directly. No agent spawn.**
 
-1. Get what was built (git log, git diff)
-2. Read story file
-3. Check off completed tasks (Edit tool)
-4. Fill Dev Agent Record with pipeline details
-5. Verify updates (grep task checkboxes)
-6. Update sprint-status.yaml to "done"
+**5.1: Load completion artifacts**
+\`\`\`bash
+BUILDER_FIXES="docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json"
+INSPECTOR="docs/sprint-artifacts/completions/{{story_key}}-inspector.json"
+\`\`\`
+
+Use Read tool on all artifacts.
+ +**5.2: Read story file** +Use Read tool: \`docs/sprint-artifacts/{{story_key}}.md\` + +**5.3: Check off completed tasks using Inspector evidence** + +For each task in \`inspector.task_verification\`: +- If \`implemented: true\` and has evidence: + - Use Edit tool: \`"- [ ] {{task}}"\` → \`"- [x] {{task}}"\` + +**5.4: Fill Dev Agent Record with evidence** + +Use Edit tool: +\`\`\`markdown +### Dev Agent Record +**Implementation Date:** {{timestamp}} +**Agent Model:** Claude Sonnet 4.5 (multi-agent pipeline v4.0) +**Git Commit:** {{git_commit}} + +**Pipeline Phases:** +- Phase 0: Playbook Query ({{playbooks_loaded}} loaded) +- Phase 1: Builder (initial implementation) +- Phase 2: Parallel Verification + - Inspector: {{verdict}} with code citations + - Test Quality: {{verdict}} + - {{REVIEWER_COUNT}} Reviewers: {{issues_found}} +- Phase 2.5: Coverage Gate ({{coverage}}%) +- Phase 3: Builder (resumed, fixed {{fixes_count}} issues) +- Phase 4: Inspector re-check ({{verdict}}) + +**Files Created:** {{count}} +**Files Modified:** {{count}} +**Tests:** {{tests.passing}}/{{tests.total}} passing ({{coverage}}%) +**Issues Fixed:** {{critical}} CRITICAL, {{high}} HIGH, {{medium}} MEDIUM + +**Task Evidence:** (Inspector code citations) +{{for each task with evidence}} +- [x] {{task}} + - {{evidence[0].file}}:{{evidence[0].lines}} +{{endfor}} +\`\`\` + +**5.5: Verify updates** +\`\`\`bash +CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md) +[ "$CHECKED" -gt 0 ] || { echo "❌ Zero tasks checked"; exit 1; } +echo "✅ Reconciled: $CHECKED tasks with evidence" +\`\`\` **Final Quality Gate** -Verify: -1. Git commit exists -2. Story tasks checked (count > 0) -3. Dev Agent Record filled -4. Sprint status updated +\`\`\`bash +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "🔍 FINAL VERIFICATION" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" -If verification fails: fix using Edit, then re-verify. +# 1. 
Git commit exists +git log --oneline -3 | grep "{{story_key}}" || { echo "❌ No commit"; exit 1; } +echo "✅ Git commit found" + +# 2. Story tasks checked with evidence +CHECKED=$(grep -c "^- \[x\]" docs/sprint-artifacts/{{story_key}}.md) +[ "$CHECKED" -gt 0 ] || { echo "❌ No tasks checked"; exit 1; } +echo "✅ $CHECKED tasks checked with code citations" + +# 3. Dev Agent Record filled +grep -A 5 "### Dev Agent Record" docs/sprint-artifacts/{{story_key}}.md | grep -q "202" || { echo "❌ Record not filled"; exit 1; } +echo "✅ Dev Agent Record filled" + +# 4. Coverage met threshold +FINAL_COVERAGE=$(jq -r '.tests.coverage' docs/sprint-artifacts/completions/{{story_key}}-builder-fixes.json) +if (( $(echo "$FINAL_COVERAGE < {{coverage_threshold}}" | bc -l) )); then + echo "❌ Coverage ${FINAL_COVERAGE}% still below threshold" + exit 1 +fi +echo "✅ Coverage: ${FINAL_COVERAGE}%" + +echo "" +echo "✅ STORY COMPLETE" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +\`\`\` + +**Update sprint-status.yaml:** +Use Edit tool: \`"{{story_key}}: ready-for-dev"\` → \`"{{story_key}}: done"\` + + + +**Phase 6: Playbook Reflection** + +\`\`\` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +💡 PHASE 6: PLAYBOOK REFLECTION +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +\`\`\` + +Spawn Reflection Agent: + +\`\`\` +Task({ + subagent_type: "general-purpose", + description: "Extract learnings from {{story_key}}", + prompt: \` +You are the REFLECTION agent for story {{story_key}}. + + +Story: [inline story file] +Builder initial: [inline builder.json] +All review findings: [inline all reviewer artifacts] +Builder fixes: [inline builder-fixes.json] +Test quality issues: [inline test-quality.json] + + + +Identify what future agents should know: + +1. **What issues were found?** (from reviewers) +2. **What did Builder miss initially?** (gaps, edge cases, security) +3. **What playbook knowledge would have prevented these?** +4. **Which module/feature area does this apply to?** +5. 
**Should we update an existing playbook or create a new one?**
+
+Questions:
+- What gotchas should future builders know?
+- What code patterns should be standard?
+- What test requirements are essential?
+- What similar stories exist?
+
+
+
+- [ ] Analyzed review findings
+- [ ] Identified preventable issues
+- [ ] Determined which playbook(s) to update
+- [ ] Return structured proposal
+
+
+
+{
+  "agent": "reflection",
+  "story_key": "{{story_key}}",
+  "learnings": [
+    {
+      "issue": "SQL injection in query builder",
+      "root_cause": "Builder used string concatenation (didn't know pattern)",
+      "prevention": "Playbook should document: always use parameterized queries",
+      "applies_to": "database queries, API endpoints with user input"
+    },
+    {
+      "issue": "Missing edge case tests for empty arrays",
+      "root_cause": "Test Quality Agent found gap",
+      "prevention": "Playbook should require: test null/empty/invalid for all inputs",
+      "applies_to": "all data processing functions"
+    }
+  ],
+  "playbook_proposal": {
+    "action": "update_existing" | "create_new",
+    "playbook": "docs/playbooks/implementation-playbooks/database-api-patterns.md",
+    "module": "api/database",
+    "updates": {
+      "common_gotchas": [
+        "Never concatenate user input into SQL - use parameterized queries",
+        "Test edge cases: null, undefined, [], '', invalid input"
+      ],
+      "code_patterns": [
+        "db.query(sql, [param1, param2]) ✓",
+        "sql + userInput ✗"
+      ],
+      "test_requirements": [
+        "Test SQL injection attempts: expect(() => query(\"' OR 1=1--\")).toThrow()",
+        "Test empty inputs: expect(fn([])).toEqual([]) or expect(() => fn([])).toThrow()"
+      ],
+      "related_stories": ["{{story_key}}"]
+    }
+  }
+}
+
+Save to: docs/sprint-artifacts/completions/{{story_key}}-reflection.json
+
+\`
+})
+\`\`\`
+
+**Wait for completion.**
+
+**Review playbook proposal:**
+\`\`\`bash
+REFLECTION="docs/sprint-artifacts/completions/{{story_key}}-reflection.json"
+ACTION=$(jq -r '.playbook_proposal.action' "$REFLECTION")
+PLAYBOOK=$(jq -r '.playbook_proposal.playbook' "$REFLECTION")
+
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "📝 Playbook Update Proposal"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+echo "Action: $ACTION"
+echo "Playbook: $PLAYBOOK"
+echo ""
+jq -r '.learnings[] | "- \(.issue)\n  Prevention: \(.prevention)"' "$REFLECTION"
+echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+\`\`\`
+
+If \`auto_apply_updates: true\` in config:
+- Read playbook (or create from template if new)
+- Use Edit tool to add learnings to sections
+- Commit playbook update
+
+If \`auto_apply_updates: false\` (default):
+- Display proposal for manual review
+- User can apply later with \`/update-playbooks {{story_key}}\`
 
-**Builder fails:** Don't spawn verification. Report failure and halt.
-**Inspector fails (Phase 2):** Still run Reviewers in parallel, collect all findings together.
-**Inspector fails (Phase 4):** Resume Builder again with new issues (iterative fix loop).
-**Builder resume fails:** Report unfixed issues. Manual intervention needed.
-**Reconciliation fails:** Fix using Edit tool. Re-verify checkboxes.
+**Builder fails (Phase 1):** Don't spawn verification. Report failure and halt.
+**Inspector fails (Phase 2):** Still collect other reviewer findings.
+**Test Quality fails:** Add issues to Builder fix list.
+**Coverage below threshold:** Add to Builder fix list.
+**Reviewers find CRITICAL:** Builder MUST fix when resumed.
+**Inspector fails (Phase 4):** Resume Builder again (iterative loop, max 3 iterations).
+**Builder resume fails:** Report unfixed issues. Manual intervention.
+**Reconciliation fails:** Fix with Edit tool, re-verify.
-| Complexity | Pipeline | Reviewers | Total Phase 2 Agents | -|------------|----------|-----------|---------------------| -| micro | Builder → [Inspector + 2 Reviewers] → Resume Builder → Inspector recheck | 2 (security, architect) | 3 agents | -| standard | Builder → [Inspector + 3 Reviewers] → Resume Builder → Inspector recheck | 3 (security, logic, architect) | 4 agents | -| complex | Builder → [Inspector + 4 Reviewers] → Resume Builder → Inspector recheck | 4 (security, logic, architect, quality) | 5 agents | +| Complexity | Phase 2 Agents | Total | Security | +|------------|----------------|-------|----------| +| micro | Inspector + Test Quality + 2 Reviewers | 4 agents | Security Reviewer + Architect | +| standard | Inspector + Test Quality + 3 Reviewers | 5 agents | Security + Logic + Architect | +| complex | Inspector + Test Quality + 4 Reviewers | 6 agents | Security + Logic + Architect + Quality | -**Key Improvements (v3.2.0):** -- All verification agents spawn in parallel (single message, faster execution) -- Builder resume in Phase 3 saves 50-70% tokens vs spawning fresh Fixer -- **NEW:** Architect/Integration Reviewer catches runtime issues (404s, pattern violations, missing migrations) - -**Reviewer Specializations:** -- **Security:** Auth, injection, secrets, cross-tenant access -- **Logic/Performance:** Bugs, edge cases, N+1 queries, race conditions -- **Architect/Integration:** Routes work, patterns match, migrations applied, dependencies installed (v3.2.0+) -- **Code Quality:** Maintainability, naming, duplication (complex only) +**All verification agents spawn in parallel (single message)** -- [ ] Builder spawned and agent_id saved -- [ ] All verification agents completed in parallel -- [ ] Builder resumed with consolidated findings -- [ ] Inspector recheck passed -- [ ] Git commit exists for story -- [ ] Story file has checked tasks (count > 0) -- [ ] Dev Agent Record filled with all phases -- [ ] Sprint status updated to "done" +- [ ] Phase 0: 
Playbooks loaded (if available) +- [ ] Phase 1: Builder spawned, agent_id saved +- [ ] Phase 2: All verification agents completed in parallel +- [ ] Phase 2.5: Coverage gate passed +- [ ] Phase 3: Builder resumed with consolidated findings +- [ ] Phase 4: Inspector recheck passed +- [ ] Phase 5: Orchestrator reconciled with Inspector evidence +- [ ] Phase 6: Playbook reflection completed +- [ ] Git commit exists +- [ ] Story tasks checked with code citations +- [ ] Dev Agent Record filled +- [ ] Coverage ≥ {{coverage_threshold}}% +- [ ] Sprint status: done + + +1. ✅ Resume Builder for fixes (v3.2+) - 50-70% token savings +2. ✅ Inspector provides code citations (v4.0) - file:line evidence for every task +3. ✅ Removed "hospital-grade" framing (v4.0) - kept disciplined gates +4. ✅ Micro stories get 2 reviewers + security scan (v3.2+) - not zero +5. ✅ Test Quality Agent (v4.0) + Coverage Gate (v4.0) - validates test quality and enforces threshold +6. ✅ Playbook query (v4.0) before Builder + reflection (v4.0) after - continuous learning + diff --git a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml index 47b73c77..3a075a5e 100644 --- a/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml +++ b/src/bmm/workflows/4-implementation/story-full-pipeline/workflow.yaml @@ -1,7 +1,7 @@ name: story-full-pipeline -description: "Multi-agent pipeline with wave-based execution, independent validation, and adversarial code review (GSDMAD)" -author: "BMAD Method + GSD" -version: "3.2.0" # Added architect-integration-reviewer for runtime verification +description: "Enhanced multi-agent pipeline with playbook learning, code citation evidence, test quality validation, and resume-builder fixes" +author: "BMAD Method" +version: "4.0.0" # Added playbook learning, test quality, coverage gates, Inspector code citations # Execution mode execution_mode: "multi_agent" # multi_agent | 
single_agent (fallback) @@ -37,13 +37,23 @@ agents: timeout: 3600 # 1 hour inspector: - description: "Validation agent - independent verification" + description: "Validation agent - independent verification with code citations" steps: [5, 6] subagent_type: "general-purpose" prompt_file: "{agents_path}/inspector.md" fresh_context: true # No knowledge of builder agent trust_level: "medium" # No conflict of interest timeout: 1800 # 30 minutes + require_code_citations: true # v4.0: Must provide file:line evidence for all tasks + + test_quality: + description: "Test quality validation - verifies test coverage and quality" + steps: [5.5] + subagent_type: "general-purpose" + prompt_file: "{agents_path}/test-quality.md" + fresh_context: true + trust_level: "medium" + timeout: 1200 # 20 minutes reviewer: description: "Adversarial code review - finds problems" @@ -73,15 +83,40 @@ agents: trust_level: "medium" # Incentive to minimize work timeout: 2400 # 40 minutes + reflection: + description: "Playbook learning - extracts patterns for future agents" + steps: [10] + subagent_type: "general-purpose" + prompt_file: "{agents_path}/reflection.md" + timeout: 900 # 15 minutes + # Reconciliation: orchestrator does this directly (see workflow.md Phase 5) +# Playbook configuration (v4.0) +playbooks: + enabled: true # Set to false in project config to disable + directory: "docs/playbooks/implementation-playbooks" + bootstrap_mode: true # Auto-initialize if missing + max_load: 3 + auto_apply_updates: false # Require manual review of playbook updates + discovery: + enabled: true # Scan git/docs to populate initial playbooks + sources: ["git_history", "docs", "existing_code"] + +# Quality gates (v4.0) +quality_gates: + coverage_threshold: 80 # % line coverage required + task_verification: "all_with_evidence" # Inspector must provide file:line citations + critical_issues: "must_fix" + high_issues: "must_fix" + # Complexity level (determines which steps to execute) complexity_level: 
"standard" # micro | standard | complex # Complexity routing complexity_routing: micro: - skip_agents: ["reviewer"] # Skip code review for micro stories + skip_agents: [] # Full pipeline (v4.0: micro gets security scan) description: "Lightweight path for low-risk stories" examples: ["UI tweaks", "text changes", "simple CRUD"] diff --git a/src/bmm/workflows/templates/implementation-playbook-template.md b/src/bmm/workflows/templates/implementation-playbook-template.md new file mode 100644 index 00000000..79208ebf --- /dev/null +++ b/src/bmm/workflows/templates/implementation-playbook-template.md @@ -0,0 +1,85 @@ +# {{Module/Feature Area}} - Implementation Playbook + +> **Purpose:** Guide future agents implementing features in {{module_name}} +> **Created:** {{date}} +> **Last Updated:** {{date}} + +## Common Gotchas + +**What mistakes to avoid:** + +- Add specific gotchas here as they're discovered +- Example: "Never concatenate user input into SQL queries" +- Example: "Always validate file paths before operations" + +## Code Patterns + +**Standard approaches that work:** + +### Pattern: {{Pattern Name}} + +✓ **Good:** +``` +// Example of correct pattern +db.query(sql, [param1, param2]) +``` + +✗ **Bad:** +``` +// Example of incorrect pattern +sql + userInput +``` + +### Pattern: {{Another Pattern}} + +✓ **Good:** +``` +// Another example +if (!data) return null; +``` + +✗ **Bad:** +``` +// Don't do this +data.map(...) // crashes if data is null +``` + +## Test Requirements + +**Essential tests for this module:** + +- **Happy path:** Verify primary functionality +- **Edge cases:** Test null, undefined, empty arrays, invalid inputs +- **Error conditions:** Verify errors are handled properly +- **Security:** Test for injection attacks, auth bypasses, etc. 
+
+### Example Test Pattern
+
+```typescript
+describe('FeatureName', () => {
+  it('handles happy path', () => {
+    expect(fn(validInput)).toEqual(expected)
+  })
+
+  it('handles edge cases', () => {
+    // toThrow needs a function to invoke; calling fn(null) inline would throw before expect runs
+    expect(() => fn(null)).toThrow()
+    expect(fn([])).toEqual([])
+  })
+
+  it('validates security', () => {
+    expect(() => fn("' OR 1=1--")).toThrow()
+  })
+})
+```
+
+## Related Stories
+
+Stories that used these patterns:
+
+- {{story_key}} - {{brief description}}
+
+## Notes
+
+- Keep this simple and actionable
+- Add new learnings as they emerge
+- Focus on preventable mistakes
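
Because the template fixes the section headings, the three sections that Phase 0 extracts for the Builder (Common Gotchas, Code Patterns, Test Requirements) can be pulled out mechanically. Below is a minimal sketch under the assumption that a playbook keeps the template's `## ` headings verbatim. The function `extractSections` is hypothetical and not part of the workflow.

```typescript
// Three backticks, written with escapes so this sketch can sit inside a fenced block.
const FENCE = "\u0060\u0060\u0060";

// Split a playbook into its level-2 ("## ") sections and keep only the wanted ones.
// Headings that appear inside fenced code blocks are treated as plain content.
function extractSections(markdown: string, wanted: string[]): Record<string, string> {
  const sections: Record<string, string> = {};
  let current: string | null = null;
  let inFence = false;
  for (const line of markdown.split("\n")) {
    if (line.trim().indexOf(FENCE) === 0) inFence = !inFence;
    const heading = inFence ? null : /^## (.+)$/.exec(line);
    if (heading) {
      const title = heading[1].trim();
      current = wanted.indexOf(title) !== -1 ? title : null;
      if (current !== null) sections[current] = "";
      continue;
    }
    if (current !== null) sections[current] += line + "\n";
  }
  for (const key of Object.keys(sections)) {
    sections[key] = sections[key].trim();
  }
  return sections;
}
```

Called with `["Common Gotchas", "Code Patterns", "Test Requirements"]`, this would return only those sections, which keeps Related Stories and Notes out of the Builder prompt.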