Author SHA1 Message Date
Jonah Schulte a268b4c1bc feat: upgrade story-full-pipeline to v4.0 with 6 major enhancements
Upgrade from v3.2.0 to v4.0.0 with improvements inspired by CooperBench research
(Stanford/SAP 2026) on agent coordination failures.

Enhancement 1: Resume Builder (v3.2+)
- Phase 3 resumes the Builder agent with the review findings
- The Builder already holds full codebase context (50-70% token savings)
- More efficient than spawning a fresh Fixer agent

Enhancement 2: Inspector Code Citations (v4.0)
- Inspector must map EVERY task to file:line citations
- Example: "Create component" → "src/Component.tsx:45-67"
- Replaces "trust me, it works" claims with verifiable proof
- Returns structured JSON with code evidence for each task
- Prevents vague communication (a coordination failure reported by CooperBench)
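
One way to check the citation requirement mechanically is sketched here in Python. This is a hypothetical sketch, not the Inspector's actual implementation: it assumes a report shaped like {"tasks": [{"task": ..., "evidence": [...]}]} and citations of the form path:line or path:start-end. The names parse_citation and uncited_tasks are illustrative only.

```python
from __future__ import annotations

import re

# Hypothetical citation format: "path:line" or "path:start-end",
# e.g. "src/Component.tsx:45-67".
CITATION_RE = re.compile(r"^(?P<path>[^:\s]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def parse_citation(citation: str) -> tuple[str, int, int] | None:
    """Return (path, start, end) for a well-formed citation, else None."""
    match = CITATION_RE.match(citation.strip())
    if match is None:
        return None
    start = int(match["start"])
    end = int(match["end"]) if match["end"] else start
    if start < 1 or end < start:
        return None  # reject line 0 and inverted ranges
    return match["path"], start, end


def uncited_tasks(report: dict) -> list[str]:
    """List tasks with no valid file:line citation in their evidence.

    Assumes a hypothetical report shape:
    {"tasks": [{"task": str, "evidence": [str, ...]}, ...]}
    """
    missing = []
    for entry in report.get("tasks", []):
        cited = [parse_citation(c) for c in entry.get("evidence", [])]
        if not any(c is not None for c in cited):
            missing.append(entry["task"])
    return missing
```

A task whose only evidence is free text such as "it works" ends up in the uncited list, which mirrors the "requires proof" rule.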

Enhancement 3: Remove Hospital-Grade Framing (v4.0)
- Dropped the psychological-appeal language
- Kept the rigorous verification gates and bash checks
- Focuses on concrete, measurable verification
- Framing replaced with references to patterns/verification.md and patterns/tdd.md

Enhancement 4: Micro Stories Get Security Scan (v4.0)
- Micro stories no longer skip review entirely
- Micro stories now get 2 reviewers: Security + Architect
- Lightweight, but still catches critical vulnerabilities

Enhancement 5: Test Quality Agent + Coverage Gate (v4.0)
- New Test Quality Agent validates:
  - Edge cases covered (null, empty, invalid)
  - Error conditions tested
  - Meaningful assertions (not just "doesn't crash")
  - No flaky tests (random data, timing)
- Automated Coverage Gate enforces an 80% coverage threshold
- Builder must fix test gaps before proceeding
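
A minimal sketch of the Coverage Gate check, assuming line coverage is reported as covered/total line counts and that exactly 80% counts as passing. GateResult and coverage_gate are hypothetical names, and treating an empty measurement as a failure is an assumption made here, not something this commit specifies.

```python
from dataclasses import dataclass

COVERAGE_THRESHOLD = 80.0  # percent, matching the v4.0 Coverage Gate


@dataclass
class GateResult:
    coverage: float  # percentage of lines covered
    passed: bool


def coverage_gate(covered_lines: int, total_lines: int,
                  threshold: float = COVERAGE_THRESHOLD) -> GateResult:
    """Pass only when line coverage meets or exceeds the threshold."""
    if total_lines <= 0:
        # Assumption: nothing measured means the gate fails, so the
        # Builder has to investigate instead of silently proceeding.
        return GateResult(coverage=0.0, passed=False)
    coverage = 100.0 * covered_lines / total_lines
    return GateResult(coverage=coverage, passed=coverage >= threshold)
```

A failing result is the signal that sends work back to the Builder to close the test gaps.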

Enhancement 6: Playbook Learning System (v4.0)
- Phase 0: Query playbooks before implementation
- Builder gets relevant patterns/gotchas upfront
- Phase 6: Reflection agent extracts learnings
- Auto-generates playbook updates for future agents
- Bootstrap mode: auto-initializes playbooks if missing
- Continuous improvement through reflection
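
The Phase 0 / Phase 6 loop can be sketched as plain file operations. Assumptions: one Markdown playbook per topic, learnings stored as "- " bullets, and a stripped-down DEFAULT_PLAYBOOK standing in for ../templates/implementation-playbook-template.md. The functions ensure_playbook, query_playbook, and record_learning are hypothetical.

```python
from pathlib import Path

# Hypothetical stand-in for the real playbook template.
DEFAULT_PLAYBOOK = "# Implementation Playbook\n\n## Patterns\n\n## Gotchas\n"


def ensure_playbook(path: Path) -> bool:
    """Bootstrap mode: create the playbook if missing. Returns True if created."""
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(DEFAULT_PLAYBOOK, encoding="utf-8")
    return True


def query_playbook(path: Path, keywords: list[str]) -> list[str]:
    """Phase 0: return bullet lines that mention any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped.startswith("- ") and any(k in stripped.lower() for k in wanted):
            hits.append(stripped)
    return hits


def record_learning(path: Path, learning: str) -> None:
    """Phase 6: append a reflection-derived learning as a new bullet."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {learning}\n")
```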

Pipeline: Phase 0 (Playbooks) → Phase 1 (Builder) → Phase 2 (Inspector +
Test Quality + Reviewers, in parallel) → Phase 2.5 (Coverage Gate) → Phase 3
(Resume Builder) → Phase 4 (Inspector recheck) → Phase 5 (Reconciliation) →
Phase 6 (Reflection)
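
The phase ordering can be modeled as a list of stages in which a nested list runs in parallel. This Python sketch uses the standard-library ThreadPoolExecutor; the stage callables and the shared state dict are assumptions for illustration, not the workflow's real engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Union

# Each hypothetical phase takes the shared state and returns findings to merge in.
Phase = Callable[[dict], dict]
Stage = Union[Phase, list[Phase]]  # a list means "run these in parallel"


def run_pipeline(state: dict, stages: list[Stage]) -> dict:
    for stage in stages:
        if isinstance(stage, list):
            # Parallel verification: every verifier reads the same copy of the
            # state taken before the stage; none of them writes code.
            snapshot = dict(state)
            with ThreadPoolExecutor(max_workers=max(1, len(stage))) as pool:
                results = list(pool.map(lambda phase: phase(snapshot), stage))
            for result in results:
                state.update(result)
        else:
            # Sequential stage: a single writer (e.g. the Builder) owns the change.
            state.update(stage(state))
    return state
```

Only the nested list (Phase 2) runs concurrently; every other stage, including the resumed Builder in Phase 3, runs alone, which matches the one-writer design.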

Files Modified:
- workflow.yaml: v4.0 config with playbooks + quality_gates
- workflow.md: Complete v4.0 documentation with all phases
- agents/builder.md: Playbook awareness + structured JSON
- agents/inspector.md: Code citation requirements + evidence format
- agents/reviewer.md: Remove hospital-grade reference
- agents/architect-integration-reviewer.md: Remove hospital-grade reference
- agents/fixer.md: Remove hospital-grade reference
- README.md: v4.0 documentation + CooperBench analysis

Files Created:
- agents/test-quality.md: Test quality validation agent
- agents/reflection.md: Playbook learning agent
- ../templates/implementation-playbook-template.md: Simple playbook structure

Design Philosophy:
The workflow avoids CooperBench's "curse of coordination" by using:
- Sequential implementation (ONE writer, no merge conflicts)
- Parallel verification (safe read-only validation)
- Context reuse (no expectation failures)
- Evidence-based communication (file:line citations)
- Clear role separation (no overlapping responsibilities)
2026-01-28 13:28:37 -05:00
Jonah Schulte 9fbaca3384 feat(pipeline): add architect/integration reviewer for runtime verification
- Adds a third reviewer to catch routing, pattern, and integration issues
- Verifies that routes actually load (not just compile)
- Checks that migrations are applied and dependencies are installed
- Compares new code against existing project patterns
- Framework-agnostic, so it works on any project

Complexity routing updated:
- micro: 2 reviewers (security, architect)
- standard: 3 reviewers (security, logic, architect)
- complex: 4 reviewers (security, logic, architect, quality)
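
The routing table maps directly onto a lookup. A minimal Python sketch; the function name reviewers_for and the choice to reject unknown complexity levels are assumptions.

```python
# Reviewer routing by story complexity, as listed in this commit.
REVIEWERS_BY_COMPLEXITY: dict[str, tuple[str, ...]] = {
    "micro": ("security", "architect"),
    "standard": ("security", "logic", "architect"),
    "complex": ("security", "logic", "architect", "quality"),
}


def reviewers_for(complexity: str) -> tuple[str, ...]:
    """Return the reviewer set for a complexity level; unknown levels raise."""
    try:
        return REVIEWERS_BY_COMPLEXITY[complexity.lower()]
    except KeyError:
        raise ValueError(f"unknown complexity: {complexity!r}") from None
```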

Version: 3.1.0 → 3.2.0
2026-01-28 09:36:05 -05:00