docs(story-pipeline): add comprehensive documentation

Covers:
- Problem statement and token efficiency gains
- What each step automates (8-step workflow)
- Usage: interactive, batch, and resume modes
- Configuration options (workflow.yaml)
- State management and checkpointing
- Quality gates and adversarial mode
- Troubleshooting and best practices
- Comparison with legacy pipeline
Commit 1347daa279 (parent d2d6328be2), authored by Thomas Jetzinger, 2025-12-26 10:12:37 +01:00 (committed by Thomas).
1 changed file with 491 additions and 0 deletions.

# Story Pipeline v2.0
> Single-session step-file architecture for implementing user stories with 50-65% token savings.
## Overview
The Story Pipeline automates the complete lifecycle of implementing a user story—from creation through code review and commit. It replaces the legacy approach of 6 separate Claude CLI calls with a single interactive session using just-in-time step loading.
### The Problem It Solves
**Legacy Pipeline (v1.0):**
```
bmad build 1-4
└─> claude -p "Stage 1: Create story..." # ~12K tokens
└─> claude -p "Stage 2: Validate story..." # ~12K tokens
└─> claude -p "Stage 3: ATDD tests..." # ~12K tokens
└─> claude -p "Stage 4: Implement..." # ~12K tokens
└─> claude -p "Stage 5: Code review..." # ~12K tokens
└─> claude -p "Stage 6: Complete..." # ~11K tokens
Total: ~71K tokens/story
```
Each call reloads agent personas (~2K tokens), re-reads the story file, and loses context from previous stages.
**Story Pipeline v2.0:**
```
bmad build 1-4
└─> Single Claude session
├─> Load step-01-init.md (~200 lines)
├─> Role switch: SM
├─> Load step-02-create-story.md
├─> Load step-03-validate-story.md
├─> Role switch: TEA
├─> Load step-04-atdd.md
├─> Role switch: DEV
├─> Load step-05-implement.md
├─> Load step-06-code-review.md
├─> Role switch: SM
├─> Load step-07-complete.md
└─> Load step-08-summary.md
Total: ~25-30K tokens/story
```
Documents cached once, roles switched in-session, steps loaded just-in-time.
## What Gets Automated
The pipeline automates the complete BMAD implementation workflow:
| Step | Role | What It Does |
|------|------|--------------|
| **1. Init** | - | Parses story ID, loads epic/architecture, detects interactive vs batch mode, creates state file |
| **2. Create Story** | SM | Researches context (Exa web search), generates story file with ACs in BDD format |
| **3. Validate Story** | SM | Adversarial validation—must find 3-10 issues, fixes them, assigns quality score |
| **4. ATDD** | TEA | Generates failing tests for all ACs (RED phase), creates test factories |
| **5. Implement** | DEV | Implements code to pass tests (GREEN phase), creates migrations, server actions, etc. |
| **6. Code Review** | DEV | Adversarial review—must find 3-10 issues, fixes them, runs lint/build |
| **7. Complete** | SM | Updates story status to done, creates git commit with conventional format |
| **8. Summary** | - | Generates audit trail, updates pipeline state, outputs metrics |
### Quality Gates
Each step has quality gates that must pass before proceeding:
- **Validation**: Score ≥ 80/100, all issues addressed
- **ATDD**: Tests exist for all ACs, tests fail (RED phase confirmed)
- **Implementation**: Lint clean, build passes, migration tests pass
- **Code Review**: Score ≥ 7/10, all critical issues fixed
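Mechanically, each score-based gate boils down to a threshold comparison. A toy illustration (the `passes_gate` helper is hypothetical, not part of the pipeline; the threshold values are the ones listed above):

```shell
# Hypothetical gate helper: succeed only when a score meets its minimum.
passes_gate() {  # usage: passes_gate <score> <min>
  [ "$1" -ge "$2" ]
}

passes_gate 92 80 && echo "validation gate passed"
passes_gate 6 7 || echo "code review gate failed"
```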
## Token Efficiency
| Mode | Token Usage | Savings vs Legacy |
|------|-------------|-------------------|
| Interactive (human-in-loop) | ~25K | 65% |
| Batch (YOLO) | ~30K | 58% |
| Batch + fresh review context | ~35K | 51% |
### Where Savings Come From
| Waste in Legacy | Tokens Saved |
|-----------------|--------------|
| Agent persona reload (6×) | ~12K |
| Story file re-reads (5×) | ~10K |
| Architecture re-reads | ~8K |
| Context loss between calls | ~16K |
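The two tables above are consistent with each other, as a quick arithmetic check shows (all numbers are the approximate figures quoted in this document, not measurements):

```shell
# Legacy total: five ~12K calls plus one ~11K call.
legacy=$((12 * 5 + 11))        # 71
# Savings: persona reloads + story re-reads + architecture re-reads + context loss.
saved=$((12 + 10 + 8 + 16))    # 46
echo "interactive estimate: ~$((legacy - saved))K tokens"   # ~25K, matching the table
```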
## Usage
### Prerequisites
- BMAD module installed (`_bmad/` directory exists)
- Epic file with story definition (`docs/epics.md`)
- Architecture document (`docs/architecture.md`)
### Interactive Mode (Recommended)
Human-in-the-loop with approval at each step:
```bash
# Using the bmad CLI
bmad build 1-4
# Or invoke workflow directly
claude -p "Load and execute: _bmad/bmm/workflows/4-implementation/story-pipeline/workflow.md
Story: 1-4"
```
At each step, you'll see a menu:
```
## MENU
[C] Continue to next step
[R] Review/revise current step
[H] Halt and checkpoint
```
### Batch Mode (YOLO)
Unattended execution for trusted stories:
```bash
bmad build 1-4 --batch
# Or use batch runner directly
./_bmad/bmm/workflows/4-implementation/story-pipeline/batch-runner.sh 1-4
```
Batch mode:
- Skips all approval prompts
- Fails fast on errors
- Creates checkpoint on failure for resume
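The fail-fast behavior can be sketched as a loop that stops at the first failing step. This is a simplified illustration; the real logic lives in `batch-runner.sh` and may differ:

```shell
# Run each step command in order; halt and report on the first failure.
run_batch() {
  for step in "$@"; do
    if ! "$step"; then
      echo "halted at: $step (checkpoint written for --resume)"
      return 1
    fi
  done
  echo "batch complete"
}
```

For example, `run_batch true true false` would halt at the third step and signal failure to the caller.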
### Resume from Checkpoint
If execution stops (context exhaustion, error, manual halt):
```bash
bmad build 1-4 --resume
# The pipeline reads state from:
# docs/sprint-artifacts/pipeline-state-{story-id}.yaml
```
Resume automatically:
- Skips completed steps
- Restores cached context
- Continues from `lastStep + 1`
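The resume arithmetic is simple: read `lastStep` out of the state file and start at the next step. A minimal sketch, assuming the flat `lastStep:` key shown in the state-file example later in this document (the helper itself is illustrative):

```shell
# Read lastStep from a pipeline state file and compute where to resume.
next_step_from_state() {  # usage: next_step_from_state <state-file>
  last=$(grep '^lastStep:' "$1" | awk '{print $2}')
  echo $((last + 1))
}
```

For the example state file shown below (`lastStep: 3`), this yields step 4.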
## Directory Structure
```
story-pipeline/
├── workflow.yaml # Configuration, agent mapping, quality gates
├── workflow.md # Interactive mode orchestration
├── batch-runner.sh # Batch mode runner script
├── steps/
│ ├── step-01-init.md # Initialize, load context
│ ├── step-01b-resume.md # Resume from checkpoint
│ ├── step-02-create-story.md
│ ├── step-03-validate-story.md
│ ├── step-04-atdd.md
│ ├── step-05-implement.md
│ ├── step-06-code-review.md
│ ├── step-07-complete.md
│ └── step-08-summary.md
├── checklists/
│ ├── story-creation.md # What makes a good story
│ ├── story-validation.md # Validation criteria
│ ├── atdd.md # Test generation rules
│ ├── implementation.md # Coding standards
│ └── code-review.md # Review criteria
└── templates/
├── pipeline-state.yaml # State file template
└── audit-trail.yaml # Audit log template
```
## Configuration
### workflow.yaml
```yaml
name: story-pipeline
version: "2.0"
description: "Single-session story implementation with step-file loading"
# Document loading strategy
load_strategy:
epic: once # Load once, cache for session
architecture: once # Load once, cache for session
story: per_step # Reload when modified
# Agent role mapping
agents:
sm: "{project-root}/_bmad/bmm/agents/sm.md"
tea: "{project-root}/_bmad/bmm/agents/tea.md"
dev: "{project-root}/_bmad/bmm/agents/dev.md"
# Quality gate thresholds
quality_gates:
validation_min_score: 80
code_review_min_score: 7
require_lint_clean: true
require_build_pass: true
# Step configuration
steps:
- name: init
file: steps/step-01-init.md
- name: create-story
file: steps/step-02-create-story.md
agent: sm
# ... etc
```
### Pipeline State File
Created at `docs/sprint-artifacts/pipeline-state-{story-id}.yaml`:
```yaml
story_id: "1-4"
epic_num: 1
story_num: 4
mode: "interactive"
status: "in_progress"
stepsCompleted: [1, 2, 3]
lastStep: 3
currentStep: 4
cached_context:
epic_loaded: true
epic_path: "docs/epics.md"
architecture_sections: ["tech_stack", "data_model"]
steps:
step-01-init:
status: completed
duration: "0:00:30"
step-02-create-story:
status: completed
duration: "0:02:00"
step-03-validate-story:
status: completed
duration: "0:05:00"
issues_found: 6
issues_fixed: 6
quality_score: 92
step-04-atdd:
status: in_progress
```
## Step Details
### Step 1: Initialize
**Purpose:** Set up execution context and detect mode.
**Actions:**
1. Parse story ID (e.g., "1-4" → epic 1, story 4)
2. Load and cache epic document
3. Load relevant architecture sections
4. Check for existing state file (resume vs fresh)
5. Detect mode (interactive/batch) from CLI flags
6. Create initial state file
**Output:** `pipeline-state-{story-id}.yaml`
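The ID parse in action 1 needs nothing more than shell parameter expansion (illustrative; the actual step file may do this differently):

```shell
story_id="1-4"
epic_num="${story_id%%-*}"    # text before the first "-"
story_num="${story_id#*-}"    # text after the first "-"
echo "epic=$epic_num story=$story_num"   # → epic=1 story=4
```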
### Step 2: Create Story (SM Role)
**Purpose:** Generate complete story file from epic definition.
**Actions:**
1. Switch to Scrum Master (SM) role
2. Read story definition from epic
3. Research context via Exa web search (best practices, patterns)
4. Generate story file with:
- User story format (As a... I want... So that...)
- Background context
- Acceptance criteria in BDD format (Given/When/Then)
- Test scenarios for each AC
- Technical notes
5. Save to `docs/sprint-artifacts/story-{id}.md`
**Quality Gate:** Story file exists with all required sections.
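This gate could be checked mechanically along the following lines. The section headings used here are assumptions based on the list above, not the pipeline's actual names:

```shell
# Fail if any required markdown heading is missing from the story file.
has_sections() {  # usage: has_sections <file> <heading>...
  f="$1"; shift
  for h in "$@"; do
    grep -q "^##* .*$h" "$f" || { echo "missing section: $h"; return 1; }
  done
  echo "all required sections present"
}
```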
### Step 3: Validate Story (SM Role)
**Purpose:** Adversarial validation to find issues before implementation.
**Actions:**
1. Load story-validation checklist
2. Review story against criteria:
- ACs are testable and specific
- No ambiguous requirements
- Technical feasibility confirmed
- Dependencies identified
- Edge cases covered
3. **Must find 3-10 issues** (never "looks good")
4. Fix all identified issues
5. Assign quality score (0-100)
6. Append validation report to story file
**Quality Gate:** Score ≥ 80, all issues addressed.
### Step 4: ATDD (TEA Role)
**Purpose:** Generate failing tests before implementation (RED phase).
**Actions:**
1. Switch to Test Engineering Architect (TEA) role
2. Load atdd checklist
3. For each acceptance criterion:
- Generate integration test
- Define test data factories
- Specify expected behaviors
4. Create test files in `src/tests/`
5. Update `factories.ts` with new fixtures
6. **Verify tests FAIL** (RED phase)
7. Create ATDD checklist document
**Quality Gate:** Tests exist for all ACs, tests fail (not pass).
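Confirming the RED phase amounts to running the suite and insisting on a non-zero exit. A generic sketch (the `confirm_red` helper and the idea of passing the test runner as arguments are illustrative, not the pipeline's actual mechanism):

```shell
# Succeed only if the given test command FAILS (tests are expected to be red).
confirm_red() {
  if "$@" > /dev/null 2>&1; then
    echo "WARNING: tests already pass; is the implementation already present?"
    return 1
  fi
  echo "RED phase confirmed: tests fail as expected"
}
```

Used as, e.g., `confirm_red npm test` in a Node project.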
### Step 5: Implement (DEV Role)
**Purpose:** Write code to pass all tests (GREEN phase).
**Actions:**
1. Switch to Developer (DEV) role
2. Load implementation checklist
3. Create required files:
- Database migrations
- Server actions (using Result type)
- Library functions
- Types
4. Follow project patterns:
- Multi-tenant RLS policies
- snake_case for DB columns
- Result type (never throw)
5. Run lint and fix issues
6. Run build and fix issues
7. Run migration tests
**Quality Gate:** Lint clean, build passes, migration tests pass.
### Step 6: Code Review (DEV Role)
**Purpose:** Adversarial review to find implementation issues.
**Actions:**
1. Load code-review checklist
2. Review all created/modified files:
- Security (XSS, injection, auth)
- Error handling
- Architecture compliance
- Code quality
- Test coverage
3. **Must find 3-10 issues** (never "looks good")
4. Fix all identified issues
5. Re-run lint and build
6. Assign quality score (0-10)
7. Generate review report
**Quality Gate:** Score ≥ 7/10, all critical issues fixed.
### Step 7: Complete (SM Role)
**Purpose:** Finalize story and create git commit.
**Actions:**
1. Switch back to SM role
2. Update story file status to "done"
3. Stage all story files
4. Create conventional commit:
```
feat(epic-{n}): complete story {id}
{Summary of changes}
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
```
5. Update pipeline state
**Quality Gate:** Commit created successfully.
### Step 8: Summary
**Purpose:** Generate audit trail and final metrics.
**Actions:**
1. Calculate total duration
2. Compile deliverables list
3. Aggregate quality scores
4. Generate execution summary in state file
5. Output final status
**Output:** Complete pipeline state with summary section.
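Summing the per-step durations recorded in the state file (format `H:MM:SS`) is straightforward arithmetic. A sketch using the example values shown earlier; the real summary step may compute this differently:

```shell
# Convert H:MM:SS to seconds.
to_seconds() {
  echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'
}

total=$(( $(to_seconds 0:00:30) + $(to_seconds 0:02:00) + $(to_seconds 0:05:00) ))
echo "total: ${total}s"   # 30 + 120 + 300 = 450
```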
## Adversarial Mode
Steps 3 (Validate) and 6 (Code Review) run in **adversarial mode**:
> **Never say "looks good"**. You MUST find 3-10 real issues.
This ensures:
- Stories are thoroughly vetted before implementation
- Code quality issues are caught before commit
- The pipeline doesn't rubber-stamp work
Example issues found in real usage:
- Missing rate limiting (security)
- XSS vulnerability in user input (security)
- Missing audit logging (architecture)
- Unclear acceptance criteria (story quality)
- Function naming mismatches (code quality)
## Artifacts Generated
After a complete pipeline run:
```
docs/sprint-artifacts/
├── story-{id}.md # Story file with ACs, validation report
├── pipeline-state-{id}.yaml # Execution state and summary
├── atdd-checklist-{id}.md # Test requirements checklist
└── code-review-{id}.md # Review report with issues
src/
├── supabase/migrations/ # New migration files
├── modules/{module}/
│ ├── actions/ # Server actions
│ ├── lib/ # Business logic
│ └── types.ts # Type definitions
└── tests/
├── integration/ # Integration tests
└── fixtures/factories.ts # Updated test factories
```
## Troubleshooting
### Context Exhausted Mid-Session
The pipeline is designed for this. When context runs out:
1. Claude session ends
2. State file preserves progress
3. Run `bmad build {id} --resume`
4. Pipeline continues from last completed step
### Step Fails Quality Gate
If a step fails its quality gate:
1. Pipeline halts at that step
2. State file shows `status: failed`
3. Fix issues manually or adjust thresholds
4. Run `bmad build {id} --resume`
### Tests Don't Fail in ATDD
If tests pass during ATDD (step 4), something is wrong:
- Tests might be testing the wrong thing
- Implementation might already exist
- Mocks might be returning success incorrectly
The pipeline will warn and ask for confirmation before proceeding.
## Best Practices
1. **Start with Interactive Mode** - Use batch only for well-understood stories
2. **Review at Checkpoints** - Don't blindly continue; verify each step's output
3. **Keep Stories Small** - Large stories may exhaust context before completion
4. **Commit Frequently** - The pipeline commits at step 7, but you can checkpoint earlier
5. **Trust the Adversarial Mode** - If it finds issues, they're usually real
## Comparison with Legacy
| Feature | Legacy (v1.0) | Story Pipeline (v2.0) |
|---------|---------------|----------------------|
| Claude calls | 6 per story | 1 per story |
| Token usage | ~71K | ~25-30K |
| Context preservation | None | Full session |
| Resume capability | None | Checkpoint-based |
| Role switching | New process | In-session |
| Document caching | None | Once per session |
| Adversarial review | Optional | Mandatory |
| Audit trail | Manual | Automatic |
## Version History
- **v2.0** (2024-12) - Step-file architecture, single-session, checkpoint/resume
- **v1.0** (2024-11) - Legacy 6-call pipeline