UAT Validation Integration Architecture

Overview

This document describes how UAT validation integrates with the epic-chain workflow to provide automated quality gates, self-healing fix loops, and comprehensive validation reporting.


Integration Points

┌──────────────────────────────────────────────────────────────────────────────────┐
│                    EPIC CHAIN WITH UAT VALIDATION + SELF-HEALING                  │
├──────────────────────────────────────────────────────────────────────────────────┤
│                                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────────────┐ │
│  │                            PER EPIC LOOP                                     │ │
│  │                                                                              │ │
│  │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐       │ │
│  │  │ Phase 1 │──►│ Phase 2 │──►│ Phase 3 │──►│ Phase 4 │──►│ Phase 5 │       │ │
│  │  │  Dev    │   │ Review  │   │ Commit  │   │  UAT    │   │  Gate   │       │ │
│  │  │         │   │         │   │         │   │  Gen    │   │ Check   │       │ │
│  │  └─────────┘   └─────────┘   └─────────┘   └─────────┘   └────┬────┘       │ │
│  │                                                                │            │ │
│  └────────────────────────────────────────────────────────────────┼────────────┘ │
│                                                                   │              │
│                                                     ┌─────────────┴───────┐      │
│                                                     │    GATE DECISION    │      │
│                                                     └──────────┬──────────┘      │
│                                                                │                 │
│                              ┌──────────────────┬──────────────┴──────────┐      │
│                              │                  │                         │      │
│                              ▼                  ▼                         ▼      │
│                           ┌──────┐          ┌──────┐               ┌──────────┐  │
│                           │ PASS │          │ FAIL │               │ MAX      │  │
│                           │      │          │      │               │ RETRIES  │  │
│                           └──┬───┘          └──┬───┘               └────┬─────┘  │
│                              │                 │                        │        │
│                              │                 ▼                        ▼        │
│                              │    ┌────────────────────────┐    ┌────────────┐   │
│                              │    │      SELF-HEALING      │    │   HALT +   │   │
│                              │    │                        │    │   NOTIFY   │   │
│                              │    │  ┌──────────────────┐  │    └────────────┘   │
│                              │    │  │  Quick Dev Fix   │  │                     │
│                              │    │  │  (Barry Agent)   │  │                     │
│                              │    │  │                  │  │                     │
│                              │    │  │ • Load failures  │  │                     │
│                              │    │  │ • Generate fix   │  │                     │
│                              │    │  │ • Commit changes │  │                     │
│                              │    │  └────────┬─────────┘  │                     │
│                              │    │           │            │                     │
│                              │    │           ▼            │                     │
│                              │    │  ┌──────────────────┐  │                     │
│                              │    │  │   Re-validate    │  │                     │
│                              │    │  │   UAT Gate       │──┼──► Back to GATE     │
│                              │    │  └──────────────────┘  │                     │
│                              │    │                        │                     │
│                              │    └────────────────────────┘                     │
│                              │                                                   │
│                              ▼                                                   │
│                        ┌──────────┐                                              │
│                        │  Next    │                                              │
│                        │  Epic    │                                              │
│                        └──────────┘                                              │
│                                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────────────┐ │
│  │                          CHAIN COMPLETION                                    │ │
│  │                                                                              │ │
│  │  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────────┐       │ │
│  │  │  Aggregate      │──►│  Generate       │──►│  Final UAT         │       │ │
│  │  │  Metrics        │   │  Chain Report   │   │  Summary           │       │ │
│  │  └─────────────────┘   └─────────────────┘   └─────────────────────┘       │ │
│  │                                                                              │ │
│  └─────────────────────────────────────────────────────────────────────────────┘ │
│                                                                                   │
└──────────────────────────────────────────────────────────────────────────────────┘

Self-Healing Loop: UAT Failure → Quick Dev → Re-validate

When UAT validation fails and self-healing is enabled, the system automatically starts a quick-dev session to fix the failed scenarios, then re-runs the gate check.

Flow Detail

UAT Gate Check
     │
     ├── PASS ──────────────────────────────► Continue to Next Epic
     │
     └── FAIL
          │
          ▼
     ┌─────────────────────────────────────────────────────────────┐
     │                    FAILURE ANALYSIS                          │
     │                                                              │
     │  1. Collect failed scenarios with:                          │
     │     - Scenario ID and description                           │
     │     - Expected vs actual output                             │
     │     - Error messages / stack traces                         │
     │     - Related story acceptance criteria                     │
     │                                                              │
     │  2. Generate fix context document:                          │
     │     docs/sprint-artifacts/uat-fix-context-{epic}-{attempt}.md │
     └─────────────────────────────────────────────────────────────┘
          │
          ▼
     ┌─────────────────────────────────────────────────────────────┐
     │                    QUICK DEV SESSION                         │
     │                    (Barry - Quick Flow Solo Dev)             │
     │                                                              │
     │  Input: uat-fix-context-{epic}-{attempt}.md                 │
     │                                                              │
     │  Process:                                                    │
     │  1. Load fix context (failed scenarios + error details)     │
     │  2. Analyze root cause for each failure                     │
     │  3. Implement targeted fixes                                │
     │  4. Run self-check (step-04)                                │
     │  5. Commit with message: "fix(epic-{id}): UAT fix #{n}"     │
     │                                                              │
     │  Output:                                                     │
     │  - Code changes committed                                   │
     │  - Fix summary in story dev record                          │
     └─────────────────────────────────────────────────────────────┘
          │
          ▼
     ┌─────────────────────────────────────────────────────────────┐
     │                    RE-VALIDATE                               │
     │                                                              │
     │  Run UAT Gate Check again on same scenarios                 │
     │                                                              │
     │  Outcomes:                                                   │
     │  - PASS → Continue to next epic                             │
     │  - FAIL + attempts < max_retries → Loop back to Quick Dev   │
     │  - FAIL + attempts >= max_retries → HALT chain              │
     └─────────────────────────────────────────────────────────────┘
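
The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: `run_gate` and `run_quick_dev` are hypothetical callables standing in for the UAT gate check and the quick-dev fix session, and `GateOutcome` is an illustrative result type.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GateOutcome:
    """Illustrative result type (not part of the original design)."""
    status: str                                   # "PASS" or "HALT"
    fix_attempts: int = 0
    remaining_failures: List[str] = field(default_factory=list)


def run_gate_with_self_heal(
    run_gate: Callable[[], List[str]],            # hypothetical: returns failed scenario IDs
    run_quick_dev: Callable[[List[str], int], None],  # hypothetical: (failures, attempt number)
    max_retries: int = 2,
) -> GateOutcome:
    """Gate check -> quick-dev fix -> re-validate, bounded by max_retries."""
    attempts = 0
    failures = run_gate()
    while failures:
        if attempts >= max_retries:
            # FAIL + attempts >= max_retries -> HALT chain
            return GateOutcome("HALT", attempts, failures)
        attempts += 1
        run_quick_dev(failures, attempts)         # fix session for this attempt
        failures = run_gate()                     # re-validate the gate
    return GateOutcome("PASS", attempts)
```

With `max_retries=2`, a gate that never passes produces two fix sessions and then a `HALT` outcome that carries the still-failing scenario IDs.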

Configuration

# In epic-chain config
uat:
  gate_enabled: true
  gate_mode: quick              # quick | full | skip

  # Self-healing configuration
  self_heal:
    enabled: true
    max_retries: 2              # Maximum fix attempts per epic
    fix_workflow: quick-dev     # Workflow to use for fixes
    fix_agent: barry            # Agent to invoke

    # What to include in fix context
    include_in_context:
      - failed_scenarios
      - error_output
      - related_stories
      - acceptance_criteria
      - recent_commits         # Last 3 commits for context

    # Escalation
    on_max_retries: halt        # halt | continue_with_warning | notify_human
    notification_channel: null  # Optional: slack, email, etc.
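
One way to consume the `self_heal` block is to map it onto a typed object with the same defaults and reject unknown escalation modes early. The sketch below assumes the YAML has already been parsed into a plain dict (for example by a YAML library); `SelfHealConfig` and `load_self_heal` are hypothetical names, and `include_in_context` is deliberately left out of this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Escalation modes listed in the config comment above.
ALLOWED_ESCALATIONS = {"halt", "continue_with_warning", "notify_human"}


@dataclass(frozen=True)
class SelfHealConfig:
    """Hypothetical typed view of the self_heal block, with the same defaults."""
    enabled: bool = True
    max_retries: int = 2
    fix_workflow: str = "quick-dev"
    fix_agent: str = "barry"
    on_max_retries: str = "halt"
    notification_channel: Optional[str] = None


def load_self_heal(uat_section: Dict[str, Any]) -> SelfHealConfig:
    """Build a SelfHealConfig from the already-parsed `uat` mapping."""
    raw = dict(uat_section.get("self_heal") or {})
    raw.pop("include_in_context", None)   # not modelled in this sketch
    cfg = SelfHealConfig(**raw)           # unknown keys raise TypeError
    if cfg.max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    if cfg.on_max_retries not in ALLOWED_ESCALATIONS:
        raise ValueError(f"unsupported on_max_retries: {cfg.on_max_retries!r}")
    return cfg
```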

Fix Context Document Template

Generated when UAT fails, consumed by Quick Dev:

# UAT Fix Context - Epic {epic_id} (Attempt {n})

## Failed Scenarios

### Scenario {id}: {name}

**Expected Result:**
{expected}

**Actual Result:**
{actual}

**Error Output:**

{stderr or error message}


**Related Story:** {story_id}
**Acceptance Criteria:**
- {criteria from story}

---

## Fix Instructions

Address the following failures in priority order:

1. **{scenario_id}**: {one-line description of what's broken}
   - Root cause hint: {if determinable}
   - Files likely involved: {if determinable}

## Constraints

- Only fix the identified failures
- Do not refactor unrelated code
- Run tests after each fix
- Commit with message format: `fix(epic-{id}): {description}`
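
As an illustration of how the "Failed Scenarios" part of this template might be filled in, the following sketch renders it from structured failure records. `FailedScenario`, `fix_context_path`, and `render_fix_context` are hypothetical names; the "Fix Instructions" and "Constraints" sections are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailedScenario:
    """Hypothetical record holding the fields the template asks for."""
    scenario_id: str
    name: str
    expected: str
    actual: str
    error_output: str
    story_id: str
    acceptance_criteria: List[str]


def fix_context_path(epic_id: str, attempt: int) -> str:
    """Path pattern from the failure-analysis step."""
    return f"docs/sprint-artifacts/uat-fix-context-{epic_id}-{attempt}.md"


def render_fix_context(epic_id: str, attempt: int, failures: List[FailedScenario]) -> str:
    """Render the title and the 'Failed Scenarios' section of the template."""
    lines = [
        f"# UAT Fix Context - Epic {epic_id} (Attempt {attempt})",
        "",
        "## Failed Scenarios",
        "",
    ]
    for f in failures:
        lines += [
            f"### Scenario {f.scenario_id}: {f.name}",
            "",
            "**Expected Result:**",
            f.expected,
            "",
            "**Actual Result:**",
            f.actual,
            "",
            "**Error Output:**",
            f.error_output,
            "",
            f"**Related Story:** {f.story_id}",
            "**Acceptance Criteria:**",
            *[f"- {criterion}" for criterion in f.acceptance_criteria],
            "",
            "---",
            "",
        ]
    return "\n".join(lines)
```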

UAT Scenario Classification

Based on the UAT sample document, scenarios fall into three categories:

Automatable (Execute via Shell)

| Scenario Type | Example | Automation Method |
|---|---|---|
| CLI commands | npx heimdall --version | Shell execution, check exit code + output |
| Build verification | npm run build | Shell execution, parse output for success |
| API health checks | curl /health | HTTP request, validate JSON response |
| Database status | npx heimdall db status | Shell execution, parse structured output |
| Configuration validation | npx heimdall config validate | Shell execution, check for "valid" in output |
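
The "shell execution, check exit code + output" method in this table could look roughly like the sketch below. It is an assumption about how a validator might run one scenario, not the actual gate code: `ScenarioResult` and `run_cli_scenario` are hypothetical names, and the 30-second default mirrors the per-scenario timeout used elsewhere in this document.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScenarioResult:
    passed: bool
    exit_code: Optional[int]   # None when the command timed out or could not start
    stdout: str
    stderr: str


def run_cli_scenario(
    argv: List[str],
    expect_in_output: Optional[str] = None,
    timeout: float = 30.0,
) -> ScenarioResult:
    """Run one command; pass on exit code 0 and, optionally, expected text in stdout."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return ScenarioResult(False, None, "", f"timed out after {timeout}s")
    except OSError as exc:     # e.g. command not found
        return ScenarioResult(False, None, "", str(exc))
    passed = proc.returncode == 0
    if passed and expect_in_output is not None:
        passed = expect_in_output in proc.stdout
    return ScenarioResult(passed, proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Stand-in command so the example runs anywhere; a real gate would pass
    # something like ["npx", "heimdall", "--version"].
    print(run_cli_scenario([sys.executable, "-c", "print('1.0.0')"], "1.0.0"))
```

Keeping stdout and stderr on the result makes them available later as evidence for the fix context document.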

Semi-Automated (Execute + Manual Verify)

| Scenario Type | Example | Approach |
|---|---|---|
| Email delivery | npx heimdall test-send | Execute command, log message ID, flag for inbox check |
| File creation | heimdall config init | Execute, verify file exists, show contents for review |
| Worker processes | heimdall start | Start, verify startup message, terminate after timeout |

Manual Only

| Scenario Type | Example | Approach |
|---|---|---|
| External service setup | Railway deployment | Document steps, skip in automation |
| Visual verification | UI appearance | Generate screenshots if possible, flag for review |
| Multi-step human flows | Full onboarding journey | Provide checklist, require human sign-off |

Gate Check Implementation

Quick Gate (Default)

Runs only automatable scenarios from the "Minimum Requirements" section:

# uat-gate-config.yaml
gate_mode: quick
timeout_per_scenario: 30  # seconds
fail_threshold: 0         # any failure = gate fail

scenarios_to_run:
  - type: "cli_command"
    match: '"Expected Results" sections with CLI commands'
  - type: "health_check"
    match: '"/health endpoint" scenarios'
  - type: "validation"
    match: '"validate" command scenarios'

skip_scenarios:
  - contains: "email inbox"
  - contains: "Railway"
  - contains: "browser"
  - contains: "terminal window"
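
The `skip_scenarios` and `fail_threshold` settings above could be applied along these lines. This is a sketch only: `should_skip` and `gate_status` are hypothetical helpers, and case-insensitive substring matching is an assumption about how `contains` is meant to behave.

```python
from typing import Dict, List

# Phrases from the skip_scenarios list in the quick-gate config.
SKIP_PHRASES = ["email inbox", "Railway", "browser", "terminal window"]


def should_skip(scenario_text: str, skip_phrases: List[str] = SKIP_PHRASES) -> bool:
    """True when the scenario text contains any skip phrase (case-insensitive)."""
    text = scenario_text.lower()
    return any(phrase.lower() in text for phrase in skip_phrases)


def gate_status(results: Dict[str, bool], fail_threshold: int = 0) -> str:
    """FAIL when more scenarios failed than the threshold allows.

    With the default threshold of 0, any single failure fails the gate.
    """
    failed = sum(1 for passed in results.values() if not passed)
    return "PASS" if failed <= fail_threshold else "FAIL"
```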

Full Gate

Runs all automatable scenarios and flags semi-automated ones for review:

gate_mode: full
include_semi_automated: true
generate_manual_checklist: true

Data Flow

Per-Epic Metrics Collection

# Written to: {sprint_artifacts}/metrics/epic-{id}-metrics.yaml

epic_id: "1"
epic_name: "Foundation, CLI & Deployment"

execution:
  start_time: "2026-01-02T13:40:00Z"
  end_time: "2026-01-02T15:10:00Z"
  duration_seconds: 5400

stories:
  total: 7
  completed: 7
  failed: 0
  skipped: 0

uat:
  document_generated: true
  document_path: "docs/uat/epic-1-uat.md"
  scenarios:
    total: 9
    automatable: 6
    semi_automated: 2
    manual_only: 1

validation:
  gate_executed: true
  gate_mode: "quick"
  results:
    passed: 6
    failed: 0
    skipped: 3
  gate_status: "PASS"
  blocking_issues: []

  # Self-healing loop tracking
  fix_attempts: 0
  fix_history: []
  # Example when fixes were needed:
  # fix_attempts: 2
  # fix_history:
  #   - attempt: 1
  #     failed_scenarios: ["scenario-3", "scenario-5"]
  #     fix_context: "docs/sprint-artifacts/uat-fix-context-1-1.md"
  #     fix_commit: "abc123"
  #     result: "partial"  # 1 of 2 fixed
  #   - attempt: 2
  #     failed_scenarios: ["scenario-5"]
  #     fix_context: "docs/sprint-artifacts/uat-fix-context-1-2.md"
  #     fix_commit: "def456"
  #     result: "success"  # all fixed

issues:
  - type: "signaling_mismatch"
    story: "1-3"
    severity: "low"
    resolved: true
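
Each `fix_history` entry could be derived by comparing the failing scenarios before and after a fix attempt. The helper below is a hypothetical sketch; the `"partial"` and `"success"` labels come from the commented example above, while `"failed"` (nothing fixed) is an assumed extra label that does not appear in the original.

```python
from typing import Dict, List


def fix_attempt_record(
    attempt: int,
    failed_before: List[str],
    failed_after: List[str],
    fix_context: str,
    fix_commit: str,
) -> Dict[str, object]:
    """Build one fix_history entry from the gate results around a fix attempt."""
    fixed = [s for s in failed_before if s not in failed_after]
    if not failed_after:
        result = "success"      # all fixed
    elif fixed:
        result = "partial"      # some, but not all, fixed
    else:
        result = "failed"       # assumption: label not in the original example
    return {
        "attempt": attempt,
        "failed_scenarios": list(failed_before),
        "fix_context": fix_context,
        "fix_commit": fix_commit,
        "result": result,
    }
```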

Chain Report Aggregation

# Read from: {sprint_artifacts}/metrics/epic-*-metrics.yaml
# Write to: {sprint_artifacts}/chain-execution-report.md

chain:
  total_epics: 8
  total_stories: 58
  total_duration_seconds: 63000

  epics:
    - id: "1"
      stories: 7
      duration: 5400
      uat_gate: "PASS"
    # ... etc

  uat_summary:
    total_scenarios: 72
    automatable: 48
    auto_passed: 45
    auto_failed: 3
    manual_pending: 24

  gate_results:
    passed: 7
    failed: 1
    blocked_chain: false
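
The aggregation itself is mostly summation over the per-epic metrics. The sketch below assumes each metrics file has already been loaded into a dict shaped like the per-epic example; `aggregate_chain` is a hypothetical name, counting semi-automated plus manual-only scenarios as `manual_pending` is an assumption, and `blocked_chain` is left out.

```python
from typing import Any, Dict, List


def aggregate_chain(epic_metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Roll per-epic metrics dicts up into the chain summary structure."""
    scenarios = [m["uat"]["scenarios"] for m in epic_metrics]
    results = [m["validation"]["results"] for m in epic_metrics]
    gates = [m["validation"]["gate_status"] for m in epic_metrics]
    chain = {
        "total_epics": len(epic_metrics),
        "total_stories": sum(m["stories"]["total"] for m in epic_metrics),
        "total_duration_seconds": sum(
            m["execution"]["duration_seconds"] for m in epic_metrics
        ),
        "epics": [
            {
                "id": m["epic_id"],
                "stories": m["stories"]["total"],
                "duration": m["execution"]["duration_seconds"],
                "uat_gate": m["validation"]["gate_status"],
            }
            for m in epic_metrics
        ],
        "uat_summary": {
            "total_scenarios": sum(s["total"] for s in scenarios),
            "automatable": sum(s["automatable"] for s in scenarios),
            "auto_passed": sum(r["passed"] for r in results),
            "auto_failed": sum(r["failed"] for r in results),
            # Assumption: everything not automatable still needs a human.
            "manual_pending": sum(
                s["semi_automated"] + s["manual_only"] for s in scenarios
            ),
        },
        "gate_results": {
            "passed": gates.count("PASS"),
            "failed": gates.count("FAIL"),
        },
    }
    return {"chain": chain}
```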

Workflow File Changes

Modified: epic-chain/workflow.yaml

# Add to variables section:
variables:
  # ... existing ...

  # UAT Gate Configuration
  uat_gate_enabled: true
  uat_gate_mode: "quick"      # quick | full | skip
  uat_gate_blocking: false    # If true, halts chain on failure

  # Report Configuration
  generate_chain_report: true
  chain_report_file: "{sprint_artifacts}/chain-execution-report.md"
  metrics_folder: "{sprint_artifacts}/metrics"

New: step-05-uat-gate.md

# UAT Gate Check

## Purpose
Validate epic implementation against automatable UAT scenarios before proceeding.

## Inputs
- UAT document: `docs/uat/epic-{id}-uat.md`
- Gate config: `{uat_gate_mode}`

## Process
1. Parse the UAT document for test scenarios
2. Identify automatable scenarios (CLI commands, API calls, file checks)
3. Execute each scenario in an isolated shell
4. Collect results, with stdout/stderr as evidence
5. Determine the gate status

## Outputs
- Gate result: PASS | FAIL
- Metrics update: `{metrics_folder}/epic-{id}-metrics.yaml`

## Exit Conditions
- PASS: Continue to next epic
- FAIL + blocking=false: Log warning, continue
- FAIL + blocking=true: Halt chain, require intervention
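
The exit conditions of step-05 reduce to a small decision function. The sketch below is illustrative only; `next_action` and the returned action names are hypothetical, with `blocking` corresponding to `uat_gate_blocking`.

```python
def next_action(gate_result: str, blocking: bool) -> str:
    """Map a gate result and the blocking flag to the chain's next action."""
    if gate_result == "PASS":
        return "continue"
    if gate_result == "FAIL":
        # blocking=true halts the chain; otherwise log a warning and move on.
        return "halt" if blocking else "warn_and_continue"
    raise ValueError(f"unknown gate result: {gate_result!r}")
```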

New: step-06-generate-report.md

# Chain Report Generation

## Purpose
Generate comprehensive execution report after chain completion.

## Inputs
- All metrics files: `{metrics_folder}/epic-*-metrics.yaml`
- Chain plan: `{chain_plan_file}`
- Template: `chain-report-template.md`

## Process
1. Load all epic metrics
2. Calculate aggregates (totals, averages, percentages)
3. Build dependency graph visualization
4. Compile issues list
5. Generate token/cost estimates
6. Render report template

## Outputs
- `{chain_report_file}` - Complete execution report

Agent Interaction Model

SM Agent (Orchestrator) - With Self-Healing

User: *epic-chain 1-8

SM: Loading chain plan...
    Found 8 epics, 58 stories
    Dependencies analyzed: sequential with branches

    Starting Epic 1 (Foundation)...
    [Dev → Review → Commit → UAT Gen → Gate Check]
    Epic 1: COMPLETE (7/7 stories, UAT PASS)

    Starting Epic 2 (Event Ingestion)...
    [Dev → Review → Commit → UAT Gen → Gate Check]

    ⚠ UAT GATE FAILED - 2 scenarios failed

    Initiating self-healing loop (attempt 1/2)...
    Generating fix context → docs/sprint-artifacts/uat-fix-context-2-1.md

    [Invoking Barry for quick-dev fix session]
    Barry: Loading fix context...
           Failure 1: API endpoint returning 404
           Failure 2: Missing auth header validation
           Implementing fixes...
           ✓ Fixed route registration in server.ts
           ✓ Added auth middleware check
           Committed: fix(epic-2): UAT fix #1

    Re-validating UAT gate...
    ✓ All scenarios now passing

    Epic 2: COMPLETE (5/5 stories, UAT PASS after 1 fix)

    Starting Epic 3 (Workflow Engine)...
    ...

    Chain Complete.
    Generating execution report...
    Report: docs/sprint-artifacts/chain-execution-report.md

    Summary:
    - 8/8 epics completed
    - 2 epics required self-healing fixes
    - Total fix attempts: 3
    - All UAT gates now passing

UAT Validator Agent (Validation)

User: *uat-validate 1

Quinn: Loading UAT document for Epic 1...
       Found 9 scenarios:
       - 6 automatable
       - 2 semi-automated
       - 1 manual only

       Executing automatable scenarios...

       ✓ Scenario 1: Project Initialization
         Command: npx heimdall --version
         Result: PASS (output: "1.0.0")

       ✓ Scenario 3: Database Migration
         Command: npx heimdall db migrate
         Result: PASS (output contains "initialized successfully")

       ⚠ Scenario 7: Test Email Sending
         Command: npx heimdall test-send --to test@example.com
         Result: SEMI-AUTO (executed, requires inbox verification)
         Message ID: re_abc123

       ○ Scenario 9: Railway Deployment
         Result: SKIPPED (manual only)

       Summary: 6/6 automatable PASSED
       Recommendation: APPROVE (pending manual verification of 3 scenarios)

Self-Healing Loop Example (Failure → Fix → Pass)

Quinn: UAT Gate Check for Epic 2...

       ✓ Scenario 1: Event Ingestion Endpoint
         Command: curl -X POST http://localhost:3000/api/v1/events
         Result: PASS (201 Created)

       ✗ Scenario 2: API Key Authentication
         Command: curl -H "X-API-Key: test" http://localhost:3000/api/v1/events
         Expected: 200 OK with auth header validated
         Actual: 401 Unauthorized
         Error: "Missing authentication middleware"

       ✗ Scenario 3: Events Database Table
         Command: npx heimdall db status
         Expected: "events table: exists"
         Actual: Exit code 1
         Error: "relation 'events' does not exist"

       UAT_GATE_RESULT: FAIL
       CRITICAL_PASSED: 1/3
       BLOCKING_ISSUES: [scenario-2, scenario-3]

       Generating fix context for quick-dev...
       Output: docs/sprint-artifacts/uat-fix-context-2-1.md

---

Barry: Loading fix context for Epic 2 (Attempt 1)...

       Issue 1: API Key Authentication failing
       - Root cause: Auth middleware not registered on route
       - Fix: Add authMiddleware to route handler chain

       Issue 2: Events table missing
       - Root cause: Migration file exists but wasn't run
       - Fix: Ensure migration runs in db:migrate command

       Implementing fixes...
       ✓ Updated packages/api/src/routes/events.ts
       ✓ Updated packages/cli/src/commands/db.ts

       Running self-check...
       ✓ Tests passing

       Committing: fix(epic-2): add auth middleware, fix migration order

---

Quinn: Re-validating UAT Gate for Epic 2...

       ✓ Scenario 1: Event Ingestion Endpoint - PASS
       ✓ Scenario 2: API Key Authentication - PASS (was FAIL)
       ✓ Scenario 3: Events Database Table - PASS (was FAIL)

       UAT_GATE_RESULT: PASS
       CRITICAL_PASSED: 3/3

       Epic 2 approved after 1 fix attempt.

Configuration Options

Default Configuration

# {project-root}/.bmad/bmm/config.yaml additions

epic_chain:
  # UAT Settings
  uat:
    gate_enabled: true
    gate_mode: quick          # quick | full | skip
    gate_blocking: false      # Stop chain on failure?
    timeout_seconds: 30       # Per-scenario timeout

  # Report Settings
  report:
    enabled: true
    format: markdown          # markdown | html | both
    include_token_estimates: true
    include_dependency_graph: true

  # Metrics Settings
  metrics:
    enabled: true
    per_story_timing: true
    track_retries: true

Per-Run Override

# Override gate settings for a specific run
./bmad/scripts/epic-chain.sh 1-8 --uat-gate=full --uat-blocking=true
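
How the script parses these flags is not shown here. As a rough illustration of the override semantics, the following Python sketch accepts the same two flags and falls back to the documented defaults (`quick`, non-blocking); `parse_overrides` is a hypothetical helper, not part of `epic-chain.sh`.

```python
import argparse
from typing import List, Optional


def _parse_bool(value: str) -> bool:
    """Accept only 'true' or 'false' (case-insensitive)."""
    lowered = value.lower()
    if lowered not in ("true", "false"):
        raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")
    return lowered == "true"


def parse_overrides(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse an epic range plus optional UAT gate overrides."""
    parser = argparse.ArgumentParser(prog="epic-chain")
    parser.add_argument("epics", help='epic range, e.g. "1-8"')
    parser.add_argument("--uat-gate", choices=["quick", "full", "skip"], default="quick")
    parser.add_argument("--uat-blocking", type=_parse_bool, default=False)
    return parser.parse_args(argv)
```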

Summary

This integration provides:

  1. Automated Quality Gates - Verify that implementations meet acceptance criteria before proceeding
  2. Self-Healing Fix Loop - A failed UAT gate automatically triggers quick-dev to fix the issues and re-validate
  3. Comprehensive Reporting - Generate detailed execution reports with metrics, timing, fix history, and issues
  4. Flexible Configuration - Adjust gate strictness, retry limits, and escalation behavior per project or per run
  5. Clear Traceability - Every test scenario maps back to story acceptance criteria, and every fix links to the failure it addresses
  6. Graceful Degradation - Semi-automated and manual scenarios are documented but do not block the chain

Agent Responsibilities

| Agent | Role | Key Actions |
|---|---|---|
| SM (Bob) | Chain Orchestrator | Runs epic-chain, coordinates phases, triggers fix loops |
| Quinn | UAT Validator | Executes scenarios, generates fix context on failure |
| Barry | Quick Dev Fixer | Receives fix context, implements targeted fixes, commits |

Self-Healing Flow

UAT Fail → Generate Fix Context → Quick Dev Fix → Re-validate → Pass/Retry/Halt

The maximum retry count (max_retries, default: 2) prevents infinite loops. Once the limit is reached, the chain halts by default (on_max_retries: halt) and requires human intervention, so unresolved issues are surfaced rather than ignored.