feat(epic-execute): add comprehensive quality gates with self-healing

- Add architecture compliance check (per-story) with fix loop
- Add test quality review (per-story) with TEA patterns
- Add requirements traceability (per-epic) with self-healing test generation
- Add --skip-arch, --skip-test-quality, --skip-traceability flags
- Update workflow.md to v2.0 with new flow diagram
- Add step templates for each quality gate phase
- Include improvements doc and execution report

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Caleb 2026-01-26 13:45:13 -06:00
parent 9f532eff65
commit 865b904041
7 changed files with 3105 additions and 35 deletions

bmad_improvements.md (1145 lines changed): file diff suppressed because it is too large


@@ -0,0 +1,308 @@
# Heimdall Customer Management - Epic Chain Execution Report
## Executive Summary
**Project:** Heimdall Customer Management System
**Execution Method:** BMAD Epic Chain (automated AI-driven development)
**Status:** COMPLETE - All 58 stories implemented
| Metric | Value |
|--------|-------|
| Total Epics | 8 |
| Total Stories | 58 |
| Start Time | 1:40 PM CST, January 2, 2026 |
| End Time | ~7:00 AM CST, January 3, 2026 |
| Total Duration | ~17.5 hours |
| Average per Story | ~18 minutes |
---
## Timeline
### Epic Execution Duration
| Epic | Name | Stories | Duration | Status |
|------|------|---------|----------|--------|
| 1 | Foundation, CLI & Deployment | 7 | ~1.5 hours | Complete |
| 2 | Event Ingestion API | 5 | ~1.0 hours | Complete |
| 3 | Workflow Engine & Onboarding | 7 | ~1.5 hours | Complete |
| 4 | Broadcast Scheduling | 6 | 1.6 hours (5812s) | Complete |
| 5 | AI Content Copilot | 9 | 2.9 hours (10269s) | Complete |
| 6 | Build Mode & Templates | 8 | 2.1 hours (7482s) | Complete |
| 7 | Observability & Reporting | 8 | 2.5 hours (8822s) | Complete |
| 8 | Compliance & Suppression | 8 | 1.75 hours (6300s) | Complete |
| **Total** | | **58** | **~17.5 hours (wall clock, start to end)** | **100%** |
---
## Dependency Graph
The epics were executed in dependency order:
```
Epic 1 (Foundation)
├── Epic 2 (Event Ingestion) ──┐
│ └── Epic 3 (Workflow) ─┼── Epic 7 (Observability) ── Epic 8 (Compliance)
│ └── Epic 4 (Broadcast)
│ └── Epic 6 (Templates)
└── Epic 5 (AI Copilot) ───────┘
```
### Explicit Dependencies
| Epic | Depends On | Reason |
|------|------------|--------|
| 1 | None | Foundation - no prior dependencies |
| 2 | Epic 1 | Requires Fastify server, Supabase adapter, pg-boss |
| 3 | Epic 1, 2 | Requires events table, event routing, pg-boss scheduler |
| 4 | Epic 1, 3 | Requires scheduler, Supabase API, Resend adapter |
| 5 | Epic 1 | Requires CLI foundation, types package |
| 6 | Epic 1, 3, 5 | Requires templates from E3, context from E5 |
| 7 | Epic 1, 2, 3 | Requires webhook endpoint, email_logs table, send action |
| 8 | Epic 1, 7 | Requires suppression table, webhook processing |
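The dependency table can be checked mechanically. The sketch below feeds the table's rows to the POSIX `tsort` utility as "prerequisite dependent" pairs and prints one valid execution order. The function name `epic_order` is an assumption for illustration and is not part of the BMAD scripts. Several orders satisfy the table, so the exact output may differ from the order the chain actually ran in.

```bash
#!/usr/bin/env bash
# Sketch: derive one valid execution order from the dependency table.
# Each pair is "prerequisite dependent"; tsort prints a topological order
# and reports an error if the pairs contain a cycle.
# epic_order is a hypothetical helper name.
epic_order() {
  tsort <<'EOF'
1 2
1 3
2 3
1 4
3 4
1 5
1 6
3 6
5 6
1 7
2 7
3 7
1 8
7 8
EOF
}

# Print the order on a single line.
epic_order | tr '\n' ' '
echo
```

Epic 1 always comes first and Epic 8 always follows Epic 7; the relative order of independent epics (for example 4 and 5) is left to `tsort`.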
---
## What Was Built
### Epic 1: Foundation, CLI & Deployment Infrastructure (7 stories)
- Turborepo monorepo with `packages/core`, `cli`, `types`, `adapters`
- Supabase adapter with connection pooling
- pg-boss job queue integration
- Resend email adapter foundation
- Railway deployment configuration (Dockerfile, health endpoint)
- Workspace configuration system
**Stories:**
- 1-1: Initialize Monorepo Structure
- 1-2: Workspace Configuration System
- 1-3: Supabase Adapter & Database Schema
- 1-4: Job Queue Integration with pg-boss
- 1-5: Resend Adapter Foundation
- 1-6: Railway Deployment Configuration
- 1-7: Database & Supabase API Configuration
### Epic 2: Event Ingestion API & Core Routing (5 stories)
- `POST /api/v1/events` REST endpoint
- API key authentication
- Events database table with idempotency
- CLI event simulation commands
- Event routing foundation
**Stories:**
- 2-1: Event Ingestion API Endpoint
- 2-2: API Key Authentication
- 2-3: Events Database Table
- 2-4: CLI Event Simulation
- 2-5: Event Routing Foundation
### Epic 3: Workflow Engine & Onboarding Flows (7 stories)
- YAML flow configuration with Zod validation
- Config loader with descriptive error messages
- Executions table with snapshot pattern
- Workflow execution engine
- Relative delay scheduler
- Send email action
- Example flows and templates
**Stories:**
- 3-1: Flow Configuration Schema
- 3-2: Config Loader & Validation
- 3-3: Executions Table & Snapshot Pattern
- 3-4: Workflow Execution Engine
- 3-5: Relative Delay Scheduler
- 3-6: Send Email Action
- 3-7: Example Flows & Templates
### Epic 4: Broadcast Scheduling & Cohort Emails (6 stories)
- Broadcast configuration schema
- Cohort queries via Supabase API
- Absolute schedule execution
- CLI broadcast commands (`heimdall broadcast schedule`)
- Batch execution with retry logic
- Example broadcast configurations
**Stories:**
- 4-1: Broadcast Configuration Schema
- 4-2: Cohort Query via Supabase API
- 4-3: Absolute Schedule Execution
- 4-4: Broadcast CLI Commands
- 4-5: Broadcast Execution & Batching
- 4-6: Example Broadcast Configs
### Epic 5: AI Content Copilot (9 stories)
- Anthropic Claude SDK integration
- `heimdall generate` CLI command
- Prompt configuration system in YAML
- Schema export for AI context (JSON)
- Content refinement commands
- Privacy-safe generation (no PII sent to LLM)
- Conversational context builder with AI-guided Q&A
- Sequence context Q&A
- Context import shortcuts
**Stories:**
- 5-1: Anthropic SDK Integration
- 5-2: Generate Email Content Command
- 5-3: Prompt Configuration
- 5-4: Schema Export for AI Context
- 5-5: Content Refinement Commands
- 5-6: Privacy-Safe Generation (No PII)
- 5-7: Conversational Context Builder
- 5-8: Sequence Context Q&A
- 5-9: Context Import Shortcut
### Epic 6: Build Mode & Template Verification (8 stories)
- React Email template setup
- Template rendering & preview
- Template validation & syntax check
- Test send command (`heimdall test-send`)
- Build all command (`heimdall build`)
- Example templates for AI-assisted development
- Context-aware template generation
- Template regeneration with context updates
**Stories:**
- 6-1: React Email Template Setup
- 6-2: Template Rendering & Preview
- 6-3: Template Validation & Syntax Check
- 6-4: Test Send Command
- 6-5: Build All Command
- 6-6: Example Templates
- 6-7: Context-Aware Template Generation
- 6-8: Template Regeneration
### Epic 7: Observability & Reporting (8 stories)
- Resend webhook endpoint (`POST /api/v1/webhooks/resend`)
- Email logs table
- Webhook event processing
- Immediate failure alerts
- AI-powered weekly roundup reports
- CLI metrics commands
- Webhook configuration CLI
- Configurable report metrics & goals
**Stories:**
- 7-1: Resend Webhook Endpoint
- 7-2: Email Logs Table
- 7-3: Webhook Event Processing
- 7-4: Immediate Failure Alerts
- 7-5: AI-Powered Weekly Roundup
- 7-6: CLI Metrics Commands
- 7-7: Webhook Configuration CLI
- 7-8: Configurable Report Metrics
### Epic 8: Compliance & Suppression Management (8 stories)
- Suppression table
- Automatic unsubscribe handling
- Automatic complaint handling
- Hard bounce suppression
- Pre-send suppression check
- Manual suppression management CLI
- Bulk suppression import
- Unsubscribe link generation
**Stories:**
- 8-1: Suppression Table
- 8-2: Automatic Unsubscribe Handling
- 8-3: Automatic Complaint Handling
- 8-4: Hard Bounce Suppression
- 8-5: Pre-Send Suppression Check
- 8-6: Manual Suppression Management
- 8-7: Bulk Suppression Import
- 8-8: Unsubscribe Link Generation
---
## Estimated Token Usage
Estimates assume 2 calls per story at roughly 8K input and 4K output tokens per call; these are not measured values:
| Epic | Stories | Est. Calls | Est. Input | Est. Output | Est. Total |
|------|---------|------------|------------|-------------|------------|
| 1 | 7 | 14 | ~112K | ~56K | ~168K |
| 2 | 5 | 10 | ~80K | ~40K | ~120K |
| 3 | 7 | 14 | ~112K | ~56K | ~168K |
| 4 | 6 | 12 | ~96K | ~48K | ~144K |
| 5 | 9 | 18 | ~144K | ~72K | ~216K |
| 6 | 8 | 16 | ~128K | ~64K | ~192K |
| 7 | 8 | 16 | ~128K | ~64K | ~192K |
| 8 | 8 | 16 | ~128K | ~64K | ~192K |
| **Total** | **58** | **116** | **~928K** | **~464K** | **~1.4M** |
### Cost Estimates
| Model | Input Cost | Output Cost | Total |
|-------|------------|-------------|-------|
| Claude 3.5 Sonnet ($3/$15 per 1M tokens) | ~$2.78 | ~$6.96 | ~$9.74 |
| Claude Opus ($15/$75 per 1M tokens) | ~$13.92 | ~$34.80 | ~$48.72 |
*Note: These are rough estimates. Actual usage may vary by 50-200%.*
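The figures in the cost table follow from the token estimates and the per-million prices listed in the first column. A minimal sketch of that arithmetic, assuming a hypothetical helper named `cost_usd` (not part of the BMAD scripts):

```bash
# cost_usd TOKENS PRICE_PER_MILLION
# Prints the dollar cost of TOKENS at the given per-million-token price.
# Hypothetical helper for illustration only.
cost_usd() {
  awk -v tokens="$1" -v price="$2" \
    'BEGIN { printf "%.2f\n", tokens / 1000000 * price }'
}

cost_usd 928000 3     # ~928K input tokens at $3 per 1M   -> 2.78
cost_usd 464000 15    # ~464K output tokens at $15 per 1M -> 6.96
```

Swapping in the $15 and $75 prices reproduces the Opus row in the same way.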
---
## Issues Encountered
### Script Signaling Mismatch
**Issue:** Stories completed successfully but the dev phase didn't output the exact `IMPLEMENTATION COMPLETE: <story_id>` phrase expected by the script.
**Impact:** 9 stories across epics 4-7 were marked as failed despite successful implementation.
**Resolution:** Manually updated story status from "In Review" or "completed" to "Done".
**Affected Stories:**
- 4-3: Absolute Schedule Execution
- 4-5: Broadcast Execution & Batching
- 5-3: Prompt Configuration
- 5-4: Schema Export for AI Context
- 5-9: Context Import Shortcut
- 6-3: Template Validation & Syntax Check
- 6-7: Context-Aware Template Generation
- 7-7: Webhook Configuration CLI
- 7-8: Configurable Report Metrics
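One way to make this kind of check less brittle is to match the completion phrase loosely rather than byte-for-byte. The sketch below is an assumption about how that could look, not the script's actual detection logic; `has_completion_signal` is a hypothetical name. It ignores case, tolerates extra whitespace and a missing colon, and still refuses to match a longer story ID.

```bash
# has_completion_signal OUTPUT STORY_ID
# Returns 0 if OUTPUT contains the completion phrase for STORY_ID, ignoring
# case, extra whitespace, and an optional colon.
# Hypothetical helper, not the script's actual check.
has_completion_signal() {
  local output="$1" story_id="$2"
  printf '%s\n' "$output" |
    grep -Eiq "implementation[[:space:]]+complete:?[[:space:]]*${story_id}([^0-9]|\$)"
}

# Example: a lower-case variant of the phrase is still accepted.
if has_completion_signal "Implementation complete: 5-3" "5-3"; then
  echo "5-3 done"
fi
```

The trailing `([^0-9]|$)` keeps an ID such as `4-3` from matching output that actually refers to `4-31`.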
---
## Artifacts Generated
| Artifact | Location | Description |
|----------|----------|-------------|
| Story Files | `docs/stories/` | 58 completed stories with dev & review records |
| UAT Documents | `docs/uat/` | 8 User Acceptance Test documents (one per epic) |
| Epic Files | `docs/epics/` | 8 epic definition files |
| Handoffs | `docs/handoffs/` | Context handoff documents between epics |
| Chain Plan | `docs/sprint-artifacts/chain-plan.yaml` | Execution plan with dependencies |
---
## Next Steps
1. **Review UAT Documents** - Review the 8 UAT documents in `docs/uat/`
2. **Manual Acceptance Testing** - Execute test scenarios from UAT docs
3. **Code Review** - Review generated code for refinements
4. **Integration Testing** - Test cross-epic integrations
5. **Deploy to Staging** - Deploy the complete system to staging environment
---
## Conclusion
The Heimdall Customer Management system was successfully implemented through automated AI-driven development using the BMAD Epic Chain workflow. All 58 stories across 8 epics were completed in approximately 17.5 hours of execution time.
The system provides a complete customer management and email automation platform with:
- Event-driven architecture
- Workflow automation engine
- Scheduled broadcast capabilities
- AI-powered content generation
- Template management system
- Observability and reporting
- Compliance and suppression management


@@ -12,6 +12,9 @@
# --verbose Show detailed output
# --start-from ID Start from a specific story (e.g., 31-2)
# --skip-done Skip stories with Status: Done
# --skip-arch Skip architecture compliance check
# --skip-test-quality Skip test quality review
# --skip-traceability Skip traceability check (not recommended)
#
set -e
@@ -60,6 +63,14 @@ WORKFLOW_EXECUTOR="$CORE_TASKS_DIR/workflow.xml"
UAT_STEP_TEMPLATE="$WORKFLOWS_DIR/epic-execute/steps/step-04-generate-uat.md"
UAT_DOC_TEMPLATE="$WORKFLOWS_DIR/epic-execute/templates/uat-template.md"
# New Quality Gate Steps
ARCH_COMPLIANCE_STEP="$WORKFLOWS_DIR/epic-execute/steps/step-02b-arch-compliance.md"
TEST_QUALITY_STEP="$WORKFLOWS_DIR/epic-execute/steps/step-03b-test-quality.md"
TRACEABILITY_STEP="$WORKFLOWS_DIR/epic-execute/steps/step-03c-traceability.md"
# Traceability output directory
TRACEABILITY_DIR="$PROJECT_ROOT/docs/sprint-artifacts/traceability"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
@@ -355,6 +366,9 @@ PARALLEL=false
VERBOSE=false
START_FROM=""
SKIP_DONE=false
SKIP_ARCH=false
SKIP_TEST_QUALITY=false
SKIP_TRACEABILITY=false
while [[ $# -gt 0 ]]; do
case $1 in
@@ -386,6 +400,18 @@ while [[ $# -gt 0 ]]; do
SKIP_DONE=true
shift
;;
--skip-arch)
SKIP_ARCH=true
shift
;;
--skip-test-quality)
SKIP_TEST_QUALITY=true
shift
;;
--skip-traceability)
SKIP_TRACEABILITY=true
shift
;;
-*)
echo "Unknown option: $1"
exit 1
@@ -408,6 +434,9 @@ if [ -z "$EPIC_ID" ]; then
echo " --verbose Detailed output"
echo " --start-from ID Start from a specific story (e.g., 31-2)"
echo " --skip-done Skip stories with Status: Done"
echo " --skip-arch Skip architecture compliance check"
echo " --skip-test-quality Skip test quality review"
echo " --skip-traceability Skip traceability check (not recommended)"
exit 1
fi
@@ -1001,11 +1030,508 @@ Address all review findings now. This is attempt $attempt_num of 3."
# Maximum number of fix attempts before giving up
MAX_FIX_ATTEMPTS=3
MAX_ARCH_FIX_ATTEMPTS=2
MAX_TEST_QUALITY_FIX_ATTEMPTS=2
MAX_TRACEABILITY_FIX_ATTEMPTS=3
# Global variable to store arch violations for fix loop
LAST_ARCH_VIOLATIONS=""
# Global variable to store test quality issues for fix loop
LAST_TEST_QUALITY_ISSUES=""
# Global variable to store traceability gaps for fix loop
LAST_TRACEABILITY_GAPS=""
execute_arch_compliance_phase() {
local story_file="$1"
local story_id=$(basename "$story_file" .md)
# Reset violations
LAST_ARCH_VIOLATIONS=""
log ">>> ARCH COMPLIANCE: $story_id (fresh context)"
# Load architecture file
local arch_file=""
for search_path in "$PROJECT_ROOT/docs/architecture.md" "$PROJECT_ROOT/docs/architecture/architecture.md" "$PROJECT_ROOT/architecture.md"; do
if [ -f "$search_path" ]; then
arch_file="$search_path"
break
fi
done
if [ -z "$arch_file" ]; then
log_warn "No architecture.md found - skipping compliance check"
return 0
fi
local arch_contents=$(cat "$arch_file")
local story_contents=$(cat "$story_file")
# Load step template if available
local step_template=""
if [ -f "$ARCH_COMPLIANCE_STEP" ]; then
step_template=$(cat "$ARCH_COMPLIANCE_STEP")
fi
local arch_prompt="You are an Architecture Compliance Validator executing a BMAD compliance check.
## Your Task
Validate architecture compliance for story: $story_id
You are checking the staged changes against the project's established architecture patterns.
This is a TARGETED CHECK - focus only on structural/architectural issues, not code quality.
### CRITICAL AUTOMATION RULES
- Do NOT pause for user confirmation
- Execute the full compliance check
- Fix HIGH severity violations automatically
- Document MEDIUM and LOW violations
## Architecture Reference
<architecture>
$arch_contents
</architecture>
## Story Context
<story>
$story_contents
</story>
## Staged Changes
Run: git diff --staged --name-only
Then for each changed file: git diff --staged
## Compliance Checklist
### 1. Layer Violations
- UI/Presentation only handles display logic
- Business logic in service/domain layer
- Data access confined to repository/data layer
- Controllers only orchestrate
### 2. Dependency Direction
- No circular dependencies
- Lower layers don't import from higher layers
- Core doesn't depend on infrastructure
### 3. Pattern Conformance
- State management uses project's standard
- Error handling follows conventions
- API calls use established patterns
### 4. Module Boundaries
- Feature code in correct module
- No cross-module imports bypassing interfaces
### 5. File Organization
- Files in correct directories
- Naming follows conventions
## Fix Policy
| Severity | Action |
|----------|--------|
| HIGH | Fix immediately |
| MEDIUM | Fix if possible, otherwise document |
| LOW | Document only |
## Completion Signals
If compliant (no HIGH/MEDIUM violations or all fixed):
Output: ARCH COMPLIANT: $story_id
Or: ARCH COMPLIANT WITH FIXES: $story_id - Fixed N violations
If HIGH violations cannot be fixed:
First output:
\`\`\`
ARCH VIOLATIONS START
- [HIGH] Description (file:line)
- [MEDIUM] Description (file:line)
ARCH VIOLATIONS END
\`\`\`
Then: ARCH VIOLATIONS: $story_id - [summary]
## Begin Execution
Check architecture compliance now. Stage any fixes with: git add -A"
if [ "$DRY_RUN" = true ]; then
echo "[DRY RUN] Would execute architecture compliance check for $story_id"
return 0
fi
local result
result=$(claude --dangerously-skip-permissions -p "$arch_prompt" 2>&1) || true
echo "$result" >> "$LOG_FILE"
if echo "$result" | grep -q "ARCH COMPLIANT"; then
log_success "Architecture compliant: $story_id"
return 0
elif echo "$result" | grep -q "ARCH VIOLATIONS"; then
log_error "Architecture violations found: $story_id"
echo "$result" | grep "ARCH VIOLATIONS"
# Extract violations for fix loop
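# Use POSIX [[:space:]] instead of \s so the filter does not depend on GNU grep extensions
LAST_ARCH_VIOLATIONS=$(echo "$result" | sed -n '/ARCH VIOLATIONS START/,/ARCH VIOLATIONS END/p' | grep -E '^[[:space:]]*-[[:space:]]*\[(HIGH|MEDIUM)\]' || true)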
if [ -n "$LAST_ARCH_VIOLATIONS" ]; then
log "Captured architecture violations for fix loop"
fi
return 1
else
log_warn "Architecture check did not complete cleanly: $story_id"
return 0 # Don't block on unclear result
fi
}
execute_test_quality_phase() {
local story_file="$1"
local story_id=$(basename "$story_file" .md)
# Reset issues
LAST_TEST_QUALITY_ISSUES=""
log ">>> TEST QUALITY: $story_id (fresh context)"
local story_contents=$(cat "$story_file")
local quality_prompt="You are a Test Architect (TEA) executing a test quality review.
## Your Task
Review the tests created for story: $story_id
Focus on test maintainability, determinism, isolation, and flakiness prevention.
### CRITICAL AUTOMATION RULES
- Do NOT pause for user confirmation
- Execute the full quality review
- Fix CRITICAL and HIGH issues automatically
- Document MEDIUM and LOW issues
## Story Context
<story>
$story_contents
</story>
## Test Files to Review
Find test files from Dev Agent Record:
\`\`\`bash
git diff --staged --name-only | grep -E '\\.(spec|test)\\.(ts|js|tsx|jsx)\$'
\`\`\`
## Quality Criteria
### 1. BDD Format (Given-When-Then)
### 2. Test ID Conventions ({story_id}-E2E-001, etc.)
### 3. Hard Waits Detection (no sleep(), waitForTimeout())
### 4. Determinism (no conditionals, no random values)
### 5. Isolation & Cleanup (afterEach hooks, no shared state)
### 6. Explicit Assertions (every test has expect/assert)
### 7. Test Length (≤300 lines)
### 8. Fixture Patterns
### 9. Data Factories (no hardcoded test data)
### 10. Network-First Pattern (intercept before navigate)
### 11. Flakiness Patterns
## Quality Score
Starting: 100
- Critical violations: -10 each
- High violations: -5 each
- Medium violations: -2 each
- Low violations: -1 each
- Bonus for best practices: +5 each
## Fix Policy
| Severity | Action |
|----------|--------|
| CRITICAL | Must fix |
| HIGH | Fix if total issues > 3 |
| MEDIUM | Document |
| LOW | Document |
## Completion Signals
If quality approved (score ≥70, no critical/high remaining):
Output: TEST QUALITY APPROVED: $story_id - Score: N/100
Or: TEST QUALITY APPROVED WITH FIXES: $story_id - Score: N/100, Fixed M issues
If quality concerns (score 60-69):
Output: TEST QUALITY CONCERNS: $story_id - Score: N/100
If quality failed (score <60 or unfixable critical issues):
First output:
\`\`\`
TEST QUALITY ISSUES START
- [CRITICAL] Description (file:line)
- [HIGH] Description (file:line)
TEST QUALITY ISSUES END
\`\`\`
Then: TEST QUALITY FAILED: $story_id - Score: N/100
## Begin Execution
Review test quality now. Stage any fixes with: git add -A"
if [ "$DRY_RUN" = true ]; then
echo "[DRY RUN] Would execute test quality review for $story_id"
return 0
fi
local result
result=$(claude --dangerously-skip-permissions -p "$quality_prompt" 2>&1) || true
echo "$result" >> "$LOG_FILE"
if echo "$result" | grep -q "TEST QUALITY APPROVED"; then
log_success "Test quality approved: $story_id"
return 0
elif echo "$result" | grep -q "TEST QUALITY CONCERNS"; then
log_warn "Test quality concerns: $story_id"
return 0 # Concerns don't block
elif echo "$result" | grep -q "TEST QUALITY FAILED"; then
log_error "Test quality failed: $story_id"
echo "$result" | grep "TEST QUALITY FAILED"
# Extract issues for fix loop
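# Use POSIX [[:space:]] instead of \s so the filter does not depend on GNU grep extensions
LAST_TEST_QUALITY_ISSUES=$(echo "$result" | sed -n '/TEST QUALITY ISSUES START/,/TEST QUALITY ISSUES END/p' | grep -E '^[[:space:]]*-[[:space:]]*\[(CRITICAL|HIGH)\]' || true)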
if [ -n "$LAST_TEST_QUALITY_ISSUES" ]; then
log "Captured test quality issues for fix loop"
fi
return 1
else
log_warn "Test quality check did not complete cleanly: $story_id"
return 0 # Don't block on unclear result
fi
}
execute_traceability_phase() {
log ">>> TRACEABILITY CHECK: Epic $EPIC_ID (fresh context)"
# Reset gaps
LAST_TRACEABILITY_GAPS=""
# Ensure output directory exists
mkdir -p "$TRACEABILITY_DIR"
local epic_contents=$(cat "$EPIC_FILE")
# Build story contents block
local all_stories=""
for story_file in "${STORIES[@]}"; do
local story_id=$(basename "$story_file" .md)
all_stories+="
<story id=\"$story_id\">
$(cat "$story_file")
</story>
"
done
local story_count=${#STORIES[@]}
local trace_prompt="You are a Test Architect (TEA) executing requirements traceability analysis.
## Your Task
Generate a traceability matrix for Epic: $EPIC_ID
Map ALL acceptance criteria from ALL stories to their implementing tests.
Identify coverage gaps and determine if the epic is ready for UAT.
### CRITICAL AUTOMATION RULES
- Do NOT pause for user confirmation
- Execute the full traceability analysis
- Generate the traceability matrix document
- If gaps found, output them in structured format for auto-fix
## Epic Definition
<epic>
$epic_contents
</epic>
## Completed Stories ($story_count total)
$all_stories
## Phase 1: Discover Tests
\`\`\`bash
find . -type f \\( -name \"*.spec.ts\" -o -name \"*.test.ts\" -o -name \"*.spec.js\" -o -name \"*.test.js\" \\) | head -100
\`\`\`
## Phase 2: Map Criteria to Tests
For each acceptance criterion:
- Search for test IDs, describe blocks
- Classify: FULL, PARTIAL, NONE, UNIT-ONLY, INTEGRATION-ONLY
## Coverage Thresholds
| Priority | Required | Gate Impact |
|----------|----------|-------------|
| P0 | 100% | FAIL if not met |
| P1 | ≥90% | CONCERNS if 80-89%, FAIL if <80% |
| P2 | ≥80% | Advisory |
| P3 | None | Advisory |
## Phase 3: Gap Analysis
Identify:
- Critical gaps (P0 without coverage)
- High priority gaps (P1 < 90%)
- Medium priority gaps (P2 < 80%)
## Phase 4: Generate Deliverables
Save traceability matrix to: $TRACEABILITY_DIR/epic-${EPIC_ID}-traceability.md
## Completion Signals
If PASS (P0=100%, P1≥90%):
Output: TRACEABILITY PASS: $EPIC_ID - P0: N%, P1: M%, Overall: O%
If CONCERNS (P0=100%, P1 80-89%):
Output: TRACEABILITY CONCERNS: $EPIC_ID - P1 at N% (below 90%)
If FAIL (P0<100% or P1<80%):
First output gaps for self-healing:
\`\`\`
TRACEABILITY GAPS START
GAP: {story_id}|AC-{n}|{priority}|{description}|{recommended_test_id}|{test_level}
SPEC:
Given: {precondition}
When: {action}
Then: {expected result}
GAP: ...
TRACEABILITY GAPS END
\`\`\`
Then: TRACEABILITY FAIL: $EPIC_ID - P0: N%, P1: M%, X critical gaps
## Begin Execution
Analyze traceability now."
if [ "$DRY_RUN" = true ]; then
echo "[DRY RUN] Would execute traceability analysis for Epic $EPIC_ID"
return 0
fi
local result
result=$(claude --dangerously-skip-permissions -p "$trace_prompt" 2>&1) || true
echo "$result" >> "$LOG_FILE"
if echo "$result" | grep -q "TRACEABILITY PASS"; then
log_success "Traceability passed: Epic $EPIC_ID"
return 0
elif echo "$result" | grep -q "TRACEABILITY CONCERNS"; then
log_warn "Traceability concerns: Epic $EPIC_ID"
return 0 # Concerns don't block
elif echo "$result" | grep -q "TRACEABILITY FAIL"; then
log_error "Traceability failed: Epic $EPIC_ID"
echo "$result" | grep "TRACEABILITY FAIL"
# Extract gaps for self-healing
LAST_TRACEABILITY_GAPS=$(echo "$result" | sed -n '/TRACEABILITY GAPS START/,/TRACEABILITY GAPS END/p' || true)
if [ -n "$LAST_TRACEABILITY_GAPS" ]; then
log "Captured traceability gaps for self-healing"
fi
return 1
else
log_warn "Traceability check did not complete cleanly"
return 0 # Don't block on unclear result
fi
}
execute_traceability_fix_phase() {
local gaps="$1"
local attempt_num="$2"
log ">>> TRACEABILITY FIX: Epic $EPIC_ID (attempt $attempt_num, generating missing tests)"
local fix_prompt="You are a Test Architect generating tests to close coverage gaps.
## Your Task
Generate missing tests for Epic: $EPIC_ID (attempt $attempt_num of $MAX_TRACEABILITY_FIX_ATTEMPTS)
### CRITICAL RULES
- Generate ONLY the tests specified in the gaps
- Follow existing test patterns in the codebase
- Run each test to verify it passes
- Stage changes: git add -A
## Gaps to Address
$gaps
## Instructions
For each GAP:
1. Parse the specification (Given/When/Then)
2. Create the test file if needed
3. Implement the test following the spec
4. Use existing patterns from codebase
5. Run the test
6. Stage changes
## Completion Signals
If all tests generated:
Output: TEST GENERATION COMPLETE: Generated N tests
If partial success:
Output: TEST GENERATION PARTIAL: Generated N of M tests - [reason]
## Begin Execution
Generate missing tests now."
if [ "$DRY_RUN" = true ]; then
echo "[DRY RUN] Would generate missing tests for Epic $EPIC_ID (attempt $attempt_num)"
return 0
fi
local result
result=$(claude --dangerously-skip-permissions -p "$fix_prompt" 2>&1) || true
echo "$result" >> "$LOG_FILE"
if echo "$result" | grep -q "TEST GENERATION COMPLETE"; then
log_success "Test generation complete for Epic $EPIC_ID"
return 0
elif echo "$result" | grep -q "TEST GENERATION PARTIAL"; then
log_warn "Partial test generation for Epic $EPIC_ID"
return 1
else
log_error "Test generation did not complete cleanly"
return 1
fi
}
execute_story_with_fix_loop() {
local story_file="$1"
local story_id=$(basename "$story_file" .md)
local fix_attempt=0
local arch_fix_attempt=0
local test_quality_fix_attempt=0
local needs_fixes=false
# DEV PHASE (Context 1)
@@ -1014,13 +1540,43 @@ execute_story_with_fix_loop() {
return 1
fi
# ARCHITECTURE COMPLIANCE CHECK (Context 2) - Per Story
if [ "$SKIP_ARCH" = false ]; then
while true; do
if execute_arch_compliance_phase "$story_file"; then
log_success "Architecture compliant: $story_id"
break
fi
# Check if we have violations to fix
if [ -z "$LAST_ARCH_VIOLATIONS" ]; then
log_warn "Arch check unclear, proceeding anyway"
break
fi
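# Plain assignment: ((var++)) returns status 1 when var is 0, which can abort the script under set -e
arch_fix_attempt=$((arch_fix_attempt + 1))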
if [ $arch_fix_attempt -gt $MAX_ARCH_FIX_ATTEMPTS ]; then
log_error "Max arch fix attempts ($MAX_ARCH_FIX_ATTEMPTS) reached for $story_id"
add_metrics_issue "$story_id" "arch_violations" "Architecture violations after $MAX_ARCH_FIX_ATTEMPTS attempts"
# Don't fail the story, proceed with violations documented
break
fi
log_warn "Arch violations found, attempting fix $arch_fix_attempt of $MAX_ARCH_FIX_ATTEMPTS"
# Use the regular fix phase with arch context
if ! execute_fix_phase "$story_file" "$LAST_ARCH_VIOLATIONS" "$arch_fix_attempt"; then
log_warn "Arch fix incomplete, continuing..."
fi
done
fi
# REVIEW + FIX LOOP
while true; do
# REVIEW PHASE (Fresh Context)
if execute_review_phase "$story_file"; then
-# Review passed - we're done
+# Review passed - proceed to test quality
log_success "Story passed review: $story_id"
-return 0
+break
fi
# Review failed - check if we have findings to fix
@@ -1055,6 +1611,38 @@ execute_story_with_fix_loop() {
# Loop back to review phase to verify fixes
log "Re-running review after fix attempt $fix_attempt..."
done
# TEST QUALITY REVIEW (Fresh Context) - Per Story
if [ "$SKIP_TEST_QUALITY" = false ]; then
while true; do
if execute_test_quality_phase "$story_file"; then
log_success "Test quality approved: $story_id"
break
fi
# Check if we have issues to fix
if [ -z "$LAST_TEST_QUALITY_ISSUES" ]; then
log_warn "Test quality check unclear, proceeding anyway"
break
fi
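# Plain assignment: ((var++)) returns status 1 when var is 0, which can abort the script under set -e
test_quality_fix_attempt=$((test_quality_fix_attempt + 1))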
if [ $test_quality_fix_attempt -gt $MAX_TEST_QUALITY_FIX_ATTEMPTS ]; then
log_warn "Max test quality fix attempts ($MAX_TEST_QUALITY_FIX_ATTEMPTS) reached for $story_id"
add_metrics_issue "$story_id" "test_quality_concerns" "Test quality issues after $MAX_TEST_QUALITY_FIX_ATTEMPTS attempts"
# Don't fail the story, proceed with concerns documented
break
fi
log_warn "Test quality issues found, attempting fix $test_quality_fix_attempt of $MAX_TEST_QUALITY_FIX_ATTEMPTS"
# Use the regular fix phase with test quality context
if ! execute_fix_phase "$story_file" "$LAST_TEST_QUALITY_ISSUES" "$test_quality_fix_attempt"; then
log_warn "Test quality fix incomplete, continuing..."
fi
done
fi
return 0
}
commit_story() {
@@ -1283,7 +1871,53 @@ for story_file in "${STORIES[@]}"; do
done
# =============================================================================
-# UAT Generation (Context 3 - Fresh)
+# Traceability Check (Per-Epic, with Self-Healing)
# =============================================================================
if [ "$SKIP_TRACEABILITY" = false ]; then
echo ""
log "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
log "Requirements Traceability Check"
log "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
trace_fix_attempt=0
while true; do
if execute_traceability_phase; then
log_success "Traceability check passed for Epic $EPIC_ID"
break
fi
# Check if we have gaps to fix
if [ -z "$LAST_TRACEABILITY_GAPS" ]; then
log_warn "Traceability check unclear, proceeding to UAT"
break
fi
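# Plain assignment: at top level, ((var++)) returns status 1 when var is 0 and set -e would exit here
trace_fix_attempt=$((trace_fix_attempt + 1))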
if [ $trace_fix_attempt -gt $MAX_TRACEABILITY_FIX_ATTEMPTS ]; then
log_warn "Max traceability fix attempts ($MAX_TRACEABILITY_FIX_ATTEMPTS) reached"
add_metrics_issue "epic-$EPIC_ID" "traceability_gaps" "Coverage gaps remain after $MAX_TRACEABILITY_FIX_ATTEMPTS attempts"
# Don't fail the epic, proceed with gaps documented
break
fi
log_warn "Traceability gaps found, generating missing tests (attempt $trace_fix_attempt of $MAX_TRACEABILITY_FIX_ATTEMPTS)"
if ! execute_traceability_fix_phase "$LAST_TRACEABILITY_GAPS" "$trace_fix_attempt"; then
log_warn "Test generation incomplete, continuing..."
fi
# Commit any generated tests
if [ "$NO_COMMIT" = false ] && [ "$DRY_RUN" = false ]; then
git add -A
git commit -m "test(epic-$EPIC_ID): generate missing tests for traceability (attempt $trace_fix_attempt)" 2>/dev/null || true
fi
log "Re-running traceability check..."
done
fi
# =============================================================================
# UAT Generation (Fresh Context)
# =============================================================================
echo ""
@@ -1316,10 +1950,11 @@ echo " Completed: $COMPLETED"
echo " Failed: $FAILED"
echo ""
echo " Deliverables:"
-echo " - Stories: $STORIES_DIR/"
-echo " - UAT: $UAT_DIR/epic-${EPIC_ID}-uat.md"
-echo " - Metrics: $METRICS_FILE"
-echo " - Log: $LOG_FILE"
+echo " - Stories: $STORIES_DIR/"
+echo " - UAT: $UAT_DIR/epic-${EPIC_ID}-uat.md"
+echo " - Traceability: $TRACEABILITY_DIR/epic-${EPIC_ID}-traceability.md"
+echo " - Metrics: $METRICS_FILE"
+echo " - Log: $LOG_FILE"
echo ""
if [ $FAILED -gt 0 ]; then


@@ -0,0 +1,250 @@
# Step 2b: Architecture Compliance Check (Per-Story)
## Context Isolation
**IMPORTANT**: This step executes in a fresh Claude context after the dev phase completes but before code review. It validates that the implementation follows architectural constraints before detailed review begins.
## Objective
Verify that the staged implementation follows the project's established architecture patterns, module boundaries, and dependency rules. Catch structural violations early before they compound across stories.
## Inputs
- `story_id`: The story being validated
- `story_file`: Path to story markdown file (contains Dev Agent Record from dev phase)
## Validation Categories
| Category | What It Catches | Severity if Failed |
|----------|-----------------|-------------------|
| Layer violations | Business logic in UI, DB calls from controllers | HIGH |
| Dependency direction | Circular dependencies, wrong import directions | HIGH |
| Pattern conformance | Using wrong state management, deviating from established patterns | MEDIUM |
| Module boundaries | Features leaking across module boundaries | MEDIUM |
| File organization | Files in wrong directories, naming convention violations | LOW |
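The severities above decide which findings are fed back into the fix loop. The sketch below is modeled on the `sed`/`grep` pipeline in this commit's epic-execute script, which keeps only HIGH and MEDIUM findings found between the `ARCH VIOLATIONS START` and `ARCH VIOLATIONS END` markers. The function name `extract_blocking_violations` and the sample findings are made up for illustration.

```bash
# extract_blocking_violations
# Reads validator output on stdin and prints only the HIGH and MEDIUM
# findings listed between the START/END markers.
# Hypothetical helper name; the script inlines this pipeline.
extract_blocking_violations() {
  sed -n '/ARCH VIOLATIONS START/,/ARCH VIOLATIONS END/p' |
    grep -E '^[[:space:]]*-[[:space:]]*\[(HIGH|MEDIUM)\]' || true
}

# Sample validator output (invented for this example).
sample='ARCH VIOLATIONS START
- [HIGH] Database call from route handler (src/routes/events.ts:42)
- [LOW] File name does not follow convention (src/utils/Helper.ts:1)
ARCH VIOLATIONS END'

# Only the HIGH finding is printed; the LOW finding is documented, not fixed.
printf '%s\n' "$sample" | extract_blocking_violations
```

The trailing `|| true` keeps an empty result from being treated as a failure when the script runs under `set -e`.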
## Prompt Template
```
You are an Architecture Compliance Validator executing a BMAD compliance check.
## Your Task
Validate architecture compliance for story: {story_id}
You are checking the staged changes against the project's established architecture patterns.
This is a TARGETED CHECK - focus only on structural/architectural issues, not code quality.
## Story Context
<story>
{story_file_contents}
</story>
## Architecture Reference
Read and understand the project architecture:
<architecture>
{architecture_file_contents}
</architecture>
## Staged Changes
Run this command and analyze the output:
```bash
git diff --staged --name-only
```
Then for each changed file, examine the changes:
```bash
git diff --staged
```
## Compliance Checklist
### 1. Layer Violations
Check that code respects architectural layers:
- [ ] UI/Presentation layer only handles display logic
- [ ] Business logic is in appropriate service/domain layer
- [ ] Data access is confined to repository/data layer
- [ ] Controllers/routes only orchestrate, don't contain business logic
- [ ] No direct database calls from UI components
### 2. Dependency Direction
Verify dependencies flow in the correct direction:
- [ ] No circular dependencies between modules
- [ ] Lower layers don't import from higher layers
- [ ] Shared utilities don't depend on feature-specific code
- [ ] Core/domain doesn't depend on infrastructure
### 3. Pattern Conformance
Ensure implementation follows established patterns:
- [ ] State management uses project's standard approach
- [ ] Error handling follows project conventions
- [ ] API calls use established client/service patterns
- [ ] Authentication/authorization uses project's auth system
- [ ] Configuration follows project's config management
### 4. Module Boundaries
Validate feature isolation:
- [ ] Feature code is in correct module directory
- [ ] No cross-module imports that bypass public interfaces
- [ ] Shared types are in shared/common locations
- [ ] Feature-specific code doesn't leak to unrelated modules
### 5. File Organization
Check structural conventions:
- [ ] Files are in correct directories per architecture
- [ ] File naming follows project conventions
- [ ] Test files are alongside or in standard test directories
- [ ] No orphaned files in wrong locations
## Issue Collection
Compile all violations found:
```markdown
### Architecture Violations Found
| # | Category | Description | Severity | File:Line | Fixable |
|---|----------|-------------|----------|-----------|---------|
| 1 | Layer | [description] | HIGH/MEDIUM/LOW | path:123 | Yes/No |
| 2 | Dependency | [description] | HIGH/MEDIUM/LOW | path:456 | Yes/No |
```
Count totals:
- HIGH: {count}
- MEDIUM: {count}
- LOW: {count}
- TOTAL: {count}
## Fix Policy
Architecture violations are addressed before code review to prevent wasted effort:
| Severity | Action |
|----------|--------|
| **HIGH** | Must fix before proceeding to review |
| **MEDIUM** | Fix if possible, otherwise document for review phase |
| **LOW** | Document only, review phase will handle |
## Fixing Violations
For HIGH severity violations:
1. Make the structural change (move code, fix imports, etc.)
2. Run tests to verify the fix doesn't break functionality
3. Stage the changes: `git add -A`
4. Document the fix in the issue table
## Completion Signals
### COMPLIANT if:
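- No HIGH severity violations remain (none found, or all fixed)
- No MEDIUM severity violations remain unaddressed (none found, all fixed, or documented for the review phase per the Fix Policy)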
Output: `ARCH COMPLIANT: {story_id}`
or: `ARCH COMPLIANT WITH FIXES: {story_id} - Fixed {n} violations`
### VIOLATIONS FOUND if:
- HIGH severity violations that cannot be fixed without major rework
1. Output the violations block:
```
ARCH VIOLATIONS START
- [HIGH] Description of violation 1 (file:line)
- [HIGH] Description of violation 2
- [MEDIUM] Description of violation 3
ARCH VIOLATIONS END
```
2. Output: `ARCH VIOLATIONS: {story_id} - {summary}`
## Example Violations
### Layer Violation (HIGH)
```typescript
// ❌ BAD: UI component making direct database call
// src/components/UserProfile.tsx
import { db } from '../database/connection';
const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
// ✅ GOOD: UI uses service layer
import { userService } from '../services/user-service';
const user = await userService.getUserById(id);
```
### Dependency Direction (HIGH)
```typescript
// ❌ BAD: Core domain importing from infrastructure
// src/domain/order.ts
import { sendEmail } from '../infrastructure/email-client';
// ✅ GOOD: Core domain uses interface, infrastructure implements
// src/domain/order.ts
import type { NotificationService } from './interfaces';
```
### Pattern Conformance (MEDIUM)
```typescript
// ❌ BAD: Using fetch directly when project uses axios client
const response = await fetch('/api/users');
// ✅ GOOD: Using established API client
import { apiClient } from '../lib/api-client';
const response = await apiClient.get('/users');
```
### Module Boundary (MEDIUM)
```typescript
// ❌ BAD: Feature importing internal from another feature
// src/features/orders/components/OrderForm.tsx
import { validateEmail } from '../../users/utils/validation'; // internal util
// ✅ GOOD: Using shared utility or feature's public interface
import { validateEmail } from '../../../shared/validation';
// or
import { UserValidation } from '../../users'; // public export
```
## Notes
- This check happens BEFORE detailed code review to catch structural issues early
- Architectural violations are often harder to fix after more code is built on top
- The goal is to maintain architectural integrity across the epic, not just individual stories
- When in doubt about architecture rules, reference architecture.md and existing patterns
```
## Orchestration Integration
```bash
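# Fresh context - focused only on architecture compliance
# Note: envsubst expands only $VAR / ${VAR}; the {story_id}-style placeholders must be filled in separately
claude -p "$(envsubst < step-02b-arch-compliance.md)"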
```
## Integration with Fix Loop
If violations are found:
1. Violations are passed to a fix phase (similar to code review fix loop)
2. Fix phase addresses HIGH violations
3. Re-run compliance check
4. Max 2 attempts before escalating to human
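The fix loop above can be sketched roughly as follows. This is a minimal illustration, not the orchestration script's actual code: `ArchResult`, `parseArchOutput`, and `runArchGate` are hypothetical names, and the `check` and `fix` callbacks stand in for whatever launches the fresh Claude contexts. Only the signal strings are taken from the prompt template.

```typescript
// Sketch only: type and function names are illustrative, not part of BMAD.
type ArchResult =
  | { status: "compliant"; fixed: number }
  | { status: "violations"; violations: string[] }
  | { status: "unknown" };

// Parse the completion signals defined in the prompt template.
function parseArchOutput(output: string): ArchResult {
  const block = /ARCH VIOLATIONS START\n([\s\S]*?)\nARCH VIOLATIONS END/.exec(output);
  if (block) {
    const violations = block[1]
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.indexOf("- [") === 0);
    return { status: "violations", violations };
  }
  const fixes = /ARCH COMPLIANT WITH FIXES: \S+ - Fixed (\d+) violations/.exec(output);
  if (fixes) return { status: "compliant", fixed: parseInt(fixes[1], 10) };
  if (/ARCH COMPLIANT: \S+/.test(output)) return { status: "compliant", fixed: 0 };
  return { status: "unknown" };
}

// Bounded loop: check, fix, re-check, with at most `maxAttempts` fix passes.
function runArchGate(
  check: () => string,
  fix: (violations: string[]) => void,
  maxAttempts: number = 2
): ArchResult {
  let result = parseArchOutput(check());
  for (let attempt = 0; attempt < maxAttempts && result.status === "violations"; attempt++) {
    fix(result.violations);
    result = parseArchOutput(check());
  }
  return result; // still "violations" here means the attempts were exhausted
}
```

A caller would treat a final `"violations"` result as the signal to escalate, and an `"unknown"` result as a run that produced no recognizable signal.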
## Success Criteria
Phase complete when:
- All HIGH severity violations resolved
- Changes are staged in git
- ARCH COMPLIANT signal output


@ -0,0 +1,314 @@
# Step 3b: Test Quality Review (Per-Story)
## Context Isolation
**IMPORTANT**: This step executes in a fresh Claude context after code review passes. It validates that the tests written for this story meet quality standards before moving to the next story.
## Objective
Review the tests created during the dev phase using TEA's test quality criteria. Ensure tests are maintainable, deterministic, isolated, and not flaky. This prevents accumulation of low-quality tests across the epic.
## Inputs
- `story_id`: The story being validated
- `story_file`: Path to story markdown file (contains Dev Agent Record with test list)
## Integration with testarch-test-review
This step applies the full `testarch-test-review` workflow to the tests created for this story. It uses TEA's knowledge base of best practices for:
- Fixture architecture
- Network-first safeguards
- Data factories
- Determinism and isolation
- Flakiness prevention
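As one concrete illustration of the flakiness-prevention checks, hard waits can be approximated with a line-by-line pattern scan. The sketch below is a heuristic under stated assumptions: `findHardWaits` and `HARD_WAIT_PATTERNS` are hypothetical names, the patterns cover only the call names this step mentions (`sleep`, `delay`, `waitForTimeout`, `setTimeout`), and it is not how `testarch-test-review` itself is implemented.

```typescript
// Illustrative heuristic: names are hypothetical, and regexes cannot replace a real review.
interface HardWaitHit {
  line: number; // 1-based line number
  text: string; // trimmed source line
}

const HARD_WAIT_PATTERNS: RegExp[] = [
  /\bsleep\s*\(/,
  /\bdelay\s*\(/,
  /\bwaitForTimeout\s*\(/,
  /\bsetTimeout\s*\(/,
];

function findHardWaits(source: string): HardWaitHit[] {
  const hits: HardWaitHit[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i].trim();
    if (text.indexOf("//") === 0) continue; // skip full-line comments
    if (HARD_WAIT_PATTERNS.some((pattern) => pattern.test(text))) {
      hits.push({ line: i + 1, text });
    }
  }
  return hits;
}
```

Each hit still needs a human or reviewer-agent judgment, since a justified wait with an explanatory comment is only a warning under the criteria in this step.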
## Prompt Template
```
You are a Test Architect (TEA) executing a test quality review for a BMAD story.
## Your Task
Review the tests created for story: {story_id}
You are validating test quality AFTER code review has passed. Focus on test maintainability,
determinism, isolation, and flakiness prevention.
## Story Context
<story>
{story_file_contents}
</story>
The Dev Agent Record in the story lists tests added:
- Locate these test files
- Review each against quality criteria
## Test Files to Review
Based on the Dev Agent Record, find and review these test files:
```bash
# List test files that were added/modified in this story
git diff --staged --name-only | grep -E '\.(spec|test)\.(ts|js|tsx|jsx)$'
```
## Quality Criteria (from testarch-test-review)
### 1. BDD Format (Given-When-Then)
- ✅ PASS: Tests use clear Given-When-Then structure
- ⚠️ WARN: Some structure but not explicit
- ❌ FAIL: No clear structure, intent hard to understand
### 2. Test ID Conventions
- ✅ PASS: Test IDs present (e.g., `{story_id}-E2E-001`, `{story_id}-UNIT-001`)
- ⚠️ WARN: Some IDs missing
- ❌ FAIL: No test IDs, can't trace to requirements
### 3. Hard Waits Detection
- ✅ PASS: No hard waits (no `sleep()`, `waitForTimeout()`, hardcoded delays)
- ⚠️ WARN: Hard waits with justification comments
- ❌ FAIL: Hard waits without justification (flakiness risk)
**Patterns to detect:**
- `sleep(1000)`, `setTimeout()`, `delay()`
- `page.waitForTimeout(5000)` without reason
- `await new Promise(resolve => setTimeout(resolve, X))`
### 4. Determinism
- ✅ PASS: Tests are deterministic (no conditionals controlling flow, no random values)
- ⚠️ WARN: Some conditionals with justification
- ❌ FAIL: Tests use if/else, try/catch abuse, Math.random()
### 5. Isolation & Cleanup
- ✅ PASS: Tests clean up resources, no shared state, can run in any order
- ⚠️ WARN: Some cleanup gaps but isolated enough
- ❌ FAIL: Tests share state, depend on execution order
**Check for:**
- afterEach/afterAll cleanup hooks
- No global variable mutation
- Database/API state cleanup
- Test data deletion
### 6. Explicit Assertions
- ✅ PASS: Every test has explicit assertions (expect, assert, toHaveText)
- ⚠️ WARN: Some tests rely on implicit waits
- ❌ FAIL: Missing assertions, tests don't verify behavior
### 7. Test Length
- ✅ PASS: Test file ≤200 lines (ideal), ≤300 lines (acceptable)
- ⚠️ WARN: 301-500 lines (consider splitting)
- ❌ FAIL: >500 lines (too large, maintainability risk)
### 8. Test Duration (estimated)
- ✅ PASS: Individual tests estimated ≤90 seconds
- ⚠️ WARN: Some tests 90-180 seconds
- ❌ FAIL: Tests >180 seconds (too slow)
### 9. Fixture Patterns
- ✅ PASS: Uses fixtures for common setup
- ⚠️ WARN: Some fixtures, some repetition
- ❌ FAIL: No fixtures, tests repeat setup code
### 10. Data Factories
- ✅ PASS: Uses factory functions with overrides
- ⚠️ WARN: Some factories, some hardcoded data
- ❌ FAIL: Hardcoded test data, magic strings/numbers
### 11. Network-First Pattern (for E2E/Integration)
- ✅ PASS: Route interception BEFORE navigation
- ⚠️ WARN: Some routes correct, others after navigation
- ❌ FAIL: Route interception after navigation (race conditions)
### 12. Flakiness Patterns
- ✅ PASS: No known flaky patterns
- ⚠️ WARN: Some potential flaky patterns
- ❌ FAIL: Multiple flaky patterns detected
**Detect:**
- Tight timeouts (e.g., `{ timeout: 1000 }`)
- Race conditions
- Timing-dependent assertions
- Retry logic hiding flakiness
## Quality Score Calculation
```
Starting Score: 100
Critical Violations (each): -10 points
- Hard waits without justification
- Missing assertions
- Race conditions
- Shared state
High Violations (each): -5 points
- Missing test IDs
- No BDD structure
- Hardcoded data
- Missing fixtures
Medium Violations (each): -2 points
- Long test files (>300 lines)
- Missing priority markers
- Some conditionals
Low Violations (each): -1 point
- Minor style issues
- Incomplete cleanup
Bonus Points:
- Excellent BDD structure: +5
- Comprehensive fixtures: +5
- Network-first pattern: +5
- Perfect isolation: +5
- All test IDs present: +5
Quality Score: max(0, min(100, Starting Score - Violations + Bonus))
```
## Issue Collection
```markdown
### Test Quality Issues
| # | Criterion | Description | Severity | File:Line | Fixable |
|---|-----------|-------------|----------|-----------|---------|
| 1 | Hard Wait | [description] | CRITICAL | path:123 | Yes |
| 2 | Isolation | [description] | MEDIUM | path:456 | Yes |
**Quality Score**: {score}/100 ({grade})
```
## Fix Policy
| Severity | Action |
|----------|--------|
| **Critical (P0)** | Must fix - these cause flakiness |
| **High (P1)** | Fix if total issues > 3 |
| **Medium (P2)** | Document for future improvement |
| **Low (P3)** | Document only |
## Fixing Issues
For Critical and High issues:
1. Make the test improvement
2. Run the test to verify it still passes
3. Stage the changes: `git add -A`
4. Document the fix
Example fixes:
### Hard Wait → Explicit Wait
```typescript
// ❌ BAD
await page.waitForTimeout(2000);
await expect(locator).toBeVisible();
// ✅ GOOD
await expect(locator).toBeVisible({ timeout: 10000 });
```
### Missing Assertion
```typescript
// ❌ BAD
await page.click('button');
// test ends without checking result
// ✅ GOOD
await page.click('button');
await expect(page.locator('.success-message')).toBeVisible();
```
### Hardcoded Data → Factory
```typescript
// ❌ BAD
const user = { email: 'test@example.com', name: 'John' };
// ✅ GOOD
import { createTestUser } from './factories/user';
const user = createTestUser({ role: 'admin' });
```
## Update Story File
Add test quality summary to story:
```markdown
## Test Quality Review
**Quality Score**: {score}/100 ({grade})
**Tests Reviewed**: {count}
### Issues Found
- {count} Critical: [list]
- {count} High: [list]
- {count} Medium: [list]
### Fixes Applied
- [Fix 1 description]
- [Fix 2 description]
```
## Completion Signals
### QUALITY APPROVED if:
- Quality score ≥ 70 (B or better)
- No critical issues remaining
- No high issues remaining (or all fixed)
Output: `TEST QUALITY APPROVED: {story_id} - Score: {score}/100`
or: `TEST QUALITY APPROVED WITH FIXES: {story_id} - Score: {score}/100, Fixed {n} issues`
### QUALITY CONCERNS if:
- Quality score 60-69 (C)
- Some medium issues but no blockers
Output: `TEST QUALITY CONCERNS: {story_id} - Score: {score}/100`
### QUALITY FAILED if:
- Quality score < 60 (F)
- Critical issues that cannot be fixed
- Systemic quality problems
Output the issues block:
```
TEST QUALITY ISSUES START
- [CRITICAL] Description (file:line)
- [HIGH] Description (file:line)
TEST QUALITY ISSUES END
```
Then: `TEST QUALITY FAILED: {story_id} - Score: {score}/100`
## Notes
- This step catches test quality issues BEFORE they accumulate across stories
- Flaky tests caught here are much cheaper to fix than after they cause CI failures
- The quality score is a guide, not an absolute - context matters
- When in doubt, prioritize determinism and isolation over other concerns
```
## Orchestration Integration
```bash
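# Fresh context - focused only on test quality
# Note: envsubst expands only $VAR / ${VAR}; the {story_id}-style placeholders must be filled in separately
claude -p "$(envsubst < step-03b-test-quality.md)"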
```
## Integration with Fix Loop
If critical/high issues are found:
1. Issues are passed to a fix phase
2. Fix phase addresses quality issues
3. Re-run quality check
4. Max 2 attempts before proceeding with CONCERNS status
## Success Criteria
Phase complete when:
- Quality score ≥ 70 OR all critical/high issues fixed
- Changes are staged in git
- TEST QUALITY APPROVED signal output (or CONCERNS for borderline)
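The scoring rules and thresholds in the prompt template can be expressed as a small calculation. The sketch below is illustrative only: `qualityScore` and `qualityOutcome` are hypothetical names, and the outcome function is a simplification that treats any unresolved critical issue as a failure and ignores the separate "no high issues remaining" condition for approval.

```typescript
// Sketch of the scoring rules from the prompt template; names are illustrative.
interface ViolationCounts {
  critical: number;
  high: number;
  medium: number;
  low: number;
}

// Points deducted per violation, by severity.
const PENALTY: ViolationCounts = { critical: 10, high: 5, medium: 2, low: 1 };
const BONUS_PER_ITEM = 5; // each bonus category is worth +5

function qualityScore(v: ViolationCounts, bonusItems: number): number {
  const penalty =
    v.critical * PENALTY.critical +
    v.high * PENALTY.high +
    v.medium * PENALTY.medium +
    v.low * PENALTY.low;
  const raw = 100 - penalty + bonusItems * BONUS_PER_ITEM;
  return Math.max(0, Math.min(100, raw)); // clamp to 0..100
}

type QualityOutcome = "APPROVED" | "CONCERNS" | "FAILED";

// Thresholds: >= 70 approved, 60-69 concerns, < 60 failed.
// Assumption: an unresolved critical issue fails the gate regardless of score.
function qualityOutcome(score: number, unresolvedCritical: number): QualityOutcome {
  if (unresolvedCritical > 0 || score < 60) return "FAILED";
  return score >= 70 ? "APPROVED" : "CONCERNS";
}
```

For example, two critical violations, one high, and one low with a single bonus category work out to 100 - 26 + 5 = 79, which falls in the approved band as long as the critical issues were fixed.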


@ -0,0 +1,352 @@
# Step 3c: Requirements Traceability & Coverage Gate (Per-Epic)
## Context Isolation
**IMPORTANT**: This step executes in a fresh Claude context after ALL stories are complete but before UAT generation. It validates that every acceptance criterion across the epic has appropriate test coverage.
## Objective
Generate a requirements-to-tests traceability matrix for the entire epic. Identify coverage gaps, and if gaps exist, trigger a self-healing loop to generate missing tests before proceeding to UAT.
## Inputs
- `epic_id`: The completed epic
- `epic_file`: Path to epic definition
- `completed_stories`: List of all story files in the epic
- `test_dir`: Project's test directory (auto-discovered)
## Integration with testarch-trace
This step applies the full `testarch-trace` workflow to generate:
- Requirements-to-tests traceability matrix
- Coverage analysis by priority (P0/P1/P2/P3)
- Gap identification with severity
- Quality gate decision (PASS/CONCERNS/FAIL)
## Coverage Thresholds
| Priority | Required Coverage | Gate Impact |
|----------|-------------------|-------------|
| **P0** (Critical) | 100% | FAIL if not met |
| **P1** (High) | ≥90% | CONCERNS if 80-89%, FAIL if <80% |
| **P2** (Medium) | ≥80% | Advisory only |
| **P3** (Low) | No requirement | Advisory only |
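The thresholds in this table reduce to a short decision function. The following is a minimal sketch with hypothetical names (`Coverage`, `coveragePercent`, `traceabilityGate`); it considers only P0 and P1 because P2 and P3 are advisory, and it assumes that a priority level with no criteria counts as fully covered.

```typescript
// Illustrative sketch of the gate thresholds; names are not part of BMAD.
type Gate = "PASS" | "CONCERNS" | "FAIL";

interface Coverage {
  p0: number; // percentage, 0-100
  p1: number; // percentage, 0-100
}

// Assumption: zero criteria at a priority level is treated as 100% covered.
function coveragePercent(covered: number, total: number): number {
  return total === 0 ? 100 : (covered / total) * 100;
}

function traceabilityGate(c: Coverage): Gate {
  if (c.p0 < 100 || c.p1 < 80) return "FAIL"; // P0 must be complete; P1 below 80% blocks
  if (c.p1 < 90) return "CONCERNS"; // P1 in the 80-89% band
  return "PASS";
}
```

With 9 of 10 P1 criteria covered and all P0 criteria covered, `traceabilityGate({ p0: 100, p1: coveragePercent(9, 10) })` evaluates to `"PASS"`, since 90% meets the P1 threshold.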
## Prompt Template
```
You are a Test Architect (TEA) executing requirements traceability analysis for a BMAD epic.
## Your Task
Generate a traceability matrix for Epic: {epic_id}
Map ALL acceptance criteria from ALL stories to their implementing tests.
Identify coverage gaps and determine if the epic is ready for UAT.
## Epic Definition
<epic>
{epic_file_contents}
</epic>
## Completed Stories
{for each story}
<story id="{story_id}">
{story_file_contents}
</story>
{end for}
## Phase 1: Discover and Catalog Tests
### 1.1 Find Test Files
```bash
# List all test files in the project
find . -type f -not -path "*/node_modules/*" \( -name "*.spec.ts" -o -name "*.test.ts" -o -name "*.spec.js" -o -name "*.test.js" \) | head -100
```
### 1.2 Extract Test Metadata
For each test file related to this epic:
- Test IDs (e.g., `{epic_id}-{story_seq}-E2E-001`)
- Describe blocks
- It blocks (individual test cases)
- Given-When-Then structure
- Priority markers (P0/P1/P2/P3)
## Phase 2: Map Criteria to Tests
### 2.1 For Each Acceptance Criterion
Search for explicit references:
- Test IDs mentioning the criterion
- Describe blocks referencing the requirement
- Given-When-Then narratives that match
### 2.2 Build Traceability Matrix
```markdown
## Traceability Matrix - Epic {epic_id}
### Coverage Summary
| Priority | Total Criteria | Covered | Coverage % | Status |
|----------|---------------|---------|------------|--------|
| P0 | {count} | {count} | {%} | ✅/❌ |
| P1 | {count} | {count} | {%} | ✅/⚠️/❌ |
| P2 | {count} | {count} | {%} | ✅/⚠️ |
| P3 | {count} | {count} | {%} | ✅ |
| **Total**| {count} | {count} | {%} | {status} |
### Detailed Mapping
#### Story {story_id}: {story_title}
| AC ID | Description | Priority | Test ID | Test File | Level | Status |
|-------|-------------|----------|---------|-----------|-------|--------|
| AC-1 | User can... | P0 | {id}-E2E-001 | tests/e2e/... | E2E | FULL |
| AC-2 | Error shows...| P1 | {id}-UNIT-001 | tests/unit/... | Unit | PARTIAL |
| AC-3 | Data persists | P1 | - | - | - | NONE |
```
### 2.3 Classify Coverage Status
For each criterion:
- **FULL**: All scenarios tested at appropriate level(s)
- **PARTIAL**: Some coverage but missing edge cases or levels
- **NONE**: No test coverage
- **UNIT-ONLY**: Only unit tests (missing integration/E2E)
- **INTEGRATION-ONLY**: Only integration tests (missing unit confidence)
## Phase 3: Gap Analysis
### 3.1 Identify Critical Gaps
```markdown
### Coverage Gaps
#### Critical Gaps (BLOCKING - P0 without coverage)
| Story | AC | Description | Recommended Test |
|-------|-----|-------------|------------------|
| {id} | AC-2 | [desc] | {id}-E2E-002: [Given-When-Then] |
#### High Priority Gaps (P1 coverage <90%)
| Story | AC | Description | Current | Missing |
|-------|-----|-------------|---------|---------|
| {id} | AC-5 | [desc] | UNIT-ONLY | E2E test for integration |
#### Medium Priority Gaps (Advisory)
| Story | AC | Description | Current | Recommendation |
|-------|-----|-------------|---------|----------------|
| {id} | AC-8 | [desc] | PARTIAL | Add edge case tests |
```
### 3.2 Gate Decision
Apply decision rules:
**PASS** if ALL:
- P0 coverage = 100%
- P1 coverage ≥ 90%
- Overall coverage ≥ 80%
- No critical gaps
**CONCERNS** if ANY:
- P1 coverage 80-89%
- P2 coverage <50%
- Minor gaps in edge case coverage
**FAIL** if ANY:
- P0 coverage < 100%
- P1 coverage < 80%
- Critical acceptance criteria without tests
## Phase 4: Self-Healing (If Gaps Found)
### If FAIL or CONCERNS with P0/P1 gaps:
Generate specific test recommendations:
```markdown
### Tests to Generate
For each gap, provide:
#### Gap 1: {story_id} AC-{n} - {description}
**Priority**: P0/P1
**Recommended Test ID**: {story_id}-E2E-{seq}
**Test Level**: E2E/Integration/Unit
**File Location**: tests/{level}/{feature}.spec.ts
**Test Specification**:
```gherkin
Feature: {feature name}
Scenario: {scenario name}
Given {precondition}
When {action}
Then {expected result}
```
**Implementation Guidance**:
- Setup: {what data/state to prepare}
- Action: {what to test}
- Assertions: {what to verify}
- Cleanup: {what to clean up}
```
### 4.1 Output for Fix Loop
If gaps need fixing, output:
```
TRACEABILITY GAPS START
GAP: {story_id}|AC-{n}|{priority}|{description}|{recommended_test_id}|{test_level}
SPEC:
Given: {precondition}
When: {action}
Then: {expected result}
GAP: {next gap...}
TRACEABILITY GAPS END
```
## Deliverables
### 1. Traceability Matrix Document
Save to: `docs/sprint-artifacts/traceability/epic-{epic_id}-traceability.md`
### 2. Gate Decision Summary
```markdown
## Quality Gate Decision
**Epic**: {epic_id}
**Decision**: PASS / CONCERNS / FAIL
**Date**: {date}
### Evidence Summary
| Metric | Threshold | Actual | Status |
|--------|-----------|--------|--------|
| P0 Coverage | 100% | {%} | ✅/❌ |
| P1 Coverage | ≥90% | {%} | ✅/⚠️/❌ |
| Overall Coverage | ≥80% | {%} | ✅/⚠️/❌ |
| Critical Gaps | 0 | {count} | ✅/❌ |
### Recommendation
{PASS: Proceed to UAT generation}
{CONCERNS: Proceed with noted gaps, create follow-up stories}
{FAIL: Generate missing tests before UAT}
```
### 3. Gate YAML Snippet
```yaml
traceability:
epic_id: "{epic_id}"
coverage:
overall: {%}
p0: {%}
p1: {%}
p2: {%}
gaps:
critical: {count}
high: {count}
medium: {count}
status: "PASS|CONCERNS|FAIL"
timestamp: "{timestamp}"
```
## Completion Signals
### TRACEABILITY PASS if:
- P0 coverage = 100%
- P1 coverage ≥ 90%
- No critical gaps
Output: `TRACEABILITY PASS: {epic_id} - P0: 100%, P1: {p1}%, Overall: {overall}%`
### TRACEABILITY CONCERNS if:
- P0 coverage = 100%
- P1 coverage 80-89%
Output: `TRACEABILITY CONCERNS: {epic_id} - P1 at {p1}% (below 90%)`
### TRACEABILITY FAIL if either:
- P0 coverage < 100%
- P1 coverage < 80%
First output gaps block (for self-healing):
```
TRACEABILITY GAPS START
GAP: ...
TRACEABILITY GAPS END
```
Then: `TRACEABILITY FAIL: {epic_id} - P0: {p0}%, P1: {p1}%, {n} critical gaps`
```
## Self-Healing Fix Loop
When TRACEABILITY FAIL is signaled with gaps:
1. **Gap Extraction**: Shell script extracts gaps from output
2. **Test Generation Phase**: New Claude context generates missing tests
3. **Re-run Traceability**: Verify gaps are closed
4. **Max Attempts**: 3 attempts before proceeding with CONCERNS and follow-up stories
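The gap extraction in step 1 could look something like the sketch below. The actual orchestration uses a shell script, so this is only an illustration of the parsing logic: the `Gap` shape and `parseGaps` are hypothetical names, and the parser assumes the pipe-delimited `GAP:` line and `Given:`/`When:`/`Then:` lines shown in the gaps block format.

```typescript
// Illustrative sketch: the Gap shape and parseGaps name are assumptions, not BMAD APIs.
interface Gap {
  storyId: string;
  criterion: string;
  priority: string;
  description: string;
  testId: string;
  level: string;
  given?: string;
  when?: string;
  then?: string;
}

function parseGaps(output: string): Gap[] {
  const gaps: Gap[] = [];
  let inside = false;
  let current: Gap | null = null;
  const lines = output.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "TRACEABILITY GAPS START") { inside = true; continue; }
    if (line === "TRACEABILITY GAPS END") break;
    if (!inside) continue;

    if (line.indexOf("GAP:") === 0) {
      // Six pipe-separated fields; a description containing "|" is rejected as malformed.
      const f = line.slice(4).split("|").map((s) => s.trim());
      current = f.length === 6
        ? { storyId: f[0], criterion: f[1], priority: f[2], description: f[3], testId: f[4], level: f[5] }
        : null; // malformed line: skip it and its SPEC lines
      if (current) gaps.push(current);
      continue;
    }
    const spec = /^(Given|When|Then):\s*(.*)$/.exec(line);
    if (spec && current) {
      const key = spec[1].toLowerCase() as "given" | "when" | "then";
      current[key] = spec[2];
    }
  }
  return gaps;
}
```

The resulting list can be serialized into the `{gaps_from_traceability}` slot of the test generation prompt, and an empty list after a re-run indicates the gaps were closed.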
### Test Generation Prompt (for fix loop)
```
You are a Test Architect generating tests to close coverage gaps.
## Gaps to Address
{gaps_from_traceability}
## Instructions
For each gap:
1. Create the test file if it doesn't exist
2. Implement the test following the Given-When-Then specification
3. Use existing test patterns from the codebase
4. Run the test to verify it passes
5. Stage changes: git add -A
## Completion
Output: TEST GENERATION COMPLETE: Generated {n} tests
Or: TEST GENERATION PARTIAL: Generated {n} of {m} tests - {reason for gaps}
```
## Notes
- This step runs ONCE per epic, not per story
- It catches acceptance criteria that slipped through without tests
- Self-healing generates tests automatically rather than just reporting gaps
- The traceability matrix becomes documentation for UAT and compliance
- Follow-up stories are created for gaps that can't be auto-generated
```
## Orchestration Integration
```bash
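# Fresh context - comprehensive traceability analysis
# Note: envsubst expands only $VAR / ${VAR}; the {epic_id}-style placeholders must be filled in separately
claude -p "$(envsubst < step-03c-traceability.md)"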
```
## Success Criteria
Phase complete when:
- Traceability matrix generated
- Gate decision made (PASS/CONCERNS/FAIL)
- If FAIL: Self-healing loop attempted (max 3 times)
- TRACEABILITY PASS or CONCERNS signal output
- Ready for UAT generation


@ -4,7 +4,7 @@
| Field | Value |
|-------|-------|
| Version | 1.0.0 |
| Version | 2.0.0 |
| Trigger | `epic-execute` |
| Agent | SM (Scrum Master) |
| Category | Implementation |
@ -23,36 +23,54 @@ Automatically execute all stories in an epic sequentially with context isolation
## Workflow Phases
This workflow orchestrates multiple isolated agent sessions:
This workflow orchestrates multiple isolated agent sessions with comprehensive quality gates:
```
┌─────────────────────────────────────────────────────────────┐
│ EPIC EXECUTE FLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Phase 1 │ │ Phase 2 │ │ Phase 3 │ │
│ │ Dev │───►│ Review │───►│ Commit │ │
│ │ (Context A)│ │ (Context B) │ │ (Shell) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │
│ └──────────── Per Story Loop ─────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Phase 4 │ │
│ │ UAT Generation (Context C) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ENHANCED EPIC EXECUTE FLOW (v2.0) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Dev │→ │ Arch │→ │ Code │→ │ Test Quality │ │
│ │ (impl) │ │ Compliance │ │ Review │ │ Review │ │
│ └──────────┘ └──────────────┘ └──────────┘ └──────────────┘ │
│ │ │ │ │ │
│ └──────────────┴────────────────┴───────────────┘ │
│ │ │
│ ─── Per Story Loop (with fix loops) ─── │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Traceability Check │ │
│ │ (Per-Epic, with self-healing) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ UAT Generation │ │
│ │ (Fresh Context) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
```
## Steps
### Per-Story Steps
| Step | File | Description |
|------|------|-------------|
| 1 | step-01-init.md | Discover epic and validate stories |
| 2 | step-02-dev-story.md | Development phase prompt (isolated context) |
| 3 | step-03-code-review.md | Review phase prompt (isolated context) |
| 2 | step-02-dev-story.md | Development phase (isolated context) |
| 2b | step-02b-arch-compliance.md | Architecture compliance check |
| 3 | step-03-code-review.md | Code review phase (isolated context) |
| 3b | step-03b-test-quality.md | Test quality review |
### Per-Epic Steps
| Step | File | Description |
|------|------|-------------|
| 3c | step-03c-traceability.md | Requirements traceability with self-healing |
| 4 | step-04-generate-uat.md | UAT document generation (isolated context) |
| 5 | step-05-summary.md | Final execution summary |
@ -60,13 +78,26 @@ This workflow orchestrates multiple isolated agent sessions:
| Output | Location | Description |
|--------|----------|-------------|
| Updated Stories | `docs/stories/` | Stories marked Done with Dev Agent Records and Code Review Records |
| Updated Stories | `docs/stories/` | Stories with Dev Agent Records, Code Review Records, Test Quality summaries |
| Traceability Matrix | `docs/sprint-artifacts/traceability/epic-{id}-traceability.md` | Requirements-to-tests mapping |
| UAT Document | `docs/uat/epic-{id}-uat.md` | Human testing script |
| Execution Metrics | `docs/sprint-artifacts/metrics/epic-{id}-metrics.yaml` | Run metrics including fix loop data |
| Execution Log | `docs/sprints/epic-{id}-execution.md` | Run summary |
## Issue Fix Policy
## Quality Gates
During code review, issues are categorized by severity and fixed based on thresholds:
### Architecture Compliance (Per-Story)
Validates implementation against `architecture.md`:
| Category | What It Catches | Severity |
|----------|-----------------|----------|
| Layer violations | Business logic in UI, DB calls from controllers | HIGH |
| Dependency direction | Circular deps, wrong import directions | HIGH |
| Pattern conformance | Deviating from established patterns | MEDIUM |
| Module boundaries | Features leaking across modules | MEDIUM |
### Code Review Issue Fix Policy
| Severity | Criteria | Action |
|----------|----------|--------|
@ -74,7 +105,32 @@ During code review, issues are categorized by severity and fixed based on thresh
| **MEDIUM** | Pattern violations, missing edge cases, hardcoded config | Fix if total issues > 5 |
| **LOW** | Naming, style, missing comments | Document only |
This ensures critical issues are always resolved while avoiding over-engineering on minor items.
### Test Quality Review (Per-Story)
Validates tests against testarch best practices:
| Criterion | What It Catches |
|-----------|-----------------|
| Hard waits | Flaky `sleep()`, `waitForTimeout()` calls |
| Missing assertions | Tests that pass without checking anything |
| Shared state | Tests that depend on execution order |
| Hardcoded data | Magic strings instead of factories |
| Network races | Route interception after navigation |
Each review produces a quality score (0-100) with a letter grade. Critical and high issues are fixed automatically.
### Requirements Traceability (Per-Epic)
Maps acceptance criteria to tests with coverage thresholds:
| Priority | Required Coverage | Gate Impact |
|----------|-------------------|-------------|
| P0 (Critical) | 100% | FAIL if not met |
| P1 (High) | ≥90% | CONCERNS if 80-89%, FAIL if <80% |
| P2 (Medium) | ≥80% | Advisory |
| P3 (Low) | None | Advisory |
Self-healing: Automatically generates missing tests (up to 3 attempts).
## Orchestration Script
@ -89,6 +145,11 @@ See: `scripts/epic-execute.sh`
# Example
./bmad/scripts/epic-execute.sh 1
# Skip optional quality gates (not recommended)
./bmad/scripts/epic-execute.sh 1 --skip-arch
./bmad/scripts/epic-execute.sh 1 --skip-test-quality
./bmad/scripts/epic-execute.sh 1 --skip-traceability
```
Or invoke steps manually:
@ -126,8 +187,10 @@ review_mode: standard
| Scenario | Behavior |
|----------|----------|
| Dev fails to complete | Log failure, skip to next story, mark blocked |
| Review finds critical issues | Attempt fix, re-review once, then flag for human |
| Tests fail | Attempt fix, re-run, fail after 3 attempts |
| Arch violations found | Attempt fix (2 max), proceed with documented violations |
| Review finds critical issues | Attempt fix (3 max), re-review, then fail story |
| Test quality issues | Attempt fix (2 max), proceed with CONCERNS status |
| Traceability gaps | Generate missing tests (3 max), proceed with gaps documented |
| Story dependency not met | Skip story, continue, report in summary |
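The retry limits for the quality gates in this table could be captured as data rather than scattered through the script. The sketch below is one possible shape, not the structure `epic-execute.sh` actually uses: `GateName`, `RETRY_POLICY`, and `shouldRetry` are hypothetical names, and only the attempt counts and fallback descriptions come from the table.

```typescript
// Illustrative sketch: a data-driven view of the per-gate retry limits.
type GateName = "archCompliance" | "codeReview" | "testQuality" | "traceability";

interface RetryPolicy {
  maxFixAttempts: number;
  onExhausted: string; // what happens once the attempts are used up
}

const RETRY_POLICY: Record<GateName, RetryPolicy> = {
  archCompliance: { maxFixAttempts: 2, onExhausted: "proceed with documented violations" },
  codeReview: { maxFixAttempts: 3, onExhausted: "fail story" },
  testQuality: { maxFixAttempts: 2, onExhausted: "proceed with CONCERNS status" },
  traceability: { maxFixAttempts: 3, onExhausted: "proceed with gaps documented" },
};

// True while the gate still has fix attempts left.
function shouldRetry(gate: GateName, attemptsUsed: number): boolean {
  return attemptsUsed < RETRY_POLICY[gate].maxFixAttempts;
}
```

Keeping the limits in one table makes it easier to see that code review is the only gate here whose exhaustion fails the story, while the others proceed with documented issues.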
## Notes
@ -136,3 +199,6 @@ review_mode: standard
- Git staging passes code between contexts (not context window)
- Story files pass notes between contexts (Dev Agent Record section)
- Human intervention only required at UAT testing phase
- Quality gates are non-blocking by default (issues documented, not fatal)
- Self-healing loops automatically fix issues when possible
- Traceability matrix provides audit trail for compliance requirements