feat(epic-execute): add JSON output parsing and TDD workflow phases

Implements the final two improvements from bmad_improvements_v2.md:

## Structured JSON Output (Improvement #6)
- New module: scripts/epic-execute-lib/json-output.sh
- Functions for extracting and parsing JSON from Claude output
- Unified check_phase_completion() with JSON + text fallback
- Updated prompts to request JSON result blocks
- Added --legacy-output flag to disable JSON parsing

## Test-First Flow (Improvement #7)
- New module: scripts/epic-execute-lib/tdd-flow.sh
- execute_test_spec_phase() - Generates BDD specs from acceptance criteria
- execute_test_impl_phase() - Creates failing tests from specs
- execute_test_verification_phase() - Verifies tests fail correctly
- Integration with dev phase for TDD context
- Added --skip-tdd, --skip-test-spec, --skip-test-impl flags

All 7 improvements from the analysis are now complete.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Caleb 2026-01-26 14:33:25 -06:00
parent 6a5b7b68e8
commit 5c63f31c0e
4 changed files with 1199 additions and 62 deletions

View File

@ -405,11 +405,11 @@ Respect these decisions unless you have a specific reason to deviate.
---
### 6. Test-First Enforcement (HIGH IMPACT, HIGH EFFORT)
### 6. Test-First Enforcement (HIGH IMPACT, HIGH EFFORT) ✅ IMPLEMENTED
Restructure the flow to enforce TDD principles.
**Proposed New Flow:**
**Implemented Flow:**
```
1. DESIGN PHASE (Dev) → Plan implementation approach
@ -424,14 +424,29 @@ Restructure the flow to enforce TDD principles.
This ensures tests actually test requirements rather than implementation details.
**Implementation:** Module at `scripts/epic-execute-lib/tdd-flow.sh` with functions:
- `execute_test_spec_phase()` - Generates BDD test specifications from acceptance criteria
- `execute_test_impl_phase()` - Creates failing tests from specifications
- `execute_test_verification_phase()` - Verifies tests fail correctly before implementation
- `build_test_spec_context_for_dev()` - Provides test context to dev phase
**Skip flags:** `--skip-tdd`, `--skip-test-spec`, `--skip-test-impl`
---
### 7. Structured Output Validation (MEDIUM IMPACT, MEDIUM EFFORT)
### 7. Structured Output Validation (MEDIUM IMPACT, MEDIUM EFFORT) ✅ IMPLEMENTED
Replace fragile regex parsing with structured JSON output.
**Implementation:** Module at `scripts/epic-execute-lib/json-output.sh` with functions:
- `extract_json_result()` - Parse JSON from Claude output
- `get_result_status()` - Extract status field
- `get_result_files()` - Extract files_changed array
- `get_result_issues()` - Extract issues for fix loops
- `check_phase_completion()` - Unified completion detection with JSON + text fallback
```bash
# Add to prompts:
# Prompts now request JSON output:
"Output your result as JSON:
\`\`\`json
{
@ -445,11 +460,13 @@ Replace fragile regex parsing with structured JSON output.
}
\`\`\`"
# Parse with jq:
result_json=$(echo "$result" | sed -n '/```json/,/```/p' | sed '1d;$d')
status=$(echo "$result_json" | jq -r '.status')
# Parsing with fallback:
check_phase_completion "$result" "dev" "$story_id"
# Returns: 0 (complete), 1 (failed), 2 (unclear)
```
**Skip flag:** `--legacy-output` (disables JSON parsing, uses text-only detection)
---
## Implementation Priority Matrix
@ -461,8 +478,8 @@ status=$(echo "$result_json" | jq -r '.status')
| 3 | Decision Log | MEDIUM | LOW | ✅ DONE | Easy context preservation |
| 4 | Regression Gate | HIGH | MEDIUM | ✅ DONE | Prevents silent breakage |
| 5 | Design Phase | HIGH | MEDIUM | ✅ DONE | Catches issues early |
| 6 | Structured JSON Output | MEDIUM | MEDIUM | | Improves reliability |
| 7 | Test-First Flow | HIGH | HIGH | | Fundamental quality improvement |
| 6 | Structured JSON Output | MEDIUM | MEDIUM | ✅ DONE | Improves reliability |
| 7 | Test-First Flow | HIGH | HIGH | ✅ DONE | Fundamental quality improvement |
---
@ -532,8 +549,16 @@ These three changes alone would dramatically improve code reliability with minim
**Additionally implemented:**
4. **Regression Gate** - Prevents silent breakage ✅
5. **Design Phase** - Catches architectural issues early ✅
6. **Structured JSON Output** - Reliable completion signal parsing ✅
7. **Test-First Flow** - TDD workflow with test specs before implementation ✅
**Implementation:** All features are modularized in `scripts/epic-execute-lib/` with graceful degradation and skip flags (`--skip-design`, `--skip-regression`).
**Implementation:** All features are modularized in `scripts/epic-execute-lib/` with graceful degradation and skip flags:
- `--skip-design` - Skip pre-implementation design phase
- `--skip-regression` - Skip regression test gate
- `--skip-tdd` - Skip test-first development phases
- `--skip-test-spec` - Skip test specification phase only
- `--skip-test-impl` - Skip test implementation phase only
- `--legacy-output` - Use legacy text-based output parsing (no JSON)
---

View File

@ -0,0 +1,433 @@
#!/bin/bash
#
# BMAD Epic Execute - JSON Output Module
#
# Provides functions for structured JSON output parsing to replace
# fragile regex-based completion signal detection.
#
# Usage: Sourced by epic-execute.sh
#
# =============================================================================
# JSON Output Variables
# =============================================================================
# Whether to use legacy text-based parsing instead of JSON
USE_LEGACY_OUTPUT=false
# Last extracted JSON result (for reuse within a phase)
LAST_JSON_RESULT=""
# =============================================================================
# JSON Output Functions
# =============================================================================
# Extract JSON result block from Claude output
# Looks for ```json ... ``` or ```result ... ``` blocks
# Arguments:
# $1 - Full Claude output
# Returns: JSON string or empty if not found
extract_json_result() {
local output="$1"
# Reset last result
LAST_JSON_RESULT=""
# Try to extract JSON from code block (```json ... ```)
local json_block
json_block=$(echo "$output" | sed -n '/```json/,/```/p' | sed '1d;$d')
# If no ```json block, try ```result block
if [ -z "$json_block" ]; then
json_block=$(echo "$output" | sed -n '/```result/,/```/p' | sed '1d;$d')
fi
# If still no block, try to find raw JSON object at end of output
if [ -z "$json_block" ]; then
# Look for JSON object pattern {"status": ...}
json_block=$(echo "$output" | grep -oE '\{"status":[^}]+\}' | tail -1)
fi
# Validate JSON if jq is available
if [ -n "$json_block" ] && command -v jq >/dev/null 2>&1; then
if echo "$json_block" | jq . >/dev/null 2>&1; then
LAST_JSON_RESULT="$json_block"
echo "$json_block"
return 0
fi
elif [ -n "$json_block" ]; then
# jq not available, return raw block
LAST_JSON_RESULT="$json_block"
echo "$json_block"
return 0
fi
echo ""
return 1
}
# Get the status field from a JSON result
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
# Returns: Status string (COMPLETE, BLOCKED, FAILED, PASSED, etc.)
get_result_status() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo ""
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -r '.status // empty'
else
# Fallback: basic pattern matching
echo "$json" | grep -oE '"status":\s*"[^"]+"' | sed 's/.*"\([^"]*\)"$/\1/'
fi
}
# Get the story_id field from a JSON result
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
get_result_story_id() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo ""
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -r '.story_id // empty'
else
echo "$json" | grep -oE '"story_id":\s*"[^"]+"' | sed 's/.*"\([^"]*\)"$/\1/'
fi
}
# Get the summary field from a JSON result
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
get_result_summary() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo ""
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -r '.summary // empty'
else
echo "$json" | grep -oE '"summary":\s*"[^"]+"' | sed 's/.*"\([^"]*\)"$/\1/'
fi
}
# Get the files_changed array from a JSON result
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
# Returns: Newline-separated list of file paths
get_result_files() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo ""
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -r '.files_changed[]? // empty'
else
# Fallback: basic pattern matching (limited)
echo "$json" | grep -oE '"files_changed":\s*\[[^\]]*\]' | grep -oE '"[^"]+\.[a-z]+"' | tr -d '"'
fi
}
# Get the concerns array from a JSON result
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
# Returns: Newline-separated list of concerns
get_result_concerns() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo ""
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -r '.concerns[]? // empty'
else
echo ""
fi
}
# Get the issues array from a JSON result (for review/fix phases)
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
# Returns: JSON array of issues or empty
get_result_issues() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo ""
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -c '.issues // []'
else
echo "[]"
fi
}
# Get the tests_added count from a JSON result
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
# Returns: Number of tests added
get_result_tests_added() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo "0"
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -r '.tests_added // 0'
else
echo "$json" | grep -oE '"tests_added":\s*[0-9]+' | grep -oE '[0-9]+' || echo "0"
fi
}
# Get the decisions array from a JSON result
# Arguments:
# $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
# Returns: JSON array of decisions
get_result_decisions() {
local json="${1:-$LAST_JSON_RESULT}"
if [ -z "$json" ]; then
echo "[]"
return 1
fi
if command -v jq >/dev/null 2>&1; then
echo "$json" | jq -c '.decisions // []'
else
echo "[]"
fi
}
# Check phase completion with JSON parsing and text fallback
# Arguments:
# $1 - Full Claude output
# $2 - Phase type (dev, review, fix, arch, test_quality, trace, uat)
# $3 - Story ID (for legacy text matching)
# Returns: 0 if complete/passed, 1 if failed/blocked, 2 if unclear
check_phase_completion() {
local output="$1"
local phase_type="$2"
local story_id="$3"
# Try JSON parsing first (unless legacy mode)
if [ "$USE_LEGACY_OUTPUT" != true ]; then
local json_result
json_result=$(extract_json_result "$output")
if [ -n "$json_result" ]; then
local status
status=$(get_result_status "$json_result")
case "$status" in
COMPLETE|PASSED|COMPLIANT|APPROVED)
return 0
;;
BLOCKED|FAILED|VIOLATIONS|CONCERNS)
return 1
;;
esac
fi
fi
# Fallback to legacy text-based parsing
case "$phase_type" in
dev)
if echo "$output" | grep -q "IMPLEMENTATION COMPLETE"; then
return 0
elif echo "$output" | grep -q "IMPLEMENTATION BLOCKED"; then
return 1
fi
;;
review)
if echo "$output" | grep -q "REVIEW PASSED"; then
return 0
elif echo "$output" | grep -q "REVIEW FAILED"; then
return 1
fi
;;
fix)
if echo "$output" | grep -q "FIX COMPLETE"; then
return 0
elif echo "$output" | grep -q "FIX INCOMPLETE"; then
return 1
fi
;;
arch)
if echo "$output" | grep -q "ARCH COMPLIANT"; then
return 0
elif echo "$output" | grep -q "ARCH VIOLATIONS"; then
return 1
fi
;;
test_quality)
if echo "$output" | grep -q "TEST QUALITY APPROVED"; then
return 0
elif echo "$output" | grep -q "TEST QUALITY FAILED"; then
return 1
elif echo "$output" | grep -q "TEST QUALITY CONCERNS"; then
# Concerns don't block
return 0
fi
;;
trace)
if echo "$output" | grep -q "TRACEABILITY PASS"; then
return 0
elif echo "$output" | grep -q "TRACEABILITY FAIL"; then
return 1
elif echo "$output" | grep -q "TRACEABILITY CONCERNS"; then
return 0
fi
;;
uat)
if echo "$output" | grep -q "UAT GENERATED"; then
return 0
fi
;;
test_gen)
if echo "$output" | grep -q "TEST GENERATION COMPLETE"; then
return 0
elif echo "$output" | grep -q "TEST GENERATION PARTIAL"; then
return 1
fi
;;
esac
# Unclear result
return 2
}
# Build JSON output instruction block for prompts
# Arguments:
# $1 - Phase type (dev, review, fix, arch, test_quality, trace, uat)
# $2 - Story ID
# Returns: Instruction text for prompts
build_json_output_instructions() {
local phase_type="$1"
local story_id="$2"
cat << 'EOF'
## Output Format
After completing your task, output a JSON result block:
```json
{
"status": "COMPLETE" | "BLOCKED" | "FAILED" | "PASSED" | "VIOLATIONS" | "CONCERNS",
"story_id": "<story id>",
"summary": "<brief description of what was done>",
"files_changed": ["<path1>", "<path2>"],
"tests_added": <number>,
"decisions": [
{"what": "<decision made>", "why": "<reasoning>"}
],
"issues": [
{"severity": "HIGH|MEDIUM|LOW", "description": "<issue>", "location": "<file:line>"}
],
"concerns": ["<any concerns or warnings>"]
}
```
### Status Values by Phase
EOF
case "$phase_type" in
dev)
cat << EOF
- **COMPLETE**: Implementation finished successfully
- **BLOCKED**: Cannot proceed due to missing dependencies or unclear requirements
Then ALSO output the legacy signal for backward compatibility:
- Success: \`IMPLEMENTATION COMPLETE: $story_id\`
- Blocked: \`IMPLEMENTATION BLOCKED: $story_id - [reason]\`
EOF
;;
review)
cat << EOF
- **PASSED**: Code review passed (all issues fixed or acceptable)
- **FAILED**: Critical issues remain that need developer attention
Then ALSO output the legacy signal for backward compatibility:
- Pass: \`REVIEW PASSED: $story_id\`
- Fail: \`REVIEW FAILED: $story_id - [reason]\`
EOF
;;
fix)
cat << EOF
- **COMPLETE**: All issues from review have been fixed
- **FAILED**: Unable to fix one or more issues
Then ALSO output the legacy signal for backward compatibility:
- Complete: \`FIX COMPLETE: $story_id - Fixed N issues\`
- Incomplete: \`FIX INCOMPLETE: $story_id - [reason]\`
EOF
;;
arch)
cat << EOF
- **COMPLIANT**: No architecture violations (or all fixed)
- **VIOLATIONS**: Architecture violations that need attention
Then ALSO output the legacy signal for backward compatibility:
- Compliant: \`ARCH COMPLIANT: $story_id\`
- Violations: \`ARCH VIOLATIONS: $story_id - [summary]\`
EOF
;;
test_quality)
cat << EOF
- **APPROVED**: Test quality meets standards (score >= 70)
- **CONCERNS**: Minor quality issues (score 60-69)
- **FAILED**: Test quality below acceptable threshold (score < 60)
Then ALSO output the legacy signal for backward compatibility:
- Approved: \`TEST QUALITY APPROVED: $story_id - Score: N/100\`
- Concerns: \`TEST QUALITY CONCERNS: $story_id - Score: N/100\`
- Failed: \`TEST QUALITY FAILED: $story_id - Score: N/100\`
EOF
;;
trace)
cat << EOF
- **PASSED**: Traceability requirements met (P0=100%, P1>=90%)
- **CONCERNS**: Minor gaps (P1 80-89%)
- **FAILED**: Critical traceability gaps
Then ALSO output the legacy signal for backward compatibility:
- Pass: \`TRACEABILITY PASS: Epic-$story_id - P0: N%, P1: M%\`
- Fail: \`TRACEABILITY FAIL: Epic-$story_id - X critical gaps\`
EOF
;;
uat)
cat << EOF
- **COMPLETE**: UAT document generated successfully
Then ALSO output the legacy signal: \`UAT GENERATED: <path>\`
EOF
;;
esac
}

View File

@ -0,0 +1,460 @@
#!/bin/bash
#
# BMAD Epic Execute - TDD Flow Module
#
# Provides test-first development (TDD) workflow phases:
# 1. Test Specification - Generate test specs from acceptance criteria
# 2. Test Implementation - Create failing tests from specs
# 3. Test Verification - Verify tests fail appropriately before implementation
#
# Usage: Sourced by epic-execute.sh
#
# =============================================================================
# TDD Flow Variables
# =============================================================================
# Store test specifications for use across phases
LAST_TEST_SPEC=""
# Test spec output directory
TEST_SPEC_DIR=""
# =============================================================================
# Test Specification Phase
# =============================================================================
# Execute test specification phase
# Generates BDD-style test specifications from acceptance criteria
# Arguments:
# $1 - story_file path
execute_test_spec_phase() {
local story_file="$1"
local story_id=$(basename "$story_file" .md)
# Reset last spec
LAST_TEST_SPEC=""
log ">>> TEST SPEC PHASE: $story_id (generating test specifications)"
local story_contents=$(cat "$story_file")
# Load architecture file if available for context
local arch_contents=""
for search_path in "$PROJECT_ROOT/docs/architecture.md" "$PROJECT_ROOT/docs/architecture/architecture.md" "$PROJECT_ROOT/architecture.md"; do
if [ -f "$search_path" ]; then
arch_contents=$(cat "$search_path")
break
fi
done
# Get design context if available
local design_context=""
if type get_last_design >/dev/null 2>&1; then
design_context=$(get_last_design)
fi
local spec_prompt="You are a Test Architect (TEA) generating test specifications from acceptance criteria.
## Your Task
Generate test specifications for: $story_id
Do NOT write test code yet. Output only test specifications in BDD format.
### CRITICAL RULES
- One test specification per acceptance criterion minimum
- Use Given-When-Then format for all specifications
- Include edge cases and error scenarios
- Assign unique test IDs (format: ${story_id}-E2E-001, ${story_id}-UNIT-001)
- Map each AC explicitly to test specifications
## Story to Analyze
**Story Path:** $story_file
**Story ID:** $story_id
<story>
$story_contents
</story>
## Architecture Context (for understanding test boundaries)
<architecture>
$arch_contents
</architecture>
## Design Context (if available)
<design>
$design_context
</design>
## Exploration Commands
First, explore existing test patterns in the codebase:
\`\`\`bash
# Find existing test files
find . -type f \\( -name \"*.spec.ts\" -o -name \"*.test.ts\" -o -name \"*.spec.js\" -o -name \"*.test.js\" \\) | head -10
# Check test directory structure
ls -la test/ tests/ __tests__/ src/**/__tests__/ 2>/dev/null || true
\`\`\`
## Required Output
Output your test specifications in this exact format:
\`\`\`
TEST SPEC START
story_id: $story_id
generated: $(date '+%Y-%m-%d')
test_specifications:
## AC1: <acceptance criterion text>
### ${story_id}-E2E-001: <descriptive test name>
- Priority: P0|P1|P2
- Type: e2e|integration|unit
- Given: <precondition state>
- When: <action performed>
- Then: <expected outcome>
- Data: <test data requirements>
### ${story_id}-E2E-002: <edge case for AC1>
- Priority: P1
- Type: e2e
- Given: <edge case precondition>
- When: <action>
- Then: <expected error/behavior>
## AC2: <next acceptance criterion>
### ${story_id}-UNIT-001: <unit test name>
...
edge_cases:
- <scenario not in ACs but important>
error_scenarios:
- <error condition to test>
test_file_mapping:
- ${story_id}-E2E-*: <suggested test file path>
- ${story_id}-UNIT-*: <suggested test file path>
- ${story_id}-INT-*: <suggested test file path>
TEST SPEC END
\`\`\`
## Completion Signal
After outputting the spec block:
1. Output JSON result:
\`\`\`json
{
\"status\": \"COMPLETE\",
\"story_id\": \"$story_id\",
\"summary\": \"Generated N test specifications for M acceptance criteria\",
\"tests_added\": <number of specs>
}
\`\`\`
2. Then output: TEST SPEC COMPLETE: $story_id - Generated N specifications"
if [ "$DRY_RUN" = true ]; then
echo "[DRY RUN] Would execute test spec phase for $story_id"
return 0
fi
local result
result=$(claude --dangerously-skip-permissions -p "$spec_prompt" 2>&1) || true
echo "$result" >> "$LOG_FILE"
# Extract test spec block
LAST_TEST_SPEC=$(echo "$result" | sed -n '/TEST SPEC START/,/TEST SPEC END/p')
if [ -n "$LAST_TEST_SPEC" ]; then
# Save to spec directory
TEST_SPEC_DIR="$SPRINT_ARTIFACTS_DIR/test-specs"
mkdir -p "$TEST_SPEC_DIR"
echo "$LAST_TEST_SPEC" > "$TEST_SPEC_DIR/${story_id}-test-spec.md"
# Save to decision log
if type append_to_decision_log >/dev/null 2>&1; then
append_to_decision_log "TEST_SPEC" "$story_id" "$LAST_TEST_SPEC"
fi
log_success "Test spec phase complete: $story_id"
log "Saved to: $TEST_SPEC_DIR/${story_id}-test-spec.md"
return 0
else
log_error "Test spec phase did not produce valid output"
return 1
fi
}
# =============================================================================
# Test Implementation Phase
# =============================================================================
# Execute test implementation phase
# Creates failing tests from specifications
# Arguments:
# $1 - story_file path
execute_test_impl_phase() {
local story_file="$1"
local story_id=$(basename "$story_file" .md)
log ">>> TEST IMPL PHASE: $story_id (implementing failing tests)"
# Check if we have test specs
if [ -z "$LAST_TEST_SPEC" ]; then
# Try to load from file
if [ -f "$TEST_SPEC_DIR/${story_id}-test-spec.md" ]; then
LAST_TEST_SPEC=$(cat "$TEST_SPEC_DIR/${story_id}-test-spec.md")
else
log_error "No test specifications available for $story_id"
return 1
fi
fi
local story_contents=$(cat "$story_file")
local impl_prompt="You are a Test Architect (TEA) implementing tests from specifications.
## Your Task
Implement failing tests for: $story_id
The tests MUST FAIL initially because the feature is not yet implemented.
This is Test-First Development (TDD).
### CRITICAL RULES
- Create test files based on the specifications below
- Tests should compile/parse without errors
- Tests should FAIL when run (feature not implemented yet)
- Follow existing test patterns in the codebase
- Use proper fixtures and data factories
- Do NOT implement any feature code
## Test Specifications to Implement
<test-spec>
$LAST_TEST_SPEC
</test-spec>
## Story Context
<story>
$story_contents
</story>
## Exploration Commands
First, examine existing test patterns:
\`\`\`bash
# Find existing test patterns
find . -type f \\( -name \"*.spec.ts\" -o -name \"*.test.ts\" \\) -exec head -50 {} \\; 2>/dev/null | head -100
# Check for test utilities
ls -la test/utils/ tests/helpers/ __tests__/fixtures/ 2>/dev/null || true
\`\`\`
## Implementation Guidelines
1. **File Structure**: Create test files in the appropriate directory
2. **Imports**: Use the project's test framework (jest, mocha, vitest, etc.)
3. **Describe Blocks**: Group tests by acceptance criterion
4. **Test Names**: Include test IDs from specifications
5. **BDD Format**: Use Given-When-Then comments
6. **Assertions**: Write assertions that will FAIL until feature is implemented
7. **Data**: Use factories or fixtures, no hardcoded values
## Example Test Structure
\`\`\`typescript
describe('Feature: <story description>', () => {
describe('AC1: <acceptance criterion>', () => {
test('${story_id}-E2E-001: should <expected behavior>', async () => {
// Given: <precondition>
const setup = await createTestFixture();
// When: <action>
const result = await performAction(setup);
// Then: <expected outcome>
expect(result).toBe(expectedValue); // Will FAIL - not implemented
});
});
});
\`\`\`
## Completion Signal
After implementing the tests:
1. Stage the test files: git add -A
2. Output JSON result:
\`\`\`json
{
\"status\": \"COMPLETE\",
\"story_id\": \"$story_id\",
\"summary\": \"Implemented N failing tests in M files\",
\"files_changed\": [\"<test file paths>\"],
\"tests_added\": <number>
}
\`\`\`
3. Then output: TEST IMPL COMPLETE: $story_id - Implemented N tests"
if [ "$DRY_RUN" = true ]; then
echo "[DRY RUN] Would execute test impl phase for $story_id"
return 0
fi
local result
result=$(claude --dangerously-skip-permissions -p "$impl_prompt" 2>&1) || true
echo "$result" >> "$LOG_FILE"
# Check completion
local completion_status
if type check_phase_completion >/dev/null 2>&1; then
check_phase_completion "$result" "test_gen" "$story_id"
completion_status=$?
else
if echo "$result" | grep -q "TEST IMPL COMPLETE"; then
completion_status=0
else
completion_status=2
fi
fi
case $completion_status in
0)
log_success "Test impl phase complete: $story_id"
return 0
;;
*)
log_error "Test impl phase did not complete cleanly: $story_id"
return 1
;;
esac
}
# =============================================================================
# Test Verification Phase
# =============================================================================
# Execute test verification phase
# Verifies that tests fail appropriately (compile but don't pass)
# Arguments:
# $1 - story_file path
execute_test_verification_phase() {
local story_file="$1"
local story_id=$(basename "$story_file" .md)
log ">>> TEST VERIFICATION PHASE: $story_id (verifying tests fail correctly)"
if [ "$DRY_RUN" = true ]; then
echo "[DRY RUN] Would verify tests fail for $story_id"
return 0
fi
# Run tests and expect failures
local test_output=""
local test_exit_code=0
if [ -f "$PROJECT_ROOT/package.json" ]; then
# Node.js project
test_output=$(cd "$PROJECT_ROOT" && npm test 2>&1) || test_exit_code=$?
elif [ -f "$PROJECT_ROOT/Cargo.toml" ]; then
test_output=$(cd "$PROJECT_ROOT" && cargo test 2>&1) || test_exit_code=$?
elif [ -f "$PROJECT_ROOT/go.mod" ]; then
test_output=$(cd "$PROJECT_ROOT" && go test ./... 2>&1) || test_exit_code=$?
elif [ -f "$PROJECT_ROOT/requirements.txt" ] || [ -f "$PROJECT_ROOT/pyproject.toml" ]; then
if command -v pytest >/dev/null 2>&1; then
test_output=$(cd "$PROJECT_ROOT" && pytest 2>&1) || test_exit_code=$?
fi
fi
echo "$test_output" >> "$LOG_FILE"
# Analyze test output
# We expect: tests compile, tests run, tests FAIL (exit code non-zero)
# Check for compilation errors (bad - tests should at least compile)
if echo "$test_output" | grep -qiE "syntax error|cannot find module|compilation failed|parse error"; then
log_error "Tests have compilation/syntax errors - fix before proceeding"
echo "$test_output" | grep -iE "syntax error|cannot find module|compilation failed|parse error" | head -10
return 1
fi
# Check for test failures (good - expected in TDD)
if [ $test_exit_code -ne 0 ]; then
# Count failures
local failure_count=0
failure_count=$(echo "$test_output" | grep -cE "FAIL|failed|failing" || echo "0")
if [ "$failure_count" -gt 0 ]; then
log_success "Test verification passed: $failure_count test(s) failing as expected"
log "Tests compile and fail appropriately - ready for implementation"
return 0
else
log_warn "Tests exited with error but no clear failures detected"
return 0 # Proceed anyway - might be framework difference
fi
else
# Tests passed - this is unexpected in TDD before implementation
log_warn "Tests passed unexpectedly - verify tests are actually testing new functionality"
log "This may indicate tests are not properly written or feature already exists"
return 0 # Don't block, but warn
fi
}
# =============================================================================
# Helper Functions
# =============================================================================
# Get the last test specification for use in other phases
get_last_test_spec() {
echo "$LAST_TEST_SPEC"
}
# Build test spec context for dev phase prompt
build_test_spec_context_for_dev() {
local story_id="$1"
if [ -z "$LAST_TEST_SPEC" ]; then
# Try to load from file
if [ -n "$TEST_SPEC_DIR" ] && [ -f "$TEST_SPEC_DIR/${story_id}-test-spec.md" ]; then
LAST_TEST_SPEC=$(cat "$TEST_SPEC_DIR/${story_id}-test-spec.md")
fi
fi
if [ -z "$LAST_TEST_SPEC" ]; then
echo ""
return
fi
cat << EOF
## Test Specifications (TDD)
The following tests have been written and are FAILING. Your implementation must make these tests pass.
<test-specifications>
$LAST_TEST_SPEC
</test-specifications>
### TDD Implementation Guidelines
1. Run tests frequently: \`npm test\` (or equivalent)
2. Implement just enough code to make the next test pass
3. Do NOT modify the test files - only implement the feature code
4. All tests in the specification must pass when implementation is complete
EOF
}

View File

@ -36,6 +36,8 @@ LIB_DIR="$SCRIPT_DIR/epic-execute-lib"
[ -f "$LIB_DIR/decision-log.sh" ] && source "$LIB_DIR/decision-log.sh"
[ -f "$LIB_DIR/regression-gate.sh" ] && source "$LIB_DIR/regression-gate.sh"
[ -f "$LIB_DIR/design-phase.sh" ] && source "$LIB_DIR/design-phase.sh"
[ -f "$LIB_DIR/json-output.sh" ] && source "$LIB_DIR/json-output.sh"
[ -f "$LIB_DIR/tdd-flow.sh" ] && source "$LIB_DIR/tdd-flow.sh"
STORIES_DIR="$PROJECT_ROOT/docs/stories"
SPRINT_ARTIFACTS_DIR="$PROJECT_ROOT/docs/sprint-artifacts"
@ -382,6 +384,10 @@ SKIP_TRACEABILITY=false
SKIP_STATIC_ANALYSIS=false
SKIP_DESIGN=false
SKIP_REGRESSION=false
SKIP_TDD=false
SKIP_TEST_SPEC=false
SKIP_TEST_IMPL=false
LEGACY_OUTPUT=false
while [[ $# -gt 0 ]]; do
case $1 in
@ -437,6 +443,22 @@ while [[ $# -gt 0 ]]; do
SKIP_REGRESSION=true
shift
;;
--skip-tdd)
SKIP_TDD=true
shift
;;
--skip-test-spec)
SKIP_TEST_SPEC=true
shift
;;
--skip-test-impl)
SKIP_TEST_IMPL=true
shift
;;
--legacy-output)
LEGACY_OUTPUT=true
shift
;;
-*)
echo "Unknown option: $1"
exit 1
@ -465,6 +487,10 @@ if [ -z "$EPIC_ID" ]; then
echo " --skip-static-analysis Skip static analysis gate (runs real tooling)"
echo " --skip-design Skip pre-implementation design phase"
echo " --skip-regression Skip regression test gate"
echo " --skip-tdd Skip test-first development phases"
echo " --skip-test-spec Skip test specification phase only"
echo " --skip-test-impl Skip test implementation phase only"
echo " --legacy-output Use legacy text-based output parsing (no JSON)"
exit 1
fi
@ -547,6 +573,12 @@ if [ "$SKIP_REGRESSION" = false ] && type init_regression_baseline >/dev/null 2>
init_regression_baseline
fi
# Set legacy output mode if requested
if [ "$LEGACY_OUTPUT" = true ] && type -v USE_LEGACY_OUTPUT >/dev/null 2>&1; then
USE_LEGACY_OUTPUT=true
log "Using legacy text-based output parsing"
fi
# Find epic file (supports both epic-39-*.md and epic-039-*.md formats)
EPIC_FILE=""
# Pad epic ID with leading zero for 3-digit format (e.g., 40 -> 040)
@ -673,6 +705,12 @@ execute_dev_phase() {
design_context=$(build_design_context_for_dev "$story_id")
fi
# Get test spec context if available (from TDD test spec phase)
local test_spec_context=""
if type build_test_spec_context_for_dev >/dev/null 2>&1; then
test_spec_context=$(build_test_spec_context_for_dev "$story_id")
fi
# Build the dev prompt using BMAD workflow
local dev_prompt="You are executing a BMAD dev-story workflow in automated mode.
@ -721,6 +759,7 @@ $workflow_checklist
$story_contents
</story-contents>
$design_context
$test_spec_context
## Previous Implementation Context
<decision-log>
@ -743,10 +782,25 @@ $decision_context
## Completion Signals
When the workflow completes successfully (all tasks done, tests pass, status set to 'review'):
Output exactly: IMPLEMENTATION COMPLETE: $story_id
1. Output a JSON result block:
\`\`\`json
{
\"status\": \"COMPLETE\",
\"story_id\": \"$story_id\",
\"summary\": \"<brief description of what was implemented>\",
\"files_changed\": [\"<list of files created/modified>\"],
\"tests_added\": <number>,
\"decisions\": [{\"what\": \"<key decision>\", \"why\": \"<reasoning>\"}]
}
\`\`\`
2. Then output exactly: IMPLEMENTATION COMPLETE: $story_id
If a HALT condition is triggered or implementation is blocked:
Output exactly: IMPLEMENTATION BLOCKED: $story_id - [specific reason]
1. Output a JSON result block with status \"BLOCKED\" and issues array describing blockers
2. Then output exactly: IMPLEMENTATION BLOCKED: $story_id - [specific reason]
## Begin Execution
@ -765,17 +819,48 @@ Stage all changes with: git add -A (after implementation is complete)"
echo "$result" >> "$LOG_FILE"
if echo "$result" | grep -q "IMPLEMENTATION COMPLETE"; then
log_success "Dev phase complete: $story_id"
return 0
elif echo "$result" | grep -q "IMPLEMENTATION BLOCKED"; then
log_error "Dev phase blocked: $story_id"
echo "$result" | grep "IMPLEMENTATION BLOCKED"
return 1
# Check completion using JSON parsing with text fallback
local completion_status
if type check_phase_completion >/dev/null 2>&1; then
check_phase_completion "$result" "dev" "$story_id"
completion_status=$?
else
log_error "Dev phase did not complete cleanly: $story_id"
return 1
# Fallback to legacy detection
if echo "$result" | grep -q "IMPLEMENTATION COMPLETE"; then
completion_status=0
elif echo "$result" | grep -q "IMPLEMENTATION BLOCKED"; then
completion_status=1
else
completion_status=2
fi
fi
case $completion_status in
0)
# Extract decisions for decision log if available
if type get_result_decisions >/dev/null 2>&1 && type append_to_decision_log >/dev/null 2>&1; then
local decisions=$(get_result_decisions)
if [ "$decisions" != "[]" ] && [ -n "$decisions" ]; then
append_to_decision_log "DEV" "$story_id" "Decisions: $decisions"
fi
fi
log_success "Dev phase complete: $story_id"
return 0
;;
1)
log_error "Dev phase blocked: $story_id"
if type get_result_summary >/dev/null 2>&1; then
local summary=$(get_result_summary)
[ -n "$summary" ] && echo "Reason: $summary"
fi
echo "$result" | grep "IMPLEMENTATION BLOCKED" || true
return 1
;;
*)
log_error "Dev phase did not complete cleanly: $story_id"
return 1
;;
esac
}
# Global variable to store review findings for fix loop
@ -882,22 +967,45 @@ When the workflow presents options:
## Completion Signals
When review passes (all HIGH/MEDIUM issues fixed, all ACs implemented, status set to 'done'):
Output exactly: REVIEW PASSED: $story_id
When review passes but required fixes:
Output exactly: REVIEW PASSED WITH FIXES: $story_id - Fixed N issues
1. Output a JSON result block:
\`\`\`json
{
\"status\": \"PASSED\",
\"story_id\": \"$story_id\",
\"summary\": \"<what was reviewed and any fixes made>\",
\"files_changed\": [\"<files modified during review>\"],
\"issues\": []
}
\`\`\`
2. Then output exactly: REVIEW PASSED: $story_id
Or if fixes were made: REVIEW PASSED WITH FIXES: $story_id - Fixed N issues
If review fails (unfixable issues, missing acceptance criteria that YOU cannot fix):
1. First output a structured findings block:
1. Output a JSON result block with issues:
\`\`\`json
{
\"status\": \"FAILED\",
\"story_id\": \"$story_id\",
\"summary\": \"<summary of why review failed>\",
\"issues\": [
{\"severity\": \"HIGH\", \"description\": \"<issue>\", \"location\": \"<file:line>\"},
{\"severity\": \"MEDIUM\", \"description\": \"<issue>\", \"location\": \"<file:line>\"}
]
}
\`\`\`
2. Then output the legacy findings block:
\`\`\`
REVIEW FINDINGS START
- [HIGH] Description of issue 1 (file:line if applicable)
- [HIGH] Description of issue 2
- [MEDIUM] Description of issue 3
... all HIGH and MEDIUM issues that need dev attention ...
- [MEDIUM] Description of issue 2
REVIEW FINDINGS END
\`\`\`
2. Then output exactly: REVIEW FAILED: $story_id - [summary reason]
3. Then output exactly: REVIEW FAILED: $story_id - [summary reason]
## Begin Execution
@ -917,25 +1025,56 @@ Stage any fixes with: git add -A"
echo "$result" >> "$LOG_FILE"
if echo "$result" | grep -q "REVIEW PASSED"; then
log_success "Review passed: $story_id"
return 0
elif echo "$result" | grep -q "REVIEW FAILED"; then
log_error "Review failed: $story_id"
echo "$result" | grep "REVIEW FAILED"
# Extract findings for fix loop
LAST_REVIEW_FINDINGS=$(echo "$result" | sed -n '/REVIEW FINDINGS START/,/REVIEW FINDINGS END/p' | grep -E '^\s*-\s*\[(HIGH|MEDIUM)\]' || true)
if [ -n "$LAST_REVIEW_FINDINGS" ]; then
log "Captured ${#LAST_REVIEW_FINDINGS} bytes of review findings for fix loop"
fi
return 1
# Check completion using JSON parsing with text fallback
local completion_status
if type check_phase_completion >/dev/null 2>&1; then
check_phase_completion "$result" "review" "$story_id"
completion_status=$?
else
log_warn "Review did not complete cleanly: $story_id"
return 1
# Fallback to legacy detection
if echo "$result" | grep -q "REVIEW PASSED"; then
completion_status=0
elif echo "$result" | grep -q "REVIEW FAILED"; then
completion_status=1
else
completion_status=2
fi
fi
case $completion_status in
0)
log_success "Review passed: $story_id"
return 0
;;
1)
log_error "Review failed: $story_id"
echo "$result" | grep "REVIEW FAILED" || true
# Extract findings for fix loop - try JSON first, then legacy
if type get_result_issues >/dev/null 2>&1; then
local json_issues=$(get_result_issues)
if [ "$json_issues" != "[]" ] && [ -n "$json_issues" ]; then
# Convert JSON issues to text format for fix phase
LAST_REVIEW_FINDINGS=$(echo "$json_issues" | jq -r '.[] | "- [\(.severity)] \(.description) (\(.location // "unknown"))"' 2>/dev/null || echo "")
fi
fi
# Fallback to legacy text extraction if JSON didn't work
if [ -z "$LAST_REVIEW_FINDINGS" ]; then
LAST_REVIEW_FINDINGS=$(echo "$result" | sed -n '/REVIEW FINDINGS START/,/REVIEW FINDINGS END/p' | grep -E '^\s*-\s*\[(HIGH|MEDIUM)\]' || true)
fi
if [ -n "$LAST_REVIEW_FINDINGS" ]; then
log "Captured review findings for fix loop"
fi
return 1
;;
*)
log_warn "Review did not complete cleanly: $story_id"
return 1
;;
esac
}
execute_fix_phase() {
@ -1062,10 +1201,33 @@ $story_contents
## Completion Signals
When ALL review issues are successfully fixed:
Output exactly: FIX COMPLETE: $story_id - Fixed [N] issues
1. Output a JSON result block:
\`\`\`json
{
\"status\": \"COMPLETE\",
\"story_id\": \"$story_id\",
\"summary\": \"Fixed N issues: <brief list>\",
\"files_changed\": [\"<files modified>\"],
\"issues\": []
}
\`\`\`
2. Then output exactly: FIX COMPLETE: $story_id - Fixed [N] issues
If unable to fix one or more issues:
Output exactly: FIX INCOMPLETE: $story_id - [reason and which issues remain]
1. Output a JSON result block with remaining issues:
\`\`\`json
{
\"status\": \"FAILED\",
\"story_id\": \"$story_id\",
\"summary\": \"<what was fixed and what remains>\",
\"issues\": [{\"severity\": \"HIGH\", \"description\": \"<remaining issue>\", \"location\": \"<file:line>\"}]
}
\`\`\`
2. Then output exactly: FIX INCOMPLETE: $story_id - [reason and which issues remain]
## Begin Execution
@ -1082,20 +1244,40 @@ Address all review findings now. This is attempt $attempt_num of 3."
echo "$result" >> "$LOG_FILE"
if echo "$result" | grep -q "FIX COMPLETE"; then
log_success "Fix phase complete: $story_id (attempt $attempt_num)"
record_fix_attempt "$story_id" "$attempt_num" "success"
return 0
elif echo "$result" | grep -q "FIX INCOMPLETE"; then
log_error "Fix phase incomplete: $story_id (attempt $attempt_num)"
echo "$result" | grep "FIX INCOMPLETE"
record_fix_attempt "$story_id" "$attempt_num" "failed"
return 1
# Check completion using JSON parsing with text fallback
local completion_status
if type check_phase_completion >/dev/null 2>&1; then
check_phase_completion "$result" "fix" "$story_id"
completion_status=$?
else
log_warn "Fix phase did not complete cleanly: $story_id (attempt $attempt_num)"
record_fix_attempt "$story_id" "$attempt_num" "failed"
return 1
# Fallback to legacy detection
if echo "$result" | grep -q "FIX COMPLETE"; then
completion_status=0
elif echo "$result" | grep -q "FIX INCOMPLETE"; then
completion_status=1
else
completion_status=2
fi
fi
case $completion_status in
0)
log_success "Fix phase complete: $story_id (attempt $attempt_num)"
record_fix_attempt "$story_id" "$attempt_num" "success"
return 0
;;
1)
log_error "Fix phase incomplete: $story_id (attempt $attempt_num)"
echo "$result" | grep "FIX INCOMPLETE" || true
record_fix_attempt "$story_id" "$attempt_num" "failed"
return 1
;;
*)
log_warn "Fix phase did not complete cleanly: $story_id (attempt $attempt_num)"
record_fix_attempt "$story_id" "$attempt_num" "failed"
return 1
;;
esac
}
# =============================================================================
@ -1854,12 +2036,49 @@ execute_story_with_fix_loop() {
# DESIGN PHASE (Context 0) - Pre-implementation planning
if [ "$SKIP_DESIGN" = false ] && type execute_design_phase >/dev/null 2>&1; then
if ! execute_design_phase "$story_file"; then
log_warn "Design phase did not complete cleanly for $story_id - proceeding to dev"
log_warn "Design phase did not complete cleanly for $story_id - proceeding"
# Don't fail - design is advisory
fi
fi
# DEV PHASE (Context 1)
# TDD PHASES (Test-First Development)
# Enabled by default, skip with --skip-tdd or individual --skip-test-spec/--skip-test-impl
if [ "$SKIP_TDD" = false ]; then
# TEST SPEC PHASE - Generate test specifications from acceptance criteria
if [ "$SKIP_TEST_SPEC" = false ] && type execute_test_spec_phase >/dev/null 2>&1; then
if ! execute_test_spec_phase "$story_file"; then
log_warn "Test spec phase did not complete cleanly for $story_id - proceeding"
# Don't fail - spec generation is advisory
fi
fi
# TEST IMPL PHASE - Create failing tests from specifications
if [ "$SKIP_TEST_IMPL" = false ] && type execute_test_impl_phase >/dev/null 2>&1; then
# Only run if we have test specs (either just generated or loaded from file)
if [ -n "$LAST_TEST_SPEC" ] || [ -f "$TEST_SPEC_DIR/${story_id}-test-spec.md" ] 2>/dev/null; then
if ! execute_test_impl_phase "$story_file"; then
log_warn "Test impl phase did not complete cleanly for $story_id - proceeding"
# Don't fail - test impl is advisory in first iteration
fi
else
log_warn "No test specifications available - skipping test implementation"
fi
fi
# TEST VERIFICATION PHASE - Verify tests fail appropriately
if [ "$SKIP_TEST_IMPL" = false ] && type execute_test_verification_phase >/dev/null 2>&1; then
# Only verify if we just created tests
if type get_last_test_spec >/dev/null 2>&1 && [ -n "$(get_last_test_spec)" ]; then
if ! execute_test_verification_phase "$story_file"; then
log_warn "Test verification had issues - proceeding to dev phase"
# Don't fail - verification is informational
fi
fi
fi
fi
# DEV PHASE (Context 1) - Now implements to make tests pass (if TDD enabled)
if ! execute_dev_phase "$story_file"; then
log_error "Dev phase failed for $story_id"
return 1