feat(epic-execute): add JSON output parsing and TDD workflow phases

Implements the final two improvements from bmad_improvements_v2.md: ## Structured JSON Output (Improvement #6) - New module: scripts/epic-execute-lib/json-output.sh - Functions for extracting and parsing JSON from Claude output - Unified check_phase_completion() with JSON + text fallback - Updated prompts to request JSON result blocks - Added --legacy-output flag to disable JSON parsing ## Test-First Flow (Improvement #7) - New module: scripts/epic-execute-lib/tdd-flow.sh - execute_test_spec_phase() - Generates BDD specs from acceptance criteria - execute_test_impl_phase() - Creates failing tests from specs - execute_test_verification_phase() - Verifies tests fail correctly - Integration with dev phase for TDD context - Added --skip-tdd, --skip-test-spec, --skip-test-impl flags All 7 improvements from the analysis are now complete. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 14:33:25 -06:00 · 2026-01-26 14:33:25 -06:00 · 5c63f31c0e
parent 6a5b7b68e8
commit 5c63f31c0e
4 changed files with 1199 additions and 62 deletions
--- a/docs/bmad_improvements_v2.md
+++ b/docs/bmad_improvements_v2.md
@ -405,11 +405,11 @@ Respect these decisions unless you have a specific reason to deviate.

 ---

-### 6. Test-First Enforcement (HIGH IMPACT, HIGH EFFORT)
+### 6. Test-First Enforcement (HIGH IMPACT, HIGH EFFORT) ✅ IMPLEMENTED

 Restructure the flow to enforce TDD principles.

-**Proposed New Flow:**
+**Implemented Flow:**

 ```
 1. DESIGN PHASE (Dev)       → Plan implementation approach
@ -424,14 +424,29 @@ Restructure the flow to enforce TDD principles.

 This ensures tests actually test requirements rather than implementation details.

+**Implementation:** Module at `scripts/epic-execute-lib/tdd-flow.sh` with functions:
+- `execute_test_spec_phase()` - Generates BDD test specifications from acceptance criteria
+- `execute_test_impl_phase()` - Creates failing tests from specifications
+- `execute_test_verification_phase()` - Verifies tests fail correctly before implementation
+- `build_test_spec_context_for_dev()` - Provides test context to dev phase
+
+**Skip flags:** `--skip-tdd`, `--skip-test-spec`, `--skip-test-impl`
+
 ---

-### 7. Structured Output Validation (MEDIUM IMPACT, MEDIUM EFFORT)
+### 7. Structured Output Validation (MEDIUM IMPACT, MEDIUM EFFORT) ✅ IMPLEMENTED

 Replace fragile regex parsing with structured JSON output.

+**Implementation:** Module at `scripts/epic-execute-lib/json-output.sh` with functions:
+- `extract_json_result()` - Parse JSON from Claude output
+- `get_result_status()` - Extract status field
+- `get_result_files()` - Extract files_changed array
+- `get_result_issues()` - Extract issues for fix loops
+- `check_phase_completion()` - Unified completion detection with JSON + text fallback
+
 ```bash
-# Add to prompts:
+# Prompts now request JSON output:
 "Output your result as JSON:
 \`\`\`json
 {
@ -445,11 +460,13 @@ Replace fragile regex parsing with structured JSON output.
 }
 \`\`\`"

-# Parse with jq:
-result_json=$(echo "$result" | sed -n '/```json/,/```/p' | sed '1d;$d')
-status=$(echo "$result_json" | jq -r '.status')
+# Parsing with fallback:
+check_phase_completion "$result" "dev" "$story_id"
+# Returns: 0 (complete), 1 (failed), 2 (unclear)
 ```

+**Skip flag:** `--legacy-output` (disables JSON parsing, uses text-only detection)
+
 ---

 ## Implementation Priority Matrix
@ -461,8 +478,8 @@ status=$(echo "$result_json" | jq -r '.status')
 | 3 | Decision Log | MEDIUM | LOW | ✅ DONE | Easy context preservation |
 | 4 | Regression Gate | HIGH | MEDIUM | ✅ DONE | Prevents silent breakage |
 | 5 | Design Phase | HIGH | MEDIUM | ✅ DONE | Catches issues early |
-| 6 | Structured JSON Output | MEDIUM | MEDIUM | | Improves reliability |
-| 7 | Test-First Flow | HIGH | HIGH | | Fundamental quality improvement |
+| 6 | Structured JSON Output | MEDIUM | MEDIUM | ✅ DONE | Improves reliability |
+| 7 | Test-First Flow | HIGH | HIGH | ✅ DONE | Fundamental quality improvement |

 ---

@ -532,8 +549,16 @@ These three changes alone would dramatically improve code reliability with minim
 **Additionally implemented:**
 4. **Regression Gate** - Prevents silent breakage ✅
 5. **Design Phase** - Catches architectural issues early ✅
+6. **Structured JSON Output** - Reliable completion signal parsing ✅
+7. **Test-First Flow** - TDD workflow with test specs before implementation ✅

-**Implementation:** All features are modularized in `scripts/epic-execute-lib/` with graceful degradation and skip flags (`--skip-design`, `--skip-regression`).
+**Implementation:** All features are modularized in `scripts/epic-execute-lib/` with graceful degradation and skip flags:
+- `--skip-design` - Skip pre-implementation design phase
+- `--skip-regression` - Skip regression test gate
+- `--skip-tdd` - Skip test-first development phases
+- `--skip-test-spec` - Skip test specification phase only
+- `--skip-test-impl` - Skip test implementation phase only
+- `--legacy-output` - Use legacy text-based output parsing (no JSON)

 ---

--- a/scripts/epic-execute-lib/json-output.sh
+++ b/scripts/epic-execute-lib/json-output.sh
@ -0,0 +1,433 @@
+#!/bin/bash
+#
+# BMAD Epic Execute - JSON Output Module
+#
+# Provides functions for structured JSON output parsing to replace
+# fragile regex-based completion signal detection.
+#
+# Usage: Sourced by epic-execute.sh
+#
+
+# =============================================================================
+# JSON Output Variables
+# =============================================================================
+
+# Whether to use legacy text-based parsing instead of JSON
+USE_LEGACY_OUTPUT=false
+
+# Last extracted JSON result (for reuse within a phase)
+LAST_JSON_RESULT=""
+
+# =============================================================================
+# JSON Output Functions
+# =============================================================================
+
+# Extract JSON result block from Claude output
+# Looks for ```json ... ``` or ```result ... ``` blocks
+# Arguments:
+#   $1 - Full Claude output
+# Returns: JSON string or empty if not found
+extract_json_result() {
+    local output="$1"
+
+    # Reset last result
+    LAST_JSON_RESULT=""
+
+    # Try to extract JSON from code block (```json ... ```)
+    local json_block
+    json_block=$(echo "$output" | sed -n '/```json/,/```/p' | sed '1d;$d')
+
+    # If no ```json block, try ```result block
+    if [ -z "$json_block" ]; then
+        json_block=$(echo "$output" | sed -n '/```result/,/```/p' | sed '1d;$d')
+    fi
+
+    # If still no block, try to find raw JSON object at end of output
+    if [ -z "$json_block" ]; then
+        # Look for JSON object pattern {"status": ...}
+        json_block=$(echo "$output" | grep -oE '\{"status":[^}]+\}' | tail -1)
+    fi
+
+    # Validate JSON if jq is available
+    if [ -n "$json_block" ] && command -v jq >/dev/null 2>&1; then
+        if echo "$json_block" | jq . >/dev/null 2>&1; then
+            LAST_JSON_RESULT="$json_block"
+            echo "$json_block"
+            return 0
+        fi
+    elif [ -n "$json_block" ]; then
+        # jq not available, return raw block
+        LAST_JSON_RESULT="$json_block"
+        echo "$json_block"
+        return 0
+    fi
+
+    echo ""
+    return 1
+}
+
+# Get the status field from a JSON result
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+# Returns: Status string (COMPLETE, BLOCKED, FAILED, PASSED, etc.)
+get_result_status() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo ""
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -r '.status // empty'
+    else
+        # Fallback: basic pattern matching
+        echo "$json" | grep -oE '"status":\s*"[^"]+"' | sed 's/.*"\([^"]*\)"$/\1/'
+    fi
+}
+
+# Get the story_id field from a JSON result
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+get_result_story_id() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo ""
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -r '.story_id // empty'
+    else
+        echo "$json" | grep -oE '"story_id":\s*"[^"]+"' | sed 's/.*"\([^"]*\)"$/\1/'
+    fi
+}
+
+# Get the summary field from a JSON result
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+get_result_summary() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo ""
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -r '.summary // empty'
+    else
+        echo "$json" | grep -oE '"summary":\s*"[^"]+"' | sed 's/.*"\([^"]*\)"$/\1/'
+    fi
+}
+
+# Get the files_changed array from a JSON result
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+# Returns: Newline-separated list of file paths
+get_result_files() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo ""
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -r '.files_changed[]? // empty'
+    else
+        # Fallback: basic pattern matching (limited)
+        echo "$json" | grep -oE '"files_changed":\s*\[[^\]]*\]' | grep -oE '"[^"]+\.[a-z]+"' | tr -d '"'
+    fi
+}
+
+# Get the concerns array from a JSON result
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+# Returns: Newline-separated list of concerns
+get_result_concerns() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo ""
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -r '.concerns[]? // empty'
+    else
+        echo ""
+    fi
+}
+
+# Get the issues array from a JSON result (for review/fix phases)
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+# Returns: JSON array of issues or empty
+get_result_issues() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo ""
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -c '.issues // []'
+    else
+        echo "[]"
+    fi
+}
+
+# Get the tests_added count from a JSON result
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+# Returns: Number of tests added
+get_result_tests_added() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo "0"
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -r '.tests_added // 0'
+    else
+        echo "$json" | grep -oE '"tests_added":\s*[0-9]+' | grep -oE '[0-9]+' || echo "0"
+    fi
+}
+
+# Get the decisions array from a JSON result
+# Arguments:
+#   $1 - JSON string (optional, uses LAST_JSON_RESULT if not provided)
+# Returns: JSON array of decisions
+get_result_decisions() {
+    local json="${1:-$LAST_JSON_RESULT}"
+
+    if [ -z "$json" ]; then
+        echo "[]"
+        return 1
+    fi
+
+    if command -v jq >/dev/null 2>&1; then
+        echo "$json" | jq -c '.decisions // []'
+    else
+        echo "[]"
+    fi
+}
+
+# Check phase completion with JSON parsing and text fallback
+# Arguments:
+#   $1 - Full Claude output
+#   $2 - Phase type (dev, review, fix, arch, test_quality, trace, uat)
+#   $3 - Story ID (for legacy text matching)
+# Returns: 0 if complete/passed, 1 if failed/blocked, 2 if unclear
+check_phase_completion() {
+    local output="$1"
+    local phase_type="$2"
+    local story_id="$3"
+
+    # Try JSON parsing first (unless legacy mode)
+    if [ "$USE_LEGACY_OUTPUT" != true ]; then
+        local json_result
+        json_result=$(extract_json_result "$output")
+
+        if [ -n "$json_result" ]; then
+            local status
+            status=$(get_result_status "$json_result")
+
+            case "$status" in
+                COMPLETE|PASSED|COMPLIANT|APPROVED)
+                    return 0
+                    ;;
+                BLOCKED|FAILED|VIOLATIONS|CONCERNS)
+                    return 1
+                    ;;
+            esac
+        fi
+    fi
+
+    # Fallback to legacy text-based parsing
+    case "$phase_type" in
+        dev)
+            if echo "$output" | grep -q "IMPLEMENTATION COMPLETE"; then
+                return 0
+            elif echo "$output" | grep -q "IMPLEMENTATION BLOCKED"; then
+                return 1
+            fi
+            ;;
+        review)
+            if echo "$output" | grep -q "REVIEW PASSED"; then
+                return 0
+            elif echo "$output" | grep -q "REVIEW FAILED"; then
+                return 1
+            fi
+            ;;
+        fix)
+            if echo "$output" | grep -q "FIX COMPLETE"; then
+                return 0
+            elif echo "$output" | grep -q "FIX INCOMPLETE"; then
+                return 1
+            fi
+            ;;
+        arch)
+            if echo "$output" | grep -q "ARCH COMPLIANT"; then
+                return 0
+            elif echo "$output" | grep -q "ARCH VIOLATIONS"; then
+                return 1
+            fi
+            ;;
+        test_quality)
+            if echo "$output" | grep -q "TEST QUALITY APPROVED"; then
+                return 0
+            elif echo "$output" | grep -q "TEST QUALITY FAILED"; then
+                return 1
+            elif echo "$output" | grep -q "TEST QUALITY CONCERNS"; then
+                # Concerns don't block
+                return 0
+            fi
+            ;;
+        trace)
+            if echo "$output" | grep -q "TRACEABILITY PASS"; then
+                return 0
+            elif echo "$output" | grep -q "TRACEABILITY FAIL"; then
+                return 1
+            elif echo "$output" | grep -q "TRACEABILITY CONCERNS"; then
+                return 0
+            fi
+            ;;
+        uat)
+            if echo "$output" | grep -q "UAT GENERATED"; then
+                return 0
+            fi
+            ;;
+        test_gen)
+            if echo "$output" | grep -q "TEST GENERATION COMPLETE"; then
+                return 0
+            elif echo "$output" | grep -q "TEST GENERATION PARTIAL"; then
+                return 1
+            fi
+            ;;
+    esac
+
+    # Unclear result
+    return 2
+}
+
+# Build JSON output instruction block for prompts
+# Arguments:
+#   $1 - Phase type (dev, review, fix, arch, test_quality, trace, uat)
+#   $2 - Story ID
+# Returns: Instruction text for prompts
+build_json_output_instructions() {
+    local phase_type="$1"
+    local story_id="$2"
+
+    cat << 'EOF'
+
+## Output Format
+
+After completing your task, output a JSON result block:
+
+```json
+{
+  "status": "COMPLETE" | "BLOCKED" | "FAILED" | "PASSED" | "VIOLATIONS" | "CONCERNS",
+  "story_id": "<story id>",
+  "summary": "<brief description of what was done>",
+  "files_changed": ["<path1>", "<path2>"],
+  "tests_added": <number>,
+  "decisions": [
+    {"what": "<decision made>", "why": "<reasoning>"}
+  ],
+  "issues": [
+    {"severity": "HIGH|MEDIUM|LOW", "description": "<issue>", "location": "<file:line>"}
+  ],
+  "concerns": ["<any concerns or warnings>"]
+}
+```
+
+### Status Values by Phase
+EOF
+
+    case "$phase_type" in
+        dev)
+            cat << EOF
+
+- **COMPLETE**: Implementation finished successfully
+- **BLOCKED**: Cannot proceed due to missing dependencies or unclear requirements
+
+Then ALSO output the legacy signal for backward compatibility:
+- Success: \`IMPLEMENTATION COMPLETE: $story_id\`
+- Blocked: \`IMPLEMENTATION BLOCKED: $story_id - [reason]\`
+EOF
+            ;;
+        review)
+            cat << EOF
+
+- **PASSED**: Code review passed (all issues fixed or acceptable)
+- **FAILED**: Critical issues remain that need developer attention
+
+Then ALSO output the legacy signal for backward compatibility:
+- Pass: \`REVIEW PASSED: $story_id\`
+- Fail: \`REVIEW FAILED: $story_id - [reason]\`
+EOF
+            ;;
+        fix)
+            cat << EOF
+
+- **COMPLETE**: All issues from review have been fixed
+- **FAILED**: Unable to fix one or more issues
+
+Then ALSO output the legacy signal for backward compatibility:
+- Complete: \`FIX COMPLETE: $story_id - Fixed N issues\`
+- Incomplete: \`FIX INCOMPLETE: $story_id - [reason]\`
+EOF
+            ;;
+        arch)
+            cat << EOF
+
+- **COMPLIANT**: No architecture violations (or all fixed)
+- **VIOLATIONS**: Architecture violations that need attention
+
+Then ALSO output the legacy signal for backward compatibility:
+- Compliant: \`ARCH COMPLIANT: $story_id\`
+- Violations: \`ARCH VIOLATIONS: $story_id - [summary]\`
+EOF
+            ;;
+        test_quality)
+            cat << EOF
+
+- **APPROVED**: Test quality meets standards (score >= 70)
+- **CONCERNS**: Minor quality issues (score 60-69)
+- **FAILED**: Test quality below acceptable threshold (score < 60)
+
+Then ALSO output the legacy signal for backward compatibility:
+- Approved: \`TEST QUALITY APPROVED: $story_id - Score: N/100\`
+- Concerns: \`TEST QUALITY CONCERNS: $story_id - Score: N/100\`
+- Failed: \`TEST QUALITY FAILED: $story_id - Score: N/100\`
+EOF
+            ;;
+        trace)
+            cat << EOF
+
+- **PASSED**: Traceability requirements met (P0=100%, P1>=90%)
+- **CONCERNS**: Minor gaps (P1 80-89%)
+- **FAILED**: Critical traceability gaps
+
+Then ALSO output the legacy signal for backward compatibility:
+- Pass: \`TRACEABILITY PASS: Epic-$story_id - P0: N%, P1: M%\`
+- Fail: \`TRACEABILITY FAIL: Epic-$story_id - X critical gaps\`
+EOF
+            ;;
+        uat)
+            cat << EOF
+
+- **COMPLETE**: UAT document generated successfully
+
+Then ALSO output the legacy signal: \`UAT GENERATED: <path>\`
+EOF
+            ;;
+    esac
+}
--- a/scripts/epic-execute-lib/tdd-flow.sh
+++ b/scripts/epic-execute-lib/tdd-flow.sh
@ -0,0 +1,460 @@
+#!/bin/bash
+#
+# BMAD Epic Execute - TDD Flow Module
+#
+# Provides test-first development (TDD) workflow phases:
+# 1. Test Specification - Generate test specs from acceptance criteria
+# 2. Test Implementation - Create failing tests from specs
+# 3. Test Verification - Verify tests fail appropriately before implementation
+#
+# Usage: Sourced by epic-execute.sh
+#
+
+# =============================================================================
+# TDD Flow Variables
+# =============================================================================
+
+# Store test specifications for use across phases
+LAST_TEST_SPEC=""
+
+# Test spec output directory
+TEST_SPEC_DIR=""
+
+# =============================================================================
+# Test Specification Phase
+# =============================================================================
+
+# Execute test specification phase
+# Generates BDD-style test specifications from acceptance criteria
+# Arguments:
+#   $1 - story_file path
+execute_test_spec_phase() {
+    local story_file="$1"
+    local story_id=$(basename "$story_file" .md)
+
+    # Reset last spec
+    LAST_TEST_SPEC=""
+
+    log ">>> TEST SPEC PHASE: $story_id (generating test specifications)"
+
+    local story_contents=$(cat "$story_file")
+
+    # Load architecture file if available for context
+    local arch_contents=""
+    for search_path in "$PROJECT_ROOT/docs/architecture.md" "$PROJECT_ROOT/docs/architecture/architecture.md" "$PROJECT_ROOT/architecture.md"; do
+        if [ -f "$search_path" ]; then
+            arch_contents=$(cat "$search_path")
+            break
+        fi
+    done
+
+    # Get design context if available
+    local design_context=""
+    if type get_last_design >/dev/null 2>&1; then
+        design_context=$(get_last_design)
+    fi
+
+    local spec_prompt="You are a Test Architect (TEA) generating test specifications from acceptance criteria.
+
+## Your Task
+
+Generate test specifications for: $story_id
+
+Do NOT write test code yet. Output only test specifications in BDD format.
+
+### CRITICAL RULES
+- One test specification per acceptance criterion minimum
+- Use Given-When-Then format for all specifications
+- Include edge cases and error scenarios
+- Assign unique test IDs (format: ${story_id}-E2E-001, ${story_id}-UNIT-001)
+- Map each AC explicitly to test specifications
+
+## Story to Analyze
+
+**Story Path:** $story_file
+**Story ID:** $story_id
+
+<story>
+$story_contents
+</story>
+
+## Architecture Context (for understanding test boundaries)
+
+<architecture>
+$arch_contents
+</architecture>
+
+## Design Context (if available)
+
+<design>
+$design_context
+</design>
+
+## Exploration Commands
+
+First, explore existing test patterns in the codebase:
+\`\`\`bash
+# Find existing test files
+find . -type f \\( -name \"*.spec.ts\" -o -name \"*.test.ts\" -o -name \"*.spec.js\" -o -name \"*.test.js\" \\) | head -10
+# Check test directory structure
+ls -la test/ tests/ __tests__/ src/**/__tests__/ 2>/dev/null || true
+\`\`\`
+
+## Required Output
+
+Output your test specifications in this exact format:
+
+\`\`\`
+TEST SPEC START
+story_id: $story_id
+generated: $(date '+%Y-%m-%d')
+
+test_specifications:
+
+## AC1: <acceptance criterion text>
+
+### ${story_id}-E2E-001: <descriptive test name>
+- Priority: P0|P1|P2
+- Type: e2e|integration|unit
+- Given: <precondition state>
+- When: <action performed>
+- Then: <expected outcome>
+- Data: <test data requirements>
+
+### ${story_id}-E2E-002: <edge case for AC1>
+- Priority: P1
+- Type: e2e
+- Given: <edge case precondition>
+- When: <action>
+- Then: <expected error/behavior>
+
+## AC2: <next acceptance criterion>
+
+### ${story_id}-UNIT-001: <unit test name>
+...
+
+edge_cases:
+  - <scenario not in ACs but important>
+
+error_scenarios:
+  - <error condition to test>
+
+test_file_mapping:
+  - ${story_id}-E2E-*: <suggested test file path>
+  - ${story_id}-UNIT-*: <suggested test file path>
+  - ${story_id}-INT-*: <suggested test file path>
+
+TEST SPEC END
+\`\`\`
+
+## Completion Signal
+
+After outputting the spec block:
+
+1. Output JSON result:
+\`\`\`json
+{
+  \"status\": \"COMPLETE\",
+  \"story_id\": \"$story_id\",
+  \"summary\": \"Generated N test specifications for M acceptance criteria\",
+  \"tests_added\": <number of specs>
+}
+\`\`\`
+
+2. Then output: TEST SPEC COMPLETE: $story_id - Generated N specifications"
+
+    if [ "$DRY_RUN" = true ]; then
+        echo "[DRY RUN] Would execute test spec phase for $story_id"
+        return 0
+    fi
+
+    local result
+    result=$(claude --dangerously-skip-permissions -p "$spec_prompt" 2>&1) || true
+
+    echo "$result" >> "$LOG_FILE"
+
+    # Extract test spec block
+    LAST_TEST_SPEC=$(echo "$result" | sed -n '/TEST SPEC START/,/TEST SPEC END/p')
+
+    if [ -n "$LAST_TEST_SPEC" ]; then
+        # Save to spec directory
+        TEST_SPEC_DIR="$SPRINT_ARTIFACTS_DIR/test-specs"
+        mkdir -p "$TEST_SPEC_DIR"
+        echo "$LAST_TEST_SPEC" > "$TEST_SPEC_DIR/${story_id}-test-spec.md"
+
+        # Save to decision log
+        if type append_to_decision_log >/dev/null 2>&1; then
+            append_to_decision_log "TEST_SPEC" "$story_id" "$LAST_TEST_SPEC"
+        fi
+
+        log_success "Test spec phase complete: $story_id"
+        log "Saved to: $TEST_SPEC_DIR/${story_id}-test-spec.md"
+        return 0
+    else
+        log_error "Test spec phase did not produce valid output"
+        return 1
+    fi
+}
+
+# =============================================================================
+# Test Implementation Phase
+# =============================================================================
+
+# Execute test implementation phase
+# Creates failing tests from specifications
+# Arguments:
+#   $1 - story_file path
+execute_test_impl_phase() {
+    local story_file="$1"
+    local story_id=$(basename "$story_file" .md)
+
+    log ">>> TEST IMPL PHASE: $story_id (implementing failing tests)"
+
+    # Check if we have test specs
+    if [ -z "$LAST_TEST_SPEC" ]; then
+        # Try to load from file
+        if [ -f "$TEST_SPEC_DIR/${story_id}-test-spec.md" ]; then
+            LAST_TEST_SPEC=$(cat "$TEST_SPEC_DIR/${story_id}-test-spec.md")
+        else
+            log_error "No test specifications available for $story_id"
+            return 1
+        fi
+    fi
+
+    local story_contents=$(cat "$story_file")
+
+    local impl_prompt="You are a Test Architect (TEA) implementing tests from specifications.
+
+## Your Task
+
+Implement failing tests for: $story_id
+
+The tests MUST FAIL initially because the feature is not yet implemented.
+This is Test-First Development (TDD).
+
+### CRITICAL RULES
+- Create test files based on the specifications below
+- Tests should compile/parse without errors
+- Tests should FAIL when run (feature not implemented yet)
+- Follow existing test patterns in the codebase
+- Use proper fixtures and data factories
+- Do NOT implement any feature code
+
+## Test Specifications to Implement
+
+<test-spec>
+$LAST_TEST_SPEC
+</test-spec>
+
+## Story Context
+
+<story>
+$story_contents
+</story>
+
+## Exploration Commands
+
+First, examine existing test patterns:
+\`\`\`bash
+# Find existing test patterns
+find . -type f \\( -name \"*.spec.ts\" -o -name \"*.test.ts\" \\) -exec head -50 {} \\; 2>/dev/null | head -100
+# Check for test utilities
+ls -la test/utils/ tests/helpers/ __tests__/fixtures/ 2>/dev/null || true
+\`\`\`
+
+## Implementation Guidelines
+
+1. **File Structure**: Create test files in the appropriate directory
+2. **Imports**: Use the project's test framework (jest, mocha, vitest, etc.)
+3. **Describe Blocks**: Group tests by acceptance criterion
+4. **Test Names**: Include test IDs from specifications
+5. **BDD Format**: Use Given-When-Then comments
+6. **Assertions**: Write assertions that will FAIL until feature is implemented
+7. **Data**: Use factories or fixtures, no hardcoded values
+
+## Example Test Structure
+
+\`\`\`typescript
+describe('Feature: <story description>', () => {
+  describe('AC1: <acceptance criterion>', () => {
+    test('${story_id}-E2E-001: should <expected behavior>', async () => {
+      // Given: <precondition>
+      const setup = await createTestFixture();
+
+      // When: <action>
+      const result = await performAction(setup);
+
+      // Then: <expected outcome>
+      expect(result).toBe(expectedValue); // Will FAIL - not implemented
+    });
+  });
+});
+\`\`\`
+
+## Completion Signal
+
+After implementing the tests:
+
+1. Stage the test files: git add -A
+2. Output JSON result:
+\`\`\`json
+{
+  \"status\": \"COMPLETE\",
+  \"story_id\": \"$story_id\",
+  \"summary\": \"Implemented N failing tests in M files\",
+  \"files_changed\": [\"<test file paths>\"],
+  \"tests_added\": <number>
+}
+\`\`\`
+
+3. Then output: TEST IMPL COMPLETE: $story_id - Implemented N tests"
+
+    if [ "$DRY_RUN" = true ]; then
+        echo "[DRY RUN] Would execute test impl phase for $story_id"
+        return 0
+    fi
+
+    local result
+    result=$(claude --dangerously-skip-permissions -p "$impl_prompt" 2>&1) || true
+
+    echo "$result" >> "$LOG_FILE"
+
+    # Check completion
+    local completion_status
+    if type check_phase_completion >/dev/null 2>&1; then
+        check_phase_completion "$result" "test_gen" "$story_id"
+        completion_status=$?
+    else
+        if echo "$result" | grep -q "TEST IMPL COMPLETE"; then
+            completion_status=0
+        else
+            completion_status=2
+        fi
+    fi
+
+    case $completion_status in
+        0)
+            log_success "Test impl phase complete: $story_id"
+            return 0
+            ;;
+        *)
+            log_error "Test impl phase did not complete cleanly: $story_id"
+            return 1
+            ;;
+    esac
+}
+
+# =============================================================================
+# Test Verification Phase
+# =============================================================================
+
+# Execute test verification phase
+# Verifies that tests fail appropriately (compile but don't pass)
+# Arguments:
+#   $1 - story_file path
+execute_test_verification_phase() {
+    local story_file="$1"
+    local story_id=$(basename "$story_file" .md)
+
+    log ">>> TEST VERIFICATION PHASE: $story_id (verifying tests fail correctly)"
+
+    if [ "$DRY_RUN" = true ]; then
+        echo "[DRY RUN] Would verify tests fail for $story_id"
+        return 0
+    fi
+
+    # Run tests and expect failures
+    local test_output=""
+    local test_exit_code=0
+
+    if [ -f "$PROJECT_ROOT/package.json" ]; then
+        # Node.js project
+        test_output=$(cd "$PROJECT_ROOT" && npm test 2>&1) || test_exit_code=$?
+    elif [ -f "$PROJECT_ROOT/Cargo.toml" ]; then
+        test_output=$(cd "$PROJECT_ROOT" && cargo test 2>&1) || test_exit_code=$?
+    elif [ -f "$PROJECT_ROOT/go.mod" ]; then
+        test_output=$(cd "$PROJECT_ROOT" && go test ./... 2>&1) || test_exit_code=$?
+    elif [ -f "$PROJECT_ROOT/requirements.txt" ] || [ -f "$PROJECT_ROOT/pyproject.toml" ]; then
+        if command -v pytest >/dev/null 2>&1; then
+            test_output=$(cd "$PROJECT_ROOT" && pytest 2>&1) || test_exit_code=$?
+        fi
+    fi
+
+    echo "$test_output" >> "$LOG_FILE"
+
+    # Analyze test output
+    # We expect: tests compile, tests run, tests FAIL (exit code non-zero)
+
+    # Check for compilation errors (bad - tests should at least compile)
+    if echo "$test_output" | grep -qiE "syntax error|cannot find module|compilation failed|parse error"; then
+        log_error "Tests have compilation/syntax errors - fix before proceeding"
+        echo "$test_output" | grep -iE "syntax error|cannot find module|compilation failed|parse error" | head -10
+        return 1
+    fi
+
+    # Check for test failures (good - expected in TDD)
+    if [ $test_exit_code -ne 0 ]; then
+        # Count failures
+        local failure_count=0
+        failure_count=$(echo "$test_output" | grep -cE "FAIL|failed|failing" || echo "0")
+
+        if [ "$failure_count" -gt 0 ]; then
+            log_success "Test verification passed: $failure_count test(s) failing as expected"
+            log "Tests compile and fail appropriately - ready for implementation"
+            return 0
+        else
+            log_warn "Tests exited with error but no clear failures detected"
+            return 0  # Proceed anyway - might be framework difference
+        fi
+    else
+        # Tests passed - this is unexpected in TDD before implementation
+        log_warn "Tests passed unexpectedly - verify tests are actually testing new functionality"
+        log "This may indicate tests are not properly written or feature already exists"
+        return 0  # Don't block, but warn
+    fi
+}
+
+# =============================================================================
+# Helper Functions
+# =============================================================================
+
+# Get the last test specification for use in other phases
+get_last_test_spec() {
+    echo "$LAST_TEST_SPEC"
+}
+
+# Build test spec context for dev phase prompt
+build_test_spec_context_for_dev() {
+    local story_id="$1"
+
+    if [ -z "$LAST_TEST_SPEC" ]; then
+        # Try to load from file
+        if [ -n "$TEST_SPEC_DIR" ] && [ -f "$TEST_SPEC_DIR/${story_id}-test-spec.md" ]; then
+            LAST_TEST_SPEC=$(cat "$TEST_SPEC_DIR/${story_id}-test-spec.md")
+        fi
+    fi
+
+    if [ -z "$LAST_TEST_SPEC" ]; then
+        echo ""
+        return
+    fi
+
+    cat << EOF
+
+## Test Specifications (TDD)
+
+The following tests have been written and are FAILING. Your implementation must make these tests pass.
+
+<test-specifications>
+$LAST_TEST_SPEC
+</test-specifications>
+
+### TDD Implementation Guidelines
+
+1. Run tests frequently: \`npm test\` (or equivalent)
+2. Implement just enough code to make the next test pass
+3. Do NOT modify the test files - only implement the feature code
+4. All tests in the specification must pass when implementation is complete
+
+EOF
+}
--- a/scripts/epic-execute.sh
+++ b/scripts/epic-execute.sh
@ -36,6 +36,8 @@ LIB_DIR="$SCRIPT_DIR/epic-execute-lib"
 [ -f "$LIB_DIR/decision-log.sh" ] && source "$LIB_DIR/decision-log.sh"
 [ -f "$LIB_DIR/regression-gate.sh" ] && source "$LIB_DIR/regression-gate.sh"
 [ -f "$LIB_DIR/design-phase.sh" ] && source "$LIB_DIR/design-phase.sh"
+[ -f "$LIB_DIR/json-output.sh" ] && source "$LIB_DIR/json-output.sh"
+[ -f "$LIB_DIR/tdd-flow.sh" ] && source "$LIB_DIR/tdd-flow.sh"

 STORIES_DIR="$PROJECT_ROOT/docs/stories"
 SPRINT_ARTIFACTS_DIR="$PROJECT_ROOT/docs/sprint-artifacts"
@ -382,6 +384,10 @@ SKIP_TRACEABILITY=false
 SKIP_STATIC_ANALYSIS=false
 SKIP_DESIGN=false
 SKIP_REGRESSION=false
+SKIP_TDD=false
+SKIP_TEST_SPEC=false
+SKIP_TEST_IMPL=false
+LEGACY_OUTPUT=false

 while [[ $# -gt 0 ]]; do
    case $1 in
@ -437,6 +443,22 @@ while [[ $# -gt 0 ]]; do
            SKIP_REGRESSION=true
            shift
            ;;
+        --skip-tdd)
+            SKIP_TDD=true
+            shift
+            ;;
+        --skip-test-spec)
+            SKIP_TEST_SPEC=true
+            shift
+            ;;
+        --skip-test-impl)
+            SKIP_TEST_IMPL=true
+            shift
+            ;;
+        --legacy-output)
+            LEGACY_OUTPUT=true
+            shift
+            ;;
        -*)
            echo "Unknown option: $1"
            exit 1
@ -465,6 +487,10 @@ if [ -z "$EPIC_ID" ]; then
    echo "  --skip-static-analysis  Skip static analysis gate (runs real tooling)"
    echo "  --skip-design     Skip pre-implementation design phase"
    echo "  --skip-regression Skip regression test gate"
+    echo "  --skip-tdd        Skip test-first development phases"
+    echo "  --skip-test-spec  Skip test specification phase only"
+    echo "  --skip-test-impl  Skip test implementation phase only"
+    echo "  --legacy-output   Use legacy text-based output parsing (no JSON)"
    exit 1
 fi

@ -547,6 +573,12 @@ if [ "$SKIP_REGRESSION" = false ] && type init_regression_baseline >/dev/null 2>
    init_regression_baseline
 fi

+# Set legacy output mode if requested
+if [ "$LEGACY_OUTPUT" = true ] && type -v USE_LEGACY_OUTPUT >/dev/null 2>&1; then
+    USE_LEGACY_OUTPUT=true
+    log "Using legacy text-based output parsing"
+fi
+
 # Find epic file (supports both epic-39-*.md and epic-039-*.md formats)
 EPIC_FILE=""
 # Pad epic ID with leading zero for 3-digit format (e.g., 40 -> 040)
@ -673,6 +705,12 @@ execute_dev_phase() {
        design_context=$(build_design_context_for_dev "$story_id")
    fi

+    # Get test spec context if available (from TDD test spec phase)
+    local test_spec_context=""
+    if type build_test_spec_context_for_dev >/dev/null 2>&1; then
+        test_spec_context=$(build_test_spec_context_for_dev "$story_id")
+    fi
+
    # Build the dev prompt using BMAD workflow
    local dev_prompt="You are executing a BMAD dev-story workflow in automated mode.

@ -721,6 +759,7 @@ $workflow_checklist
 $story_contents
 </story-contents>
 $design_context
+$test_spec_context
 ## Previous Implementation Context

 <decision-log>
@ -743,10 +782,25 @@ $decision_context
 ## Completion Signals

 When the workflow completes successfully (all tasks done, tests pass, status set to 'review'):
-Output exactly: IMPLEMENTATION COMPLETE: $story_id
+
+1. Output a JSON result block:
+\`\`\`json
+{
+  \"status\": \"COMPLETE\",
+  \"story_id\": \"$story_id\",
+  \"summary\": \"<brief description of what was implemented>\",
+  \"files_changed\": [\"<list of files created/modified>\"],
+  \"tests_added\": <number>,
+  \"decisions\": [{\"what\": \"<key decision>\", \"why\": \"<reasoning>\"}]
+}
+\`\`\`
+
+2. Then output exactly: IMPLEMENTATION COMPLETE: $story_id

 If a HALT condition is triggered or implementation is blocked:
-Output exactly: IMPLEMENTATION BLOCKED: $story_id - [specific reason]
+
+1. Output a JSON result block with status \"BLOCKED\" and issues array describing blockers
+2. Then output exactly: IMPLEMENTATION BLOCKED: $story_id - [specific reason]

 ## Begin Execution

@ -765,17 +819,48 @@ Stage all changes with: git add -A (after implementation is complete)"

    echo "$result" >> "$LOG_FILE"

-    if echo "$result" | grep -q "IMPLEMENTATION COMPLETE"; then
-        log_success "Dev phase complete: $story_id"
-        return 0
-    elif echo "$result" | grep -q "IMPLEMENTATION BLOCKED"; then
-        log_error "Dev phase blocked: $story_id"
-        echo "$result" | grep "IMPLEMENTATION BLOCKED"
-        return 1
+    # Check completion using JSON parsing with text fallback
+    local completion_status
+    if type check_phase_completion >/dev/null 2>&1; then
+        check_phase_completion "$result" "dev" "$story_id"
+        completion_status=$?
    else
-        log_error "Dev phase did not complete cleanly: $story_id"
-        return 1
+        # Fallback to legacy detection
+        if echo "$result" | grep -q "IMPLEMENTATION COMPLETE"; then
+            completion_status=0
+        elif echo "$result" | grep -q "IMPLEMENTATION BLOCKED"; then
+            completion_status=1
+        else
+            completion_status=2
+        fi
    fi
+
+    case $completion_status in
+        0)
+            # Extract decisions for decision log if available
+            if type get_result_decisions >/dev/null 2>&1 && type append_to_decision_log >/dev/null 2>&1; then
+                local decisions=$(get_result_decisions)
+                if [ "$decisions" != "[]" ] && [ -n "$decisions" ]; then
+                    append_to_decision_log "DEV" "$story_id" "Decisions: $decisions"
+                fi
+            fi
+            log_success "Dev phase complete: $story_id"
+            return 0
+            ;;
+        1)
+            log_error "Dev phase blocked: $story_id"
+            if type get_result_summary >/dev/null 2>&1; then
+                local summary=$(get_result_summary)
+                [ -n "$summary" ] && echo "Reason: $summary"
+            fi
+            echo "$result" | grep "IMPLEMENTATION BLOCKED" || true
+            return 1
+            ;;
+        *)
+            log_error "Dev phase did not complete cleanly: $story_id"
+            return 1
+            ;;
+    esac
 }

 # Global variable to store review findings for fix loop
@ -882,22 +967,45 @@ When the workflow presents options:
 ## Completion Signals

 When review passes (all HIGH/MEDIUM issues fixed, all ACs implemented, status set to 'done'):
-Output exactly: REVIEW PASSED: $story_id

-When review passes but required fixes:
-Output exactly: REVIEW PASSED WITH FIXES: $story_id - Fixed N issues
+1. Output a JSON result block:
+\`\`\`json
+{
+  \"status\": \"PASSED\",
+  \"story_id\": \"$story_id\",
+  \"summary\": \"<what was reviewed and any fixes made>\",
+  \"files_changed\": [\"<files modified during review>\"],
+  \"issues\": []
+}
+\`\`\`
+
+2. Then output exactly: REVIEW PASSED: $story_id
+   Or if fixes were made: REVIEW PASSED WITH FIXES: $story_id - Fixed N issues

 If review fails (unfixable issues, missing acceptance criteria that YOU cannot fix):
-1. First output a structured findings block:
+
+1. Output a JSON result block with issues:
+\`\`\`json
+{
+  \"status\": \"FAILED\",
+  \"story_id\": \"$story_id\",
+  \"summary\": \"<summary of why review failed>\",
+  \"issues\": [
+    {\"severity\": \"HIGH\", \"description\": \"<issue>\", \"location\": \"<file:line>\"},
+    {\"severity\": \"MEDIUM\", \"description\": \"<issue>\", \"location\": \"<file:line>\"}
+  ]
+}
+\`\`\`
+
+2. Then output the legacy findings block:
 \`\`\`
 REVIEW FINDINGS START
 - [HIGH] Description of issue 1 (file:line if applicable)
- [HIGH] Description of issue 2
- [MEDIUM] Description of issue 3
-... all HIGH and MEDIUM issues that need dev attention ...
+- [MEDIUM] Description of issue 2
 REVIEW FINDINGS END
 \`\`\`
-2. Then output exactly: REVIEW FAILED: $story_id - [summary reason]
+
+3. Then output exactly: REVIEW FAILED: $story_id - [summary reason]

 ## Begin Execution

@ -917,25 +1025,56 @@ Stage any fixes with: git add -A"

    echo "$result" >> "$LOG_FILE"

-    if echo "$result" | grep -q "REVIEW PASSED"; then
-        log_success "Review passed: $story_id"
-        return 0
-    elif echo "$result" | grep -q "REVIEW FAILED"; then
-        log_error "Review failed: $story_id"
-        echo "$result" | grep "REVIEW FAILED"
-
-        # Extract findings for fix loop
-        LAST_REVIEW_FINDINGS=$(echo "$result" | sed -n '/REVIEW FINDINGS START/,/REVIEW FINDINGS END/p' | grep -E '^\s*-\s*\[(HIGH|MEDIUM)\]' || true)
-
-        if [ -n "$LAST_REVIEW_FINDINGS" ]; then
-            log "Captured ${#LAST_REVIEW_FINDINGS} bytes of review findings for fix loop"
-        fi
-
-        return 1
+    # Check completion using JSON parsing with text fallback
+    local completion_status
+    if type check_phase_completion >/dev/null 2>&1; then
+        check_phase_completion "$result" "review" "$story_id"
+        completion_status=$?
    else
-        log_warn "Review did not complete cleanly: $story_id"
-        return 1
+        # Fallback to legacy detection
+        if echo "$result" | grep -q "REVIEW PASSED"; then
+            completion_status=0
+        elif echo "$result" | grep -q "REVIEW FAILED"; then
+            completion_status=1
+        else
+            completion_status=2
+        fi
    fi
+
+    case $completion_status in
+        0)
+            log_success "Review passed: $story_id"
+            return 0
+            ;;
+        1)
+            log_error "Review failed: $story_id"
+            echo "$result" | grep "REVIEW FAILED" || true
+
+            # Extract findings for fix loop - try JSON first, then legacy
+            if type get_result_issues >/dev/null 2>&1; then
+                local json_issues=$(get_result_issues)
+                if [ "$json_issues" != "[]" ] && [ -n "$json_issues" ]; then
+                    # Convert JSON issues to text format for fix phase
+                    LAST_REVIEW_FINDINGS=$(echo "$json_issues" | jq -r '.[] | "- [\(.severity)] \(.description) (\(.location // "unknown"))"' 2>/dev/null || echo "")
+                fi
+            fi
+
+            # Fallback to legacy text extraction if JSON didn't work
+            if [ -z "$LAST_REVIEW_FINDINGS" ]; then
+                LAST_REVIEW_FINDINGS=$(echo "$result" | sed -n '/REVIEW FINDINGS START/,/REVIEW FINDINGS END/p' | grep -E '^\s*-\s*\[(HIGH|MEDIUM)\]' || true)
+            fi
+
+            if [ -n "$LAST_REVIEW_FINDINGS" ]; then
+                log "Captured review findings for fix loop"
+            fi
+
+            return 1
+            ;;
+        *)
+            log_warn "Review did not complete cleanly: $story_id"
+            return 1
+            ;;
+    esac
 }

 execute_fix_phase() {
@ -1062,10 +1201,33 @@ $story_contents
 ## Completion Signals

 When ALL review issues are successfully fixed:
-Output exactly: FIX COMPLETE: $story_id - Fixed [N] issues
+
+1. Output a JSON result block:
+\`\`\`json
+{
+  \"status\": \"COMPLETE\",
+  \"story_id\": \"$story_id\",
+  \"summary\": \"Fixed N issues: <brief list>\",
+  \"files_changed\": [\"<files modified>\"],
+  \"issues\": []
+}
+\`\`\`
+
+2. Then output exactly: FIX COMPLETE: $story_id - Fixed [N] issues

 If unable to fix one or more issues:
-Output exactly: FIX INCOMPLETE: $story_id - [reason and which issues remain]
+
+1. Output a JSON result block with remaining issues:
+\`\`\`json
+{
+  \"status\": \"FAILED\",
+  \"story_id\": \"$story_id\",
+  \"summary\": \"<what was fixed and what remains>\",
+  \"issues\": [{\"severity\": \"HIGH\", \"description\": \"<remaining issue>\", \"location\": \"<file:line>\"}]
+}
+\`\`\`
+
+2. Then output exactly: FIX INCOMPLETE: $story_id - [reason and which issues remain]

 ## Begin Execution

@ -1082,20 +1244,40 @@ Address all review findings now. This is attempt $attempt_num of 3."

    echo "$result" >> "$LOG_FILE"

-    if echo "$result" | grep -q "FIX COMPLETE"; then
-        log_success "Fix phase complete: $story_id (attempt $attempt_num)"
-        record_fix_attempt "$story_id" "$attempt_num" "success"
-        return 0
-    elif echo "$result" | grep -q "FIX INCOMPLETE"; then
-        log_error "Fix phase incomplete: $story_id (attempt $attempt_num)"
-        echo "$result" | grep "FIX INCOMPLETE"
-        record_fix_attempt "$story_id" "$attempt_num" "failed"
-        return 1
+    # Check completion using JSON parsing with text fallback
+    local completion_status
+    if type check_phase_completion >/dev/null 2>&1; then
+        check_phase_completion "$result" "fix" "$story_id"
+        completion_status=$?
    else
-        log_warn "Fix phase did not complete cleanly: $story_id (attempt $attempt_num)"
-        record_fix_attempt "$story_id" "$attempt_num" "failed"
-        return 1
+        # Fallback to legacy detection
+        if echo "$result" | grep -q "FIX COMPLETE"; then
+            completion_status=0
+        elif echo "$result" | grep -q "FIX INCOMPLETE"; then
+            completion_status=1
+        else
+            completion_status=2
+        fi
    fi
+
+    case $completion_status in
+        0)
+            log_success "Fix phase complete: $story_id (attempt $attempt_num)"
+            record_fix_attempt "$story_id" "$attempt_num" "success"
+            return 0
+            ;;
+        1)
+            log_error "Fix phase incomplete: $story_id (attempt $attempt_num)"
+            echo "$result" | grep "FIX INCOMPLETE" || true
+            record_fix_attempt "$story_id" "$attempt_num" "failed"
+            return 1
+            ;;
+        *)
+            log_warn "Fix phase did not complete cleanly: $story_id (attempt $attempt_num)"
+            record_fix_attempt "$story_id" "$attempt_num" "failed"
+            return 1
+            ;;
+    esac
 }

 # =============================================================================
@ -1854,12 +2036,49 @@ execute_story_with_fix_loop() {
    # DESIGN PHASE (Context 0) - Pre-implementation planning
    if [ "$SKIP_DESIGN" = false ] && type execute_design_phase >/dev/null 2>&1; then
        if ! execute_design_phase "$story_file"; then
-            log_warn "Design phase did not complete cleanly for $story_id - proceeding to dev"
+            log_warn "Design phase did not complete cleanly for $story_id - proceeding"
            # Don't fail - design is advisory
        fi
    fi

-    # DEV PHASE (Context 1)
+    # TDD PHASES (Test-First Development)
+    # Enabled by default, skip with --skip-tdd or individual --skip-test-spec/--skip-test-impl
+    if [ "$SKIP_TDD" = false ]; then
+
+        # TEST SPEC PHASE - Generate test specifications from acceptance criteria
+        if [ "$SKIP_TEST_SPEC" = false ] && type execute_test_spec_phase >/dev/null 2>&1; then
+            if ! execute_test_spec_phase "$story_file"; then
+                log_warn "Test spec phase did not complete cleanly for $story_id - proceeding"
+                # Don't fail - spec generation is advisory
+            fi
+        fi
+
+        # TEST IMPL PHASE - Create failing tests from specifications
+        if [ "$SKIP_TEST_IMPL" = false ] && type execute_test_impl_phase >/dev/null 2>&1; then
+            # Only run if we have test specs (either just generated or loaded from file)
+            if [ -n "$LAST_TEST_SPEC" ] || [ -f "$TEST_SPEC_DIR/${story_id}-test-spec.md" ] 2>/dev/null; then
+                if ! execute_test_impl_phase "$story_file"; then
+                    log_warn "Test impl phase did not complete cleanly for $story_id - proceeding"
+                    # Don't fail - test impl is advisory in first iteration
+                fi
+            else
+                log_warn "No test specifications available - skipping test implementation"
+            fi
+        fi
+
+        # TEST VERIFICATION PHASE - Verify tests fail appropriately
+        if [ "$SKIP_TEST_IMPL" = false ] && type execute_test_verification_phase >/dev/null 2>&1; then
+            # Only verify if we just created tests
+            if type get_last_test_spec >/dev/null 2>&1 && [ -n "$(get_last_test_spec)" ]; then
+                if ! execute_test_verification_phase "$story_file"; then
+                    log_warn "Test verification had issues - proceeding to dev phase"
+                    # Don't fail - verification is informational
+                fi
+            fi
+        fi
+    fi
+
+    # DEV PHASE (Context 1) - Now implements to make tests pass (if TDD enabled)
    if ! execute_dev_phase "$story_file"; then
        log_error "Dev phase failed for $story_id"
        return 1