
---
description: Orchestrate test failure analysis and coordinate parallel specialist test fixers with strategic analysis mode
argument-hint: "[test_scope] [--run-first] [--coverage] [--fast] [--strategic] [--research] [--force-escalate] [--no-chain] [--api-only] [--database-only] [--vitest-only] [--pytest-only] [--playwright-only] [--only-category=<unit|integration|e2e|acceptance>]"
allowed-tools: Task, TodoWrite, Bash, Grep, Read, LS, Glob, SlashCommand
---

Test Orchestration Command (v2.0)

Execute this test orchestration procedure for: "$ARGUMENTS"


ORCHESTRATOR GUARD RAILS

PROHIBITED (NEVER do directly):

  • Direct edits to test files
  • Direct edits to source files
  • pytest --fix or similar
  • git add / git commit
  • pip install / uv add
  • Modifying test configuration

ALLOWED (delegation only):

  • Task(subagent_type="unit-test-fixer", ...)
  • Task(subagent_type="api-test-fixer", ...)
  • Task(subagent_type="database-test-fixer", ...)
  • Task(subagent_type="e2e-test-fixer", ...)
  • Task(subagent_type="type-error-fixer", ...)
  • Task(subagent_type="import-error-fixer", ...)
  • Read-only bash commands for analysis
  • Grep/Glob/Read for investigation

WHY: Ensures each failure class is handled by the right specialist, prevents edit conflicts between parallel agents, and maintains an audit trail.


STEP 0: MODE DETECTION + AUTO-ESCALATION + DEPTH PROTECTION

0a. Depth Protection (prevent infinite loops)

echo "SLASH_DEPTH=${SLASH_DEPTH:-0}"

If SLASH_DEPTH >= 3:

  • Report: "Maximum orchestration depth (3) reached. Exiting to prevent loop."
  • EXIT immediately

Otherwise, set for any chained commands:

export SLASH_DEPTH=$((${SLASH_DEPTH:-0} + 1))

0b. Parse Strategic Flags

Check "$ARGUMENTS" for strategic triggers:

  • --strategic = Force strategic mode
  • --research = Research best practices only (no fixes)
  • --force-escalate = Force strategic mode regardless of history

If ANY strategic flag present → Set STRATEGIC_MODE=true

0c. Auto-Escalation Detection

Check git history for recurring test fix attempts:

TEST_FIX_COUNT=$(git log --oneline -20 | grep -iE "fix.*(test|spec|jest|pytest|vitest)" | wc -l | tr -d ' ')
echo "TEST_FIX_COUNT=$TEST_FIX_COUNT"

If TEST_FIX_COUNT >= 3:

  • Report: "Detected $TEST_FIX_COUNT test fix attempts in recent history. Auto-escalating to strategic mode."
  • Set STRATEGIC_MODE=true

0d. Mode Decision

| Condition | Mode |
|---|---|
| --strategic OR --research OR --force-escalate | STRATEGIC |
| TEST_FIX_COUNT >= 3 | STRATEGIC (auto-escalated) |
| Otherwise | TACTICAL (default) |

Report the mode: "Operating in [TACTICAL/STRATEGIC] mode."
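
A minimal sketch of the combined mode decision, assuming TEST_FIX_COUNT was set in 0c (the variable names are illustrative):

# Sketch only: combine flag parsing (0b) with auto-escalation (0c)
STRATEGIC_MODE=false
case "$ARGUMENTS" in
    *--strategic*|*--research*|*--force-escalate*) STRATEGIC_MODE=true ;;
esac
if [ "${TEST_FIX_COUNT:-0}" -ge 3 ]; then
    STRATEGIC_MODE=true
    echo "Auto-escalating: $TEST_FIX_COUNT recent test fix attempts"
fi
[ "$STRATEGIC_MODE" = true ] && MODE=STRATEGIC || MODE=TACTICAL
echo "Operating in $MODE mode."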


STEP 1: Parse Arguments

Check "$ARGUMENTS" for these flags:

  • --run-first = Ignore cached results, run fresh tests
  • --pytest-only = Focus on pytest (backend) only
  • --vitest-only = Focus on Vitest (frontend) only
  • --playwright-only = Focus on Playwright (E2E) only
  • --coverage = Include coverage analysis
  • --fast = Skip slow tests
  • --no-chain = Disable chain invocation after fixes
  • --only-category=<category> = Target specific test category for faster iteration

Parse --only-category for targeted test execution:

# Parse --only-category for finer control
if [[ "$ARGUMENTS" =~ "--only-category="([a-zA-Z]+) ]]; then
    TARGET_CATEGORY="${BASH_REMATCH[1]}"
    echo "🎯 Targeting only '$TARGET_CATEGORY' tests"
    # Used in STEP 4 to filter pytest: -k $TARGET_CATEGORY
fi

Valid categories: unit, integration, e2e, acceptance, api, database
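
A hedged sketch for parsing the remaining boolean flags; the variable names are illustrative, not part of the command contract:

# Sketch only: translate boolean flags into variables used by later steps
RUN_FIRST=false; COVERAGE=false; FAST=false; NO_CHAIN=false
[[ "$ARGUMENTS" == *"--run-first"* ]] && RUN_FIRST=true
[[ "$ARGUMENTS" == *"--coverage"* ]] && COVERAGE=true
[[ "$ARGUMENTS" == *"--fast"* ]] && FAST=true
[[ "$ARGUMENTS" == *"--no-chain"* ]] && NO_CHAIN=true
echo "RUN_FIRST=$RUN_FIRST COVERAGE=$COVERAGE FAST=$FAST NO_CHAIN=$NO_CHAIN"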


STEP 2: Discover Cached Test Results

Run these commands ONE AT A TIME:

2a. Project info:

echo "Project: $(basename $PWD) | Branch: $(git branch --show-current) | Root: $PWD"

2b. Check if pytest results exist:

test -f "test-results/pytest/junit.xml" && echo "PYTEST_EXISTS=yes" || echo "PYTEST_EXISTS=no"

2c. If pytest results exist, get stats:

echo "PYTEST_AGE=$(($(date +%s) - $(stat -f %m test-results/pytest/junit.xml 2>/dev/null || stat -c %Y test-results/pytest/junit.xml 2>/dev/null)))s"
echo "PYTEST_TESTS=$(grep -o 'tests="[0-9]*"' test-results/pytest/junit.xml | head -1 | grep -o '[0-9]*')"
echo "PYTEST_FAILURES=$(grep -o 'failures="[0-9]*"' test-results/pytest/junit.xml | head -1 | grep -o '[0-9]*')"

2d. Check Vitest results:

test -f "test-results/vitest/results.json" && echo "VITEST_EXISTS=yes" || echo "VITEST_EXISTS=no"

2e. Check Playwright results:

test -f "test-results/playwright/results.json" && echo "PLAYWRIGHT_EXISTS=yes" || echo "PLAYWRIGHT_EXISTS=no"

STEP 2.5: Test Framework Intelligence

Detect test framework configuration:

2.5a. Pytest configuration:

grep -A 20 "\[tool.pytest" pyproject.toml 2>/dev/null | head -25 || echo "No pytest config in pyproject.toml"

2.5b. Available pytest markers:

grep -rh "pytest.mark\." tests/ 2>/dev/null | sed 's/.*@pytest.mark.\([a-zA-Z_]*\).*/\1/' | sort -u | head -10

2.5c. Check for slow tests:

grep -l "@pytest.mark.slow" tests/**/*.py 2>/dev/null | wc -l | xargs echo "Slow tests:"

Save detected markers and configuration for agent context.


STEP 2.6: Discover Project Context (SHARED CACHE - Token Efficient)

Token Savings: Using shared discovery cache saves ~14K tokens (2K per agent x 7 agents).

# 📊 SHARED DISCOVERY - Use cached context, refresh if stale (>15 min)
echo "=== Loading Shared Project Context ==="

# Source shared discovery helper (creates/uses cache)
if [[ -f "$HOME/.claude/scripts/shared-discovery.sh" ]]; then
    source "$HOME/.claude/scripts/shared-discovery.sh"
    discover_project_context

    # SHARED_CONTEXT now contains pre-built context for agents
    # Variables available: PROJECT_TYPE, VALIDATION_CMD, TEST_FRAMEWORK, RULES_SUMMARY
else
    # Fallback: inline discovery (less efficient)
    echo "⚠️ Shared discovery not found, using inline discovery"

    PROJECT_CONTEXT=""
    [ -f "CLAUDE.md" ] && PROJECT_CONTEXT="Read CLAUDE.md for project conventions. "
    [ -d ".claude/rules" ] && PROJECT_CONTEXT+="Check .claude/rules/ for patterns. "

    PROJECT_TYPE=""
    [ -f "pyproject.toml" ] && PROJECT_TYPE="python"
    [ -f "package.json" ] && PROJECT_TYPE="${PROJECT_TYPE:+$PROJECT_TYPE+}node"

    SHARED_CONTEXT="$PROJECT_CONTEXT"
fi

# Display cached context summary
echo "PROJECT_TYPE=$PROJECT_TYPE"
echo "VALIDATION_CMD=${VALIDATION_CMD:-pnpm prepush}"
echo "TEST_FRAMEWORK=${TEST_FRAMEWORK:-pytest}"

CRITICAL: Pass $SHARED_CONTEXT to ALL agent prompts instead of asking each agent to discover. This prevents 7 agents from each running discovery independently.


STEP 3: Decision Logic + Early Exit

Based on discovery, decide:

| Condition | Action |
|---|---|
| --run-first flag present | Go to STEP 4 (run fresh tests) |
| PYTEST_EXISTS=yes AND AGE < 900s AND FAILURES > 0 | Go to STEP 5 (read results) |
| PYTEST_EXISTS=yes AND AGE < 900s AND FAILURES = 0 | EARLY EXIT (see below) |
| PYTEST_EXISTS=no OR AGE >= 900s | Go to STEP 4 (run fresh tests) |
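
One way to express this table as a shell branch; a sketch only, assuming PYTEST_EXISTS, PYTEST_AGE, and PYTEST_FAILURES from STEP 2 and RUN_FIRST from STEP 1:

# Sketch only: cache-freshness decision (900s = 15 minutes)
AGE_SECONDS=${PYTEST_AGE%s}   # strip trailing "s" if present
if [ "${RUN_FIRST:-false}" = true ] || [ "$PYTEST_EXISTS" = no ] || [ "${AGE_SECONDS:-99999}" -ge 900 ]; then
    echo "DECISION=run_fresh_tests"         # STEP 4
elif [ "${PYTEST_FAILURES:-0}" -gt 0 ]; then
    echo "DECISION=read_cached_results"     # STEP 5
else
    echo "DECISION=early_exit_all_passing"  # see EARLY EXIT below
fi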

EARLY EXIT OPTIMIZATION (Token Savings: ~80%)

If ALL tests are passing from cached results:

✅ All tests passing (PYTEST_FAILURES=0, VITEST_FAILURES=0)
📊 No failures to fix. Skipping agent dispatch.
💰 Token savings: ~80K tokens (avoided 7 agent dispatches)

Output JSON summary:
{
  "status": "all_passing",
  "tests_run": $PYTEST_TESTS,
  "failures": 0,
  "agents_dispatched": 0,
  "action": "none_required"
}

→ Go to STEP 10 (chain invocation) or EXIT if --no-chain

DO NOT:

  • Run discovery phase (STEP 2.6) if no failures
  • Dispatch any agents
  • Run strategic analysis
  • Generate documentation

This avoids full pipeline when unnecessary.


STEP 4: Run Fresh Tests (if needed)

4a. Run pytest:

mkdir -p test-results/pytest && cd apps/api && uv run pytest -v --tb=short --junitxml=../../test-results/pytest/junit.xml 2>&1 | tail -40

4b. Run Vitest (if config exists):

test -f "apps/web/vitest.config.ts" && mkdir -p test-results/vitest && cd apps/web && npx vitest run --reporter=json --outputFile=../../test-results/vitest/results.json 2>&1 | tail -25

4c. Run Playwright (if config exists):

test -f "playwright.config.ts" && mkdir -p test-results/playwright && npx playwright test --reporter=json 2>&1 | tee test-results/playwright/results.json | tail -25

4d. If --coverage flag present:

mkdir -p test-results/pytest && cd apps/api && uv run pytest --cov=app --cov-report=xml:../../test-results/pytest/coverage.xml --cov-report=term-missing 2>&1 | tail -30

STEP 5: Read Test Result Files

Use the Read tool:

For pytest: Read(file_path="test-results/pytest/junit.xml")

  • Look for <testcase> with <failure> or <error> children
  • Extract: test name, classname (file path), failure message, full stack trace

For Vitest: Read(file_path="test-results/vitest/results.json")

  • Look for "status": "failed" entries
  • Extract: test name, file path, failure messages

For Playwright: Read(file_path="test-results/playwright/results.json")

  • Look for specs where "ok": false
  • Extract: test title, browser, error message
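
When result files are large, a grep/jq pre-pass can list only the failing cases before the full Read. A sketch; the junit.xml attribute layout and the Vitest JSON field names are assumptions about the reporters in use:

# Sketch only: failing pytest cases (the <testcase> line usually precedes its <failure>/<error> child)
grep -B1 -E "<(failure|error)" test-results/pytest/junit.xml | grep -oE 'name="[^"]+"' | sort -u | head -20

# Sketch only: count failed Vitest assertions, if jq is installed
command -v jq >/dev/null && \
  jq '[.testResults[].assertionResults[]? | select(.status=="failed")] | length' test-results/vitest/results.json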

STEP 5.5: ANALYSIS PHASE

5.5a. Test Isolation Analysis

Check for potential isolation issues:

echo "=== Shared State Detection ===" && grep -rn "global\|class.*:$" tests/ 2>/dev/null | grep -v "conftest\|__pycache__" | head -10
echo "=== Fixture Scope Analysis ===" && grep -rn "@pytest.fixture.*scope=" tests/ 2>/dev/null | head -10
echo "=== Order Dependency Markers ===" && grep -rn "pytest.mark.order\|pytest.mark.serial" tests/ 2>/dev/null | head -5

If isolation issues detected:

  • Add to agent context: "WARNING: Potential test isolation issues detected"
  • List affected files

5.5b. Flakiness Detection

Check for flaky test indicators:

echo "=== Timing Dependencies ===" && grep -rn "sleep\|time.sleep\|setTimeout" tests/ 2>/dev/null | grep -v "__pycache__" | head -5
echo "=== Async Race Conditions ===" && grep -rn "asyncio.gather\|Promise.all" tests/ 2>/dev/null | head -5

If flakiness indicators found:

  • Add to agent context: "Known flaky patterns detected"
  • Recommend: pytest-rerunfailures or vitest retry

5.5c. Coverage Analysis (if --coverage)

test -f "test-results/pytest/coverage.xml" && grep -o 'line-rate="[0-9.]*"' test-results/pytest/coverage.xml | head -1

Coverage gates:

  • < 60%: WARN "Critical: Coverage below 60%"
  • 60-80%: INFO "Coverage could be improved"
  • >= 80%: OK
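
A sketch of applying these gates to the extracted line-rate, using awk for the float-to-percent conversion:

# Sketch only: convert line-rate (0..1) to a percentage and apply the gates above
RATE=$(grep -o 'line-rate="[0-9.]*"' test-results/pytest/coverage.xml | head -1 | grep -o '[0-9.]*')
PCT=$(awk -v r="${RATE:-0}" 'BEGIN { printf "%.0f", r * 100 }')
if [ "$PCT" -lt 60 ]; then echo "WARN: Critical: Coverage below 60% ($PCT%)"
elif [ "$PCT" -lt 80 ]; then echo "INFO: Coverage could be improved ($PCT%)"
else echo "OK: Coverage at $PCT%"
fi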


STEP 6: Enhanced Failure Categorization (Regex-Based)

Use regex pattern matching for precise categorization:

Unit Test Patterns → unit-test-fixer

  • /AssertionError:.*expected.*got/ → Assertion mismatch
  • /Mock.*call_count.*expected/ → Mock verification failure
  • /fixture.*not found/ → Fixture missing
  • Business logic failures

API Test Patterns → api-test-fixer

  • /status.*(4\d\d|5\d\d)/ → HTTP error response
  • /validation.*failed|ValidationError/ → Schema validation
  • /timeout.*\d+\s*(s|ms)/ → Request timeout
  • FastAPI/Flask/Django endpoint failures

Database Test Patterns → database-test-fixer

  • /connection.*refused|ConnectionError/ → Connection failure
  • /relation.*does not exist|table.*not found/ → Schema mismatch
  • /deadlock.*detected/ → Concurrency issue
  • /IntegrityError|UniqueViolation/ → Constraint violation
  • Fixture/mock database issues

E2E Test Patterns → e2e-test-fixer

  • /locator.*timeout|element.*not found/ → Selector failure
  • /navigation.*failed|page.*crashed/ → Page load issue
  • /screenshot.*captured/ → Visual regression
  • Playwright/Cypress failures

Type Error Patterns → type-error-fixer

  • /TypeError:.*expected.*got/ → Type mismatch
  • /mypy.*error/ → Static type check failure
  • /TypeScript.*error TS/ → TS compilation error

Import Error Patterns → import-error-fixer

  • /ModuleNotFoundError|ImportError/ → Missing module
  • /circular import/ → Circular dependency
  • /cannot import name/ → Named import failure
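
A minimal routing sketch that maps a single failure message to an agent using the patterns above (patterns abbreviated; extend as needed):

# Sketch only: route one failure message to a specialist agent
categorize_failure() {
    local msg="$1"
    if echo "$msg" | grep -qiE "ModuleNotFoundError|ImportError|cannot import name"; then echo "import-error-fixer"
    elif echo "$msg" | grep -qiE "TypeError:.*expected.*got|error TS[0-9]+|mypy.*error"; then echo "type-error-fixer"
    elif echo "$msg" | grep -qiE "status.*(4[0-9]{2}|5[0-9]{2})|ValidationError"; then echo "api-test-fixer"
    elif echo "$msg" | grep -qiE "connection.*refused|IntegrityError|deadlock"; then echo "database-test-fixer"
    elif echo "$msg" | grep -qiE "locator.*timeout|element.*not found|page.*crashed"; then echo "e2e-test-fixer"
    else echo "unit-test-fixer"   # default: assertions, mocks, fixtures
    fi
}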

STEP 6.5: FAILURE PRIORITIZATION

Assign priority based on test type:

| Priority | Criteria | Detection |
|---|---|---|
| P0 Critical | Security/auth tests | test_auth_*, test_security_*, test_permission_* |
| P1 High | Core business logic | test_*_service, test_*_handler, most unit tests |
| P2 Medium | Integration tests | test_*_integration, API tests |
| P3 Low | Edge cases, performance | test_*_edge_*, test_*_perf_*, test_*_slow |

Pass priority information to agents:

  • "Priority: P0 - Fix these FIRST (security critical)"
  • "Priority: P1 - High importance (core logic)"
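
A sketch of deriving the priority label from a test name, following the detection patterns in the table:

# Sketch only: assign a priority label from the test name
priority_for_test() {
    case "$1" in
        *test_auth_*|*test_security_*|*test_permission_*) echo "P0" ;;
        *_edge_*|*_perf_*|*_slow*) echo "P3" ;;
        *_integration*|*test_api*) echo "P2" ;;
        *) echo "P1" ;;
    esac
}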

STEP 7: STRATEGIC MODE (if triggered)

If STRATEGIC_MODE=true:

7a. Launch Test Strategy Analyst

Task(subagent_type="test-strategy-analyst",
     model="opus",
     description="Analyze recurring test failures",
     prompt="Analyze test failures in this project using Five Whys methodology.

Git history shows $TEST_FIX_COUNT recent test fix attempts.
Current failures: [FAILURE SUMMARY]

Research:
1. Best practices for the detected failure patterns
2. Common pitfalls in pytest/vitest testing
3. Root cause analysis for recurring issues

Provide strategic recommendations for systemic fixes.

MANDATORY OUTPUT FORMAT - Return ONLY JSON:
{
  \"root_causes\": [{\"issue\": \"...\", \"five_whys\": [...], \"recommendation\": \"...\"}],
  \"infrastructure_changes\": [\"...\"],
  \"prevention_mechanisms\": [\"...\"],
  \"priority\": \"P0|P1|P2\",
  \"summary\": \"Brief strategic overview\"
}
DO NOT include verbose analysis or full code examples.")

7b. After Strategy Analyst Completes

If fixes are recommended, proceed to STEP 8.

7c. Launch Documentation Generator (optional)

If significant insights were found:

Task(subagent_type="test-documentation-generator",
     model="haiku",
     description="Generate test knowledge documentation",
     prompt="Based on the strategic analysis results, generate:
1. Test failure runbook (docs/test-failure-runbook.md)
2. Test strategy summary (docs/test-strategy.md)
3. Pattern-specific knowledge (docs/test-knowledge/)

MANDATORY OUTPUT FORMAT - Return ONLY JSON:
{
  \"files_created\": [\"docs/test-failure-runbook.md\"],
  \"patterns_documented\": 3,
  \"summary\": \"Created runbook with 5 failure patterns\"
}
DO NOT include file contents in response.")

STEP 7.5: Conflict Detection for Parallel Agents

Before launching agents, detect overlapping file scopes to prevent conflicts:

SAFE TO PARALLELIZE (different test domains):

  • unit-test-fixer + e2e-test-fixer → Different test directories
  • api-test-fixer + database-test-fixer → Different concerns
  • vitest tests + pytest tests → Different frameworks

MUST SERIALIZE (overlapping files):

  • unit-test-fixer + import-error-fixer → ⚠️ Both may modify conftest.py → SEQUENTIAL
  • type-error-fixer + any test fixer → ⚠️ Type fixes affect test expectations → RUN FIRST
  • Multiple fixers for same test file → ⚠️ RUN SEQUENTIALLY

Execution Phases:

PHASE 1 (First): type-error-fixer, import-error-fixer
   └── These fix foundational issues that other agents depend on

PHASE 2 (Parallel): unit-test-fixer, api-test-fixer, database-test-fixer
   └── These target different test categories, safe to run together

PHASE 3 (Last): e2e-test-fixer
   └── E2E depends on backend fixes being complete

PHASE 4 (Validation): Run full test suite to verify all fixes

Conflict Detection Algorithm:

# Check if multiple agents target same file patterns
# If conftest.py in scope of multiple agents → serialize them
# If same test file reported → assign to single agent only
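
A hedged sketch of the overlap check, assuming each agent's target files have already been collected into newline-separated lists during STEP 6:

# Sketch only: detect overlapping file scopes between two agents
OVERLAP=$(comm -12 <(echo "$AGENT_A_FILES" | sort -u) <(echo "$AGENT_B_FILES" | sort -u))
if [ -n "$OVERLAP" ]; then
    echo "⚠️ Overlapping scope detected → serialize these agents:"
    echo "$OVERLAP"
fi
# conftest.py anywhere in scope forces serialization
printf '%s\n%s\n' "$AGENT_A_FILES" "$AGENT_B_FILES" | grep -q "conftest.py" && echo "conftest.py in scope → SEQUENTIAL"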

STEP 7.6: Test File Modification Safety (NEW)

CRITICAL: When multiple test files need modification, apply dependency-aware batching similar to source file refactoring.

Analyze Test File Dependencies

Before spawning test fixers, identify shared fixtures and conftest dependencies:

echo "=== Test Dependency Analysis ==="

# Find all conftest.py files
CONFTEST_FILES=$(find tests/ -name "conftest.py" 2>/dev/null)
echo "Shared fixture files: $CONFTEST_FILES"

# For each failing test file, find its fixture dependencies
for TEST_FILE in $FAILING_TEST_FILES; do
    # Find imports from conftest
    FIXTURE_IMPORTS=$(grep -E "^from.*conftest|@pytest.fixture" "$TEST_FILE" 2>/dev/null | head -10)

    # Find shared fixtures used
    FIXTURES_USED=$(grep -oE "[a-z_]+_fixture|@pytest.fixture" "$TEST_FILE" 2>/dev/null | sort -u)

    echo "  $TEST_FILE -> fixtures: [$FIXTURES_USED]"
done

Group Test Files by Shared Fixtures

# Files sharing conftest.py fixtures MUST serialize
# Files with independent fixtures CAN parallelize

# Example output:
echo "
Test Cluster A (SERIAL - shared fixtures in tests/conftest.py):
  - tests/unit/test_user.py
  - tests/unit/test_auth.py

Test Cluster B (PARALLEL - independent fixtures):
  - tests/integration/test_api.py
  - tests/integration/test_database.py

Test Cluster C (SPECIAL - conftest modification needed):
  - tests/conftest.py (SERIALIZE - blocks all others)
"

Execution Rules for Test Modifications

| Scenario | Execution Mode | Reason |
|---|---|---|
| Multiple test files, no shared fixtures | PARALLEL | Safe, independent |
| Multiple test files, shared fixtures | SERIAL within fixture scope | Fixture state conflicts |
| conftest.py needs modification | SERIAL (blocks all) | Critical shared state |
| Same test file reported by multiple fixers | Single agent only | Avoid merge conflicts |

conftest.py Special Handling

If conftest.py needs modification:

  1. Run conftest fixer FIRST (before any other test fixers)
  2. Wait for completion before proceeding
  3. Re-run baseline tests to verify fixture changes don't break existing tests
  4. Then parallelize remaining independent test fixes

PHASE 1 (First, blocking): conftest.py modification
   └── WAIT for completion

PHASE 2 (Sequential): Test files sharing modified fixtures
   └── Run one at a time, verify after each

PHASE 3 (Parallel): Independent test files
   └── Safe to parallelize

Failure Handling for Test Modifications

When a test fixer fails:

AskUserQuestion(
  questions=[{
    "question": "Test fixer for {test_file} failed: {error}. {N} test files remain. What would you like to do?",
    "header": "Test Fix Failure",
    "options": [
      {"label": "Continue", "description": "Skip this test file, proceed with remaining"},
      {"label": "Abort", "description": "Stop test fixing, preserve current state"},
      {"label": "Retry", "description": "Attempt to fix {test_file} again"}
    ],
    "multiSelect": false
  }]
)

Test Fixer Dispatch with Scope

Include scope information when dispatching test fixers:

Task(
    subagent_type="unit-test-fixer",
    description="Fix unit tests in {test_file}",
    prompt="Fix failing tests in this file:

    TEST FILE CONTEXT:
    - file: {test_file}
    - shared_fixtures: {list of conftest fixtures used}
    - parallel_peers: {other test files being fixed simultaneously}
    - conftest_modified: {true|false - was conftest changed this session?}

    SCOPE CONSTRAINTS:
    - ONLY modify: {test_file}
    - DO NOT modify: conftest.py (unless explicitly assigned)
    - DO NOT modify: {parallel_peer_files}

    MANDATORY OUTPUT FORMAT - Return ONLY JSON:
    {
      \"status\": \"fixed|partial|failed\",
      \"test_file\": \"{test_file}\",
      \"tests_fixed\": N,
      \"fixtures_modified\": [],
      \"remaining_failures\": N,
      \"summary\": \"...\"
    }"
)

STEP 8: PARALLEL AGENT DISPATCH

CRITICAL: Launch ALL agents in ONE response with multiple Task calls.

ENHANCED AGENT CONTEXT TEMPLATE

For each agent, provide this comprehensive context:

Test Specialist Task: [Agent Type] - Test Failure Fix

## Context
- Project: [detected from git remote]
- Branch: [from git branch --show-current]
- Framework: pytest [version] / vitest [version]
- Python/Node version: [detected]

## Project Patterns (DISCOVER DYNAMICALLY - Do This First!)
**CRITICAL - Project Context Discovery:**
Before making any fixes, you MUST:
1. Read CLAUDE.md at project root (if exists) for project conventions
2. Check .claude/rules/ directory for domain-specific rule files:
   - If editing Python test files → read python*.md rules
   - If editing TypeScript tests → read typescript*.md rules
   - If graphiti/temporal patterns exist → read graphiti.md rules
3. Detect test patterns from config files (pytest.ini, vitest.config.ts)
4. Apply discovered patterns to ALL your fixes

This ensures fixes follow project conventions, not generic patterns.

[Include PROJECT_CONTEXT from STEP 2.6 here]

## Recent Test Changes
[git diff HEAD~3 --name-only | grep -E "(test|spec)\.(py|ts|tsx)$"]

## Failures to Fix
[FAILURE LIST with full stack traces]

## Test Isolation Status
[From STEP 5.5a - any warnings]

## Flakiness Report
[From STEP 5.5b - any detected patterns]

## Priority
[From STEP 6.5 - P0/P1/P2/P3 with reasoning]

## Framework Configuration
[From STEP 2.5 - markers, config]

## Constraints
- Follow project's test method length limits (check CLAUDE.md or file-size-guidelines.md)
- Pre-flight: Verify baseline tests pass
- Post-flight: Ensure no broken existing tests
- Cannot modify implementation code (test expectations only unless bug found)
- Apply project-specific patterns discovered from CLAUDE.md/.claude/rules/

## Expected Output
- Summary of fixes made
- Files modified with line numbers
- Verification commands run
- Remaining issues (if any)

Dispatch Example (with Model Strategy + JSON Output)

Task(subagent_type="unit-test-fixer",
     model="sonnet",
     description="Fix unit test failures (P1)",
     prompt="[FULL ENHANCED CONTEXT TEMPLATE]

MANDATORY OUTPUT FORMAT - Return ONLY JSON:
{
  \"status\": \"fixed|partial|failed\",
  \"tests_fixed\": N,
  \"files_modified\": [\"path/to/file.py\"],
  \"remaining_failures\": N,
  \"summary\": \"Brief description of fixes\"
}
DO NOT include full file content or verbose logs.")

Task(subagent_type="api-test-fixer",
     model="sonnet",
     description="Fix API test failures (P2)",
     prompt="[FULL ENHANCED CONTEXT TEMPLATE]

MANDATORY OUTPUT FORMAT - Return ONLY JSON:
{...same format...}
DO NOT include full file content or verbose logs.")

Task(subagent_type="import-error-fixer",
     model="haiku",
     description="Fix import errors (P1)",
     prompt="[CONTEXT]

MANDATORY OUTPUT FORMAT - Return ONLY JSON:
{...same format...}")

Model Strategy

| Agent Type | Model | Rationale |
|---|---|---|
| test-strategy-analyst | opus | Complex research + Five Whys |
| unit/api/database/e2e-test-fixer | sonnet | Balanced speed + quality |
| type-error-fixer | sonnet | Type inference complexity |
| import-error-fixer | haiku | Simple pattern matching |
| linting-fixer | haiku | Rule-based fixes |
| test-documentation-generator | haiku | Template-based docs |

STEP 9: Validate Fixes

After agents complete:

cd apps/api && uv run pytest -v --tb=short --junitxml=../../test-results/pytest/junit.xml 2>&1 | tail -40

Check results:

  • If ALL tests pass → Go to STEP 10
  • If SOME tests still fail → Report remaining failures, suggest --strategic

STEP 10: INTELLIGENT CHAIN INVOCATION

10a. Check Depth

If SLASH_DEPTH >= 3:

  • Report: "Maximum depth reached, skipping chain invocation"
  • Go to STEP 11

10b. Check --no-chain Flag

If --no-chain present:

  • Report: "Chain invocation disabled by flag"
  • Go to STEP 11

10c. Determine Chain Action

If ALL tests passing AND changes were made:

SlashCommand(skill="/commit_orchestrate",
             args="--message 'fix(tests): resolve test failures'")

If ALL tests passing AND NO changes made:

  • Report: "All tests passing, no changes needed"
  • Go to STEP 11

If SOME tests still failing:

  • Report remaining failure count
  • If TACTICAL mode: Suggest "Run with --strategic for root cause analysis"
  • Go to STEP 11
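
A sketch of the chain decision, using git status to detect whether any fixer made changes; REMAINING_FAILURES here is an illustrative variable populated from STEP 9:

# Sketch only: decide whether to chain into /commit_orchestrate
CHANGES_MADE=$(git status --porcelain | wc -l | tr -d ' ')
if [ "${REMAINING_FAILURES:-0}" -eq 0 ] && [ "$CHANGES_MADE" -gt 0 ] && [ "${NO_CHAIN:-false}" != true ]; then
    echo "CHAIN=/commit_orchestrate"
elif [ "${REMAINING_FAILURES:-0}" -eq 0 ]; then
    echo "CHAIN=none (all tests passing, no changes made)"
else
    echo "CHAIN=none ($REMAINING_FAILURES failures remain; consider --strategic)"
fi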

STEP 11: Report Summary

Report:

  • Mode: TACTICAL or STRATEGIC
  • Initial failure count by type
  • Agents dispatched with priorities
  • Strategic insights (if applicable)
  • Current pass/fail status
  • Coverage status (if --coverage)
  • Chain invocation result
  • Remaining issues and recommendations

Quick Reference

| Command | Effect |
|---|---|
| /test_orchestrate | Use cached results if fresh (<15 min) |
| /test_orchestrate --run-first | Run tests fresh, ignore cache |
| /test_orchestrate --pytest-only | Only pytest failures |
| /test_orchestrate --strategic | Force strategic mode (research + analysis) |
| /test_orchestrate --coverage | Include coverage analysis |
| /test_orchestrate --no-chain | Don't auto-invoke /commit_orchestrate |

VS Code Integration

pytest.ini must have: addopts = --junitxml=test-results/pytest/junit.xml

Then: Run tests in VS Code -> /test_orchestrate reads cached results -> Fixes applied


Agent Quick Reference

| Failure Pattern | Agent | Model | JSON Output |
|---|---|---|---|
| Assertions, mocks, fixtures | unit-test-fixer | sonnet | Required |
| HTTP, API contracts, endpoints | api-test-fixer | sonnet | Required |
| Database, SQL, connections | database-test-fixer | sonnet | Required |
| Selectors, timeouts, E2E | e2e-test-fixer | sonnet | Required |
| Type annotations, mypy | type-error-fixer | sonnet | Required |
| Imports, modules, paths | import-error-fixer | haiku | Required |
| Strategic analysis | test-strategy-analyst | opus | Required |
| Documentation | test-documentation-generator | haiku | Required |

Token Efficiency: JSON Output Format

ALL agents MUST return distilled JSON summaries only.

{
  "status": "fixed|partial|failed",
  "tests_fixed": 3,
  "files_modified": ["tests/test_auth.py", "tests/conftest.py"],
  "remaining_failures": 0,
  "summary": "Fixed mock configuration and assertion order"
}

DO NOT return:

  • Full file contents
  • Verbose explanations
  • Step-by-step execution logs

This reduces token usage by 80-90% per agent response.


EXECUTE NOW. Start with Step 0a (depth check).