# Coverage Orchestrator

# ⚠️ GENERAL-PURPOSE COMMAND - Works with any project
# Report directories are detected dynamically (workspace/reports/coverage, reports/coverage, coverage, .)
# Override with COVERAGE_REPORTS_DIR environment variable if needed

Systematically improve test coverage from any starting point (20-75%) to production-ready levels (75%+) through intelligent gap analysis and strategic orchestration.

## Usage

`/coverage [mode] [target]`

Available modes:

- `analyze` (default) - Analyze coverage gaps with prioritization
- `learn` - Learn existing test patterns for integration-safe generation
- `improve` - Orchestrate specialist agents for improvement
- `generate` - Generate new tests for identified gaps using learned patterns
- `validate` - Validate coverage improvements and quality

Optional target parameter to focus on specific files, directories, or test types.

## Examples

- `/coverage` - Analyze all coverage gaps
- `/coverage learn` - Learn existing test patterns before generation
- `/coverage analyze apps/api/src/services` - Analyze specific directory
- `/coverage improve unit` - Improve unit test coverage using specialists
- `/coverage generate database` - Generate database tests for gaps using learned patterns
- `/coverage validate` - Validate recent coverage improvements

---

You are a **Coverage Orchestration Specialist** focused on systematic test coverage improvement. Your mission is to analyze coverage gaps intelligently and coordinate specialist agents to achieve production-ready coverage levels.

## Core Responsibilities

1. **Strategic Gap Analysis**: Identify critical coverage gaps with complexity weighting and business logic prioritization
2. **Multi-Domain Assessment**: Analyze coverage across API endpoints, database operations, unit tests, and integration scenarios
3. **Agent Coordination**: Use the Task tool to spawn specialized test-fixer agents based on analysis results
4. **Progress Tracking**: Monitor coverage improvements and provide actionable recommendations

## Operational Modes

### Mode: learn (NEW - Pattern Analysis)

Learn existing test patterns to ensure safe integration of new tests:

- **Pattern Discovery**: Analyze existing test files for class naming patterns, fixture usage, and import patterns
- **Mock Strategy Analysis**: Catalog how mocks are used (AsyncMock patterns, patch locations, system boundaries)
- **Fixture Compatibility**: Document available fixtures (MockSupabaseClient, TestDataFactory, etc.)
- **Anti-Over-Engineering Detection**: Identify and flag complex test patterns that should be simplified
- **Integration Safety Score**: Rate how well new tests can integrate without breaking existing ones
- **Store Pattern Knowledge**: Save patterns to `$REPORTS_DIR/test-patterns.json` for reuse
- **Test Complexity Analysis**: Measure complexity of existing tests to establish simplicity baselines
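As a rough illustration of what the learn mode could persist, the sketch below scans test files with simple regexes and writes a `test-patterns.json` summary. The helper name, regexes, and JSON keys are assumptions made for this sketch, not a fixed schema used by the command.

```python
# Illustrative learn-mode scan: collect test class names, fixture names, and
# mock imports from existing tests and persist them for later modes.
# The regexes and output keys are assumptions, not a required format.
import json
import re
from pathlib import Path

PATTERN_RES = {
    "test_classes": re.compile(r"^class (Test\w+)", re.MULTILINE),
    "fixtures": re.compile(r"@pytest\.fixture\s*(?:\(.*\))?\s*\n(?:async )?def (\w+)"),
    "mock_imports": re.compile(r"^from unittest\.mock import (.+)$", re.MULTILINE),
}

def learn_patterns(tests_dir: str, reports_dir: str) -> dict:
    patterns: dict[str, set[str]] = {key: set() for key in PATTERN_RES}
    for test_file in Path(tests_dir).rglob("test_*.py"):
        source = test_file.read_text(encoding="utf-8", errors="ignore")
        for key, regex in PATTERN_RES.items():
            patterns[key].update(regex.findall(source))
    result = {key: sorted(values) for key, values in patterns.items()}
    (Path(reports_dir) / "test-patterns.json").write_text(json.dumps(result, indent=2))
    return result
```

Later modes can then read the saved file instead of re-scanning the whole test tree.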
### Mode: analyze (default)

Run comprehensive coverage analysis with gap prioritization:

- Execute coverage analysis using the existing pytest/coverage.py infrastructure
- Identify critical gaps with business logic prioritization (API endpoints > database > unit > integration)
- Apply a complexity weighting algorithm for gap priority scoring (a sketch of one possible weighting follows the mode descriptions below)
- Generate a structured analysis report with actionable recommendations
- Store results in `$REPORTS_DIR/coverage-analysis-{timestamp}.md`

### Mode: improve

Orchestrate specialist agents based on gap analysis with pattern-aware fixes:

- **Pre-flight Validation**: Verify existing tests pass before agent coordination
- Run gap analysis to identify improvement opportunities
- **Pattern-Aware Agent Instructions**: Provide learned patterns to specialist agents for safe integration
- Determine appropriate specialist agents (unit-test-fixer, api-test-fixer, database-test-fixer, e2e-test-fixer, performance-test-fixer)
- **Anti-Over-Engineering Enforcement**: Instruct agents to avoid complex patterns and use simple approaches
- Use the Task tool to spawn agents in parallel coordination with pattern compliance requirements
- **Post-flight Validation**: Verify that no existing tests are broken after agent fixes
- **Rollback on Failure**: Restore the previous state if integration issues are detected
- Track orchestrated improvement progress and results
- Generate a coordination report with agent activities and outcomes

### Mode: generate

Generate new tests for identified coverage gaps with pattern-based safety and simplicity:

- **MANDATORY: Use learned patterns first** - Load patterns from a previous `learn` mode execution
- **Pre-flight Safety Check**: Verify existing tests pass before adding new ones
- Focus on test creation for uncovered critical paths
- Prioritize by business impact and implementation complexity
- **Template-based Generation**: Use existing test files as templates and follow their exact patterns
- **Fixture Reuse Strategy**: Use existing fixtures (MockSupabaseClient, TestDataFactory) instead of creating new ones
- **Incremental Addition**: Add tests in small batches (5-10 at a time) with validation between batches
- **Anti-Over-Engineering Enforcement**: Maximum 50 lines per test, no abstract patterns, direct assertions only
- **Apply anti-mocking-theater principles**: Test real functionality, not mock interactions
- **Simplicity Scoring**: Rate generated tests for complexity and reject over-engineered patterns
- **Quality validation**: Ensure the mock-to-assertion ratio is < 50%
- **Business logic priority**: Focus on actual calculations and transformations
- **Integration Validation**: Run existing tests after each batch to detect conflicts
- **Automatic Rollback**: Remove new tests if they break existing ones
- Provide guidance on minimal mock requirements

### Mode: validate

Validate coverage improvements with integration safety and simplicity enforcement:

- **Integration Safety Validation**: Verify that no existing tests are broken by new additions
- Verify recent coverage improvements meet quality standards
- **Anti-mocking-theater validation**: Check that tests focus on real functionality
- **Anti-over-engineering validation**: Flag tests exceeding complexity thresholds (>50 lines, >5 imports, >3 mock levels)
- **Pattern Compliance Check**: Ensure new tests follow learned project patterns
- **Mock ratio analysis**: Flag tests with >50% mock setup
- **Business logic verification**: Ensure tests validate actual calculations and outputs
- **Fixture Compatibility Check**: Verify proper use of existing fixtures without conflicts
- **Test Conflict Detection**: Identify overlapping mock patches or fixture collisions
- Run regression testing to ensure no functionality breaks
- Validate that new tests follow project testing standards
- Check coverage percentage improvements toward the 75%+ target
- **Generate a comprehensive quality score report** with test improvement recommendations
- **Simplicity Score Report**: Rate test simplicity and flag over-engineered patterns
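The complexity weighting mentioned for the analyze mode can be pictured as a small priority function over coverage gaps. The domain weights, fields, and formula below are illustrative assumptions for this sketch; the command's actual prioritization comes from its gap analysis report.

```python
# Hypothetical gap-priority scoring: business-critical domains rank higher,
# and larger, branchier uncovered regions rank higher within a domain.
from dataclasses import dataclass

DOMAIN_WEIGHT = {"api": 1.0, "database": 0.8, "unit": 0.6, "integration": 0.4}

@dataclass
class CoverageGap:
    path: str
    domain: str          # "api", "database", "unit", or "integration"
    missed_lines: int    # uncovered lines reported by coverage.py
    cyclomatic: int      # complexity of the uncovered code

def gap_priority(gap: CoverageGap) -> float:
    weight = DOMAIN_WEIGHT.get(gap.domain, 0.5)
    # Complexity-weighted size: big, branchy gaps float to the top.
    return weight * gap.missed_lines * (1 + gap.cyclomatic / 10)

gaps = [
    CoverageGap("apps/api/src/services/workouts.py", "api", 42, 9),
    CoverageGap("apps/api/src/utils/formatting.py", "unit", 60, 2),
]
for gap in sorted(gaps, key=gap_priority, reverse=True):
    print(f"{gap_priority(gap):8.1f}  {gap.path}")
```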
## TEST QUALITY SCORING ALGORITHM

Automatically score generated and existing tests to ensure quality and prevent mocking theater.

### Scoring Criteria (0-10 scale) - UPDATED WITH ANTI-OVER-ENGINEERING

#### Functionality Focus (30% weight)

- **10 points**: Tests actual business logic, calculations, transformations
- **7 points**: Tests API behavior with realistic data validation
- **4 points**: Tests with some mocking but meaningful assertions
- **1 point**: Primarily tests mock interactions, not functionality

#### Mock Usage Quality (25% weight)

- **10 points**: Mocks only external dependencies (DB, APIs, file system)
- **7 points**: Some internal mocking but tests core logic
- **4 points**: Over-mocks but still tests some real behavior
- **1 point**: Mocks everything including business logic

#### Simplicity & Anti-Over-Engineering (30% weight) - NEW

- **10 points**: Under 30 lines, direct assertions, no abstractions, uses existing fixtures
- **7 points**: Under 50 lines, simple structure, reuses patterns
- **4 points**: 50-75 lines, some complexity but focused
- **1 point**: Over 75 lines, abstract patterns, custom frameworks, unnecessary complexity

#### Pattern Integration (10% weight) - NEW

- **10 points**: Follows exact existing patterns, reuses fixtures, compatible imports
- **7 points**: Mostly follows patterns with minor deviations
- **4 points**: Some pattern compliance, creates minimal new infrastructure
- **1 point**: Ignores existing patterns, creates conflicting infrastructure

#### Data Realism (5% weight) - REDUCED

- **10 points**: Realistic data matching production patterns
- **7 points**: Good test data with proper structure
- **4 points**: Basic test data, somewhat realistic
- **1 point**: Trivial data like "test123", no business context

### Quality Categories

- **Excellent (8.5-10.0)**: Production-ready, maintainable tests
- **Good (7.0-8.4)**: Solid tests with minor improvements needed
- **Acceptable (5.5-6.9)**: Functional but needs refactoring
- **Poor (3.0-5.4)**: Major issues, likely mocking theater
- **Unacceptable (<3.0)**: Complete rewrite required

### Automated Quality Checks - ENHANCED WITH ANTI-OVER-ENGINEERING

- **Mock ratio analysis**: Count mock lines vs. assertion lines
- **Business logic detection**: Identify tests of calculations/transformations
- **Integration span**: Measure how many real components are tested together
- **Data quality assessment**: Check for realistic vs. trivial test data
- **Complexity metrics**: Lines of code, import count, nesting depth
- **Over-engineering detection**: Flag abstract base classes, custom frameworks, deep inheritance
- **Pattern compliance measurement**: Compare against learned project patterns
- **Fixture reuse analysis**: Measure usage of existing vs. new fixtures
- **Simplicity scoring**: Penalize tests exceeding 50 lines or 5 imports
- **Mock chain depth**: Flag mock chains deeper than 2 levels
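Given per-criterion ratings, the overall score and category follow mechanically from the weights above. A minimal sketch, assuming the 0-10 ratings are produced by the automated checks listed in this section:

```python
# Weighted test-quality score using the criterion weights defined above.
# How each rating is measured is left to the automated checks; this only
# combines them and maps the result onto the quality categories.
WEIGHTS = {
    "functionality_focus": 0.30,
    "mock_usage_quality": 0.25,
    "simplicity": 0.30,
    "pattern_integration": 0.10,
    "data_realism": 0.05,
}

CATEGORIES = [  # (minimum score, label), highest cutoff first
    (8.5, "Excellent"), (7.0, "Good"), (5.5, "Acceptable"), (3.0, "Poor"),
]

def quality_score(ratings: dict[str, float]) -> tuple[float, str]:
    score = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    label = next((lbl for cutoff, lbl in CATEGORIES if score >= cutoff), "Unacceptable")
    return round(score, 2), label

print(quality_score({
    "functionality_focus": 7, "mock_usage_quality": 10,
    "simplicity": 7, "pattern_integration": 10, "data_realism": 7,
}))  # -> (8.05, 'Good')
```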
## ANTI-MOCKING-THEATER PRINCIPLES

🚨 **CRITICAL**: All test generation and improvement must follow anti-mocking-theater principles.

**Reference**: Read `~/.claude/knowledge/anti-mocking-theater.md` for complete guidelines.

**Quick Summary**:

- Mock only system boundaries (DB, APIs, file I/O, network, time)
- Never mock business logic, value objects, pure functions, or domain services
- Mock-to-assertion ratio must be < 50%
- At least 70% of assertions must test actual functionality

## CRITICAL: ANTI-OVER-ENGINEERING PRINCIPLES

🚨 **YAGNI**: Don't build elaborate test infrastructure for simple code.

**Reference**: Read `~/.claude/knowledge/test-simplicity.md` for complete guidelines.

**Quick Summary**:

- Maximum 50 lines per test, 5 imports per file, 3 patch decorators
- NO abstract base classes, factory factories, or custom test frameworks
- Use existing fixtures (MockSupabaseClient, TestDataFactory) as-is
- Direct assertions only: `assert x == y`

## TEST COMPATIBILITY MATRIX - CRITICAL INTEGRATION REQUIREMENTS

🚨 **MANDATORY COMPLIANCE**: All generated tests MUST meet these compatibility requirements.

### Project-Specific Requirements

- **Python Path**: `apps/api/src` must be in sys.path before imports
- **Environment Variables**: `TESTING=true` required for test mode
- **Required Imports**:

```python
from apps.api.src.services.service_name import ServiceName
from tests.fixtures.database import MockSupabaseClient, TestDataFactory
from unittest.mock import AsyncMock, patch
import pytest
```

### Fixture Compatibility Requirements

| Fixture Name | Usage Pattern | Import Path | Notes |
|--------------|---------------|-------------|-------|
| `MockSupabaseClient` | `self.mock_db = AsyncMock()` | `tests.fixtures.database` | Use AsyncMock, not direct MockSupabaseClient |
| `TestDataFactory` | `TestDataFactory.workout()` | `tests.fixtures.database` | Static methods only |
| `mock_supabase_client` | `def test_x(mock_supabase_client):` | pytest fixture | When function-scoped needed |
| `test_data_factory` | `def test_x(test_data_factory):` | pytest fixture | Access via fixture parameter |

### Mock Pattern Requirements

- **Database Mocking**: Always mock at the service boundary (`db_service_override=self.mock_db`)
- **Patch Locations**:

```python
@patch('apps.api.src.services.service_name.external_dependency')
@patch('apps.api.src.database.client.db_service')  # Database patches
```

- **AsyncMock Usage**: Use `AsyncMock()` for all async database operations
- **Return Value Patterns**:

```python
self.mock_db.execute_query.return_value = [test_data]  # List wrapper
self.mock_db.rpc.return_value.execute.return_value.data = value  # RPC calls
```

### Test Structure Requirements

- **Class Naming**: `TestServiceNameBusinessLogic` or `TestServiceNameFunctionality`
- **Method Naming**: `test_method_name_condition` (e.g., `test_calculate_volume_success`)
- **Setup Pattern**: Always use `setup_method(self)` - never `setUp` or class-level setup
- **Import Organization**: Project imports first, then test imports, then mocks

### Integration Safety Requirements

- **Pre-test Validation**: Existing tests must pass before new test addition
- **Post-test Validation**: All tests must pass after new test addition
- **Fixture Conflicts**: No overlapping fixture names or mock patches
- **Environment Isolation**: Tests must not affect global state or other tests
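Putting the matrix together, a newly generated test could look like the sketch below. The service class, its `calculate_volume` method, and the fields on the factory-built workout are placeholders invented for illustration; the structure (setup_method, AsyncMock at the service boundary, TestDataFactory data, direct assertions) follows the requirements above, and `pytest.mark.asyncio` assumes pytest-asyncio is available.

```python
# Illustrative test skeleton that satisfies the compatibility matrix.
# WorkoutService, calculate_volume, and the workout dict fields are placeholders.
import pytest
from unittest.mock import AsyncMock

from apps.api.src.services.workout_service import WorkoutService  # placeholder service
from tests.fixtures.database import TestDataFactory


class TestWorkoutServiceBusinessLogic:
    def setup_method(self):
        self.mock_db = AsyncMock()
        self.service = WorkoutService(db_service_override=self.mock_db)

    @pytest.mark.asyncio
    async def test_calculate_volume_success(self):
        workout = TestDataFactory.workout()
        self.mock_db.execute_query.return_value = [workout]  # list wrapper pattern

        volume = await self.service.calculate_volume(workout["id"])

        # Assert on the real calculation, not on how the mock was called.
        assert volume == workout["sets"] * workout["reps"] * workout["weight"]
```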
### Anti-Over-Engineering Requirements

- **Maximum Complexity**: 50 lines per test method, 5 imports per file (a checker sketch for these limits follows the safety scripts below)
- **No Abstractions**: No abstract base classes, builders, or managers
- **Direct Testing**: Test real business logic, not mock configurations
- **Simple Assertions**: Use `assert x == y`, not custom matchers

## Implementation Guidelines

Follow Epic 4.4 simplification patterns:

- Use simple functions with clear single responsibilities
- Avoid Manager/Handler pattern complexity - keep functions focused
- Target implementation size: ~150-200 lines total
- All operations must be async/await for non-blocking execution
- Integrate with the existing coverage.py and pytest infrastructure without disruption

## ENHANCED SAFETY & ROLLBACK CAPABILITY

### Automatic Rollback System

```bash
# Create safety checkpoint before any changes
create_test_checkpoint() {
  CHECKPOINT_DIR=".coverage_checkpoint_$(date +%s)"
  echo "📋 Creating test checkpoint: $CHECKPOINT_DIR"

  # Backup all test files
  cp -r tests/ "$CHECKPOINT_DIR/"

  # Record current test state (run in a subshell so the log lands in the checkpoint at the repo root)
  (cd tests/ && python run_tests.py fast --no-coverage) > "$CHECKPOINT_DIR/baseline_results.log" 2>&1

  echo "✅ Test checkpoint created"
}

# Rollback to safe state if integration fails
rollback_on_failure() {
  if [ -d "$CHECKPOINT_DIR" ]; then
    echo "🔄 ROLLBACK: Restoring test state due to integration failure"

    # Restore test files
    rm -rf tests/
    mv "$CHECKPOINT_DIR" tests/

    # Verify rollback worked
    (cd tests/ && python run_tests.py fast --no-coverage | tail -5)

    echo "✅ Rollback completed - tests restored to working state"
  fi
}

# Cleanup checkpoint on success
cleanup_checkpoint() {
  if [ -d "$CHECKPOINT_DIR" ]; then
    rm -rf "$CHECKPOINT_DIR"
    echo "🧹 Checkpoint cleaned up after successful integration"
  fi
}
```

### Test Conflict Detection System

```bash
# Detect potential test conflicts before generation
detect_test_conflicts() {
  echo "🔍 Scanning for potential test conflicts..."

  # Check for fixture name collisions (fixture names live on the def line after the decorator)
  echo "Checking fixture names..."
  grep -rhA1 "@pytest.fixture" tests/ | grep -oE "def [a-zA-Z_]+" | sort | uniq -d

  # Check for overlapping mock patches
  echo "Checking mock patch locations..."
  grep -r "@patch" tests/ | grep -o "'[^']*'" | sort | uniq -c | awk '$1 > 1'

  # Check for import conflicts
  echo "Checking import patterns..."
  grep -r "from apps.api.src" tests/ | grep -o "from [^:]*" | sort | uniq -c

  # Check for environment variable conflicts
  echo "Checking environment setup..."
  grep -r "os.environ\|setenv" tests/ | head -10
}

# Validate test integration after additions
validate_test_integration() {
  echo "🛡️ Running comprehensive integration validation..."

  # Run all tests to detect failures
  (cd tests/ && python run_tests.py fast --no-coverage) > /tmp/integration_check.log 2>&1

  if [ $? -ne 0 ]; then
    echo "❌ Integration validation failed - conflicts detected"
    grep -E "FAILED|ERROR" /tmp/integration_check.log | head -10
    return 1
  fi

  echo "✅ Integration validation passed - no conflicts detected"
  return 0
}
```
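The Anti-Over-Engineering limits stated earlier in this section (50 lines per test, 5 imports per file, 3 patch decorators) can also be checked on the Python side with a short AST pass. This is a deliberately naive sketch; the thresholds come from this document, while the parsing details are assumptions:

```python
# Naive complexity check for a single test file: flags tests that exceed the
# 50-line / 5-import / 3-patch limits stated in the requirements above.
import ast
from pathlib import Path

MAX_TEST_LINES, MAX_IMPORTS, MAX_PATCHES = 50, 5, 3

def simplicity_violations(test_path: str) -> list[str]:
    tree = ast.parse(Path(test_path).read_text())
    problems: list[str] = []

    imports = [n for n in ast.walk(tree) if isinstance(n, (ast.Import, ast.ImportFrom))]
    if len(imports) > MAX_IMPORTS:
        problems.append(f"{len(imports)} imports (max {MAX_IMPORTS})")

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("test_"):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_TEST_LINES:
                problems.append(f"{node.name}: {length} lines (max {MAX_TEST_LINES})")
            # Count @patch(...) decorators, whether written as patch(...) or mock.patch(...)
            patches = sum(
                1 for deco in node.decorator_list
                if isinstance(deco, ast.Call)
                and getattr(deco.func, "attr", getattr(deco.func, "id", "")) == "patch"
            )
            if patches > MAX_PATCHES:
                problems.append(f"{node.name}: {patches} patch decorators (max {MAX_PATCHES})")
    return problems
```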
### Performance & Resource Monitoring

- Include performance monitoring for coverage analysis operations (< 30 seconds)
- Implement timeout protections for long-running analysis
- Monitor resource usage to prevent CI/CD slowdowns
- Include error handling with graceful degradation
- **Automatic rollback on integration failure** - no manual intervention required
- **Comprehensive conflict detection** - proactive identification of test conflicts

## Key Integration Points

- **Coverage Infrastructure**: Build upon the existing coverage.py and pytest framework
- **Test-Fixer Agents**: Coordinate with existing specialist agents (unit, API, database, e2e, performance)
- **Task Tool**: Use the Task tool for parallel specialist agent coordination
- **Reports Directory**: Generate reports in the detected reports directory (defaults to `workspace/reports/coverage/` or a fallback)

## Target Coverage Goals

- Minimum target: 75% overall coverage
- New code target: 90% coverage
- Critical path coverage: 100% for business logic
- Performance requirement: Reasonable response times for your application
- Quality over quantity: Focus on meaningful test coverage

## Command Arguments Processing

Process $ARGUMENTS as mode and target:

- If no arguments: mode="analyze", target=None (analyze all)
- If one argument: check if it's a valid mode, else treat it as the target with mode="analyze"
- If two arguments: first=mode, second=target
- Validate that the mode is one of: analyze, learn, improve, generate, validate

```bash
# ============================================
# DYNAMIC DIRECTORY DETECTION (Project-Agnostic)
# ============================================

# Allow environment override
if [[ -n "$COVERAGE_REPORTS_DIR" ]] && [[ -d "$COVERAGE_REPORTS_DIR" || -w "$(dirname "$COVERAGE_REPORTS_DIR")" ]]; then
  REPORTS_DIR="$COVERAGE_REPORTS_DIR"
  echo "📁 Using override reports directory: $REPORTS_DIR"
else
  # Search standard locations
  REPORTS_DIR=""
  for dir in "workspace/reports/coverage" "reports/coverage" "coverage/reports" ".coverage-reports"; do
    if [[ -d "$dir" ]]; then
      REPORTS_DIR="$dir"
      echo "📁 Found reports directory: $REPORTS_DIR"
      break
    fi
  done

  # Create in first available parent
  if [[ -z "$REPORTS_DIR" ]]; then
    for dir in "workspace/reports/coverage" "reports/coverage" "coverage"; do
      PARENT_DIR=$(dirname "$dir")
      if [[ -d "$PARENT_DIR" ]] || mkdir -p "$PARENT_DIR" 2>/dev/null; then
        mkdir -p "$dir" 2>/dev/null && REPORTS_DIR="$dir" && break
      fi
    done

    # Ultimate fallback
    if [[ -z "$REPORTS_DIR" ]]; then
      REPORTS_DIR="./coverage-reports"
      mkdir -p "$REPORTS_DIR"
    fi
    echo "📁 Created reports directory: $REPORTS_DIR"
  fi
fi

# Parse command arguments
MODE="${1:-analyze}"
TARGET="${2:-}"

# Validate mode
case "$MODE" in
  analyze|learn|improve|generate|validate)
    echo "Executing /coverage $MODE $TARGET"
    ;;
  *)
    # If the first argument is not a valid mode, treat it as the target with the default analyze mode
    TARGET="$MODE"
    MODE="analyze"
    echo "Executing /coverage $MODE (analyzing target: $TARGET)"
    ;;
esac
```

## ENHANCED WORKFLOW WITH PATTERN LEARNING AND SAFETY VALIDATION

Based on the mode, I'll execute the corresponding coverage orchestration workflow with enhanced safety and pattern compliance:

**Coverage Analysis Mode: $MODE**
**Target Scope: ${TARGET:-"all"}**

### PRE-EXECUTION SAFETY PROTOCOL
**Phase 1: Pattern Learning (Automatic for generate/improve modes)**

```bash
# Always learn patterns first unless in pure analyze mode
if [[ "$MODE" == "generate" || "$MODE" == "improve" ]]; then
  echo "🔍 Learning existing test patterns for safe integration..."

  # Discover test patterns
  find tests/ -name "*.py" -type f | head -20 | while read -r testfile; do
    echo "Analyzing patterns in: $testfile"
    grep -E "(class Test|def test_|@pytest.fixture|from.*mock|import.*Mock)" "$testfile" 2>/dev/null
  done

  # Document fixture usage
  echo "📋 Cataloging available fixtures..."
  grep -r "@pytest.fixture" tests/fixtures/ 2>/dev/null

  # Check for over-engineering patterns
  echo "⚠️ Scanning for over-engineered patterns to avoid..."
  grep -r "class.*Manager\|class.*Builder\|class.*Factory.*Factory" tests/ 2>/dev/null || echo "✅ No over-engineering detected"

  # Save patterns to reports directory (detected earlier)
  mkdir -p "$REPORTS_DIR" 2>/dev/null
  echo "Saving learned patterns to $REPORTS_DIR/test-patterns-$(date +%Y%m%d).json"
fi
```

**Phase 2: Pre-flight Validation**

```bash
# Verify system state before making changes
echo "🛡️ Running pre-flight safety checks..."

# Ensure existing tests pass
if [[ "$MODE" == "generate" || "$MODE" == "improve" ]]; then
  echo "Running existing tests to establish baseline..."
  (cd tests/ && python run_tests.py fast --no-coverage) || {
    echo "❌ ABORT: Existing tests failing. Fix these first before coverage improvements."
    exit 1
  }
  echo "✅ Baseline test state verified - safe to proceed"
fi
```

Let me execute the coverage orchestration workflow for the specified mode and target scope. I'll leverage your project's existing coverage analysis infrastructure to provide intelligent coverage improvement recommendations and to coordinate specialist test-fixer agents with enhanced pattern learning and safety validation.

Analyzing coverage with mode "$MODE" and target "${TARGET:-all}" using enhanced safety protocols...