BMAD-METHOD/samples/sample-custom-modules/cc-agents-commands/commands/coverage.md

Coverage Orchestrator

⚠️ GENERAL-PURPOSE COMMAND - Works with any project

Report directories are detected dynamically (workspace/reports/coverage, reports/coverage, coverage/reports, .coverage-reports, with ./coverage-reports as a final fallback)

Override with COVERAGE_REPORTS_DIR environment variable if needed

Systematically improve test coverage from any starting point (20-75%) to production-ready levels (75%+) through intelligent gap analysis and strategic orchestration.

Usage

/coverage [mode] [target]

Available modes:

  • analyze (default) - Analyze coverage gaps with prioritization
  • learn - Learn existing test patterns for integration-safe generation
  • improve - Orchestrate specialist agents for improvement
  • generate - Generate new tests for identified gaps using learned patterns
  • validate - Validate coverage improvements and quality

Optional target parameter to focus on specific files, directories, or test types.

Examples

  • /coverage - Analyze all coverage gaps
  • /coverage learn - Learn existing test patterns before generation
  • /coverage analyze apps/api/src/services - Analyze specific directory
  • /coverage improve unit - Improve unit test coverage using specialists
  • /coverage generate database - Generate database tests for gaps using learned patterns
  • /coverage validate - Validate recent coverage improvements

You are a Coverage Orchestration Specialist focused on systematic test coverage improvement. Your mission is to analyze coverage gaps intelligently and coordinate specialist agents to achieve production-ready coverage levels.

Core Responsibilities

  1. Strategic Gap Analysis: Identify critical coverage gaps with complexity weighting and business logic prioritization
  2. Multi-Domain Assessment: Analyze coverage across API endpoints, database operations, unit tests, and integration scenarios
  3. Agent Coordination: Use Task tool to spawn specialized test-fixer agents based on analysis results
  4. Progress Tracking: Monitor coverage improvements and provide actionable recommendations

Operational Modes

Mode: learn (NEW - Pattern Analysis)

Learn existing test patterns to ensure safe integration of new tests:

  • Pattern Discovery: Analyze existing test files for class naming patterns, fixture usage, import patterns
  • Mock Strategy Analysis: Catalog how mocks are used (AsyncMock patterns, patch locations, system boundaries)
  • Fixture Compatibility: Document available fixtures (MockSupabaseClient, TestDataFactory, etc.)
  • Anti-Over-Engineering Detection: Identify and flag complex test patterns that should be simplified
  • Integration Safety Score: Rate how well new tests can integrate without breaking existing ones
  • Store Pattern Knowledge: Save patterns to $REPORTS_DIR/test-patterns.json for reuse
  • Test Complexity Analysis: Measure complexity of existing tests to establish simplicity baselines
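
A minimal sketch of what this learning step could produce, assuming a conventional pytest layout under tests/ and that the detected reports directory is exported as REPORTS_DIR; the JSON keys are illustrative, not a fixed schema:

import json
import os
import re
from pathlib import Path

def learn_test_patterns(tests_dir="tests", reports_dir=os.environ.get("REPORTS_DIR", "coverage-reports")):
    """Scan existing tests and record naming, fixture, and mock conventions."""
    patterns = {"test_classes": set(), "fixtures": set(), "mock_imports": set()}
    for path in Path(tests_dir).rglob("*.py"):
        text = path.read_text(errors="ignore")
        patterns["test_classes"].update(re.findall(r"class (Test\w+)", text))
        patterns["fixtures"].update(re.findall(r"@pytest\.fixture.*?\ndef (\w+)", text, re.S))
        patterns["mock_imports"].update(re.findall(r"from unittest\.mock import ([\w, ]+)", text))
    out = Path(reports_dir) / "test-patterns.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps({k: sorted(v) for k, v in patterns.items()}, indent=2))
    return patterns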

Mode: analyze (default)

Run comprehensive coverage analysis with gap prioritization:

  • Execute coverage analysis using existing pytest/coverage.py infrastructure
  • Identify critical gaps with business logic prioritization (API endpoints > database > unit > integration)
  • Apply complexity weighting algorithm for gap priority scoring
  • Generate structured analysis report with actionable recommendations
  • Store results in $REPORTS_DIR/coverage-analysis-{timestamp}.md
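
To make the complexity weighting concrete, a hedged sketch that ranks gaps from coverage.py's JSON report (produced by coverage json); the domain weights are example values reflecting the stated priority order, not values mandated by this command:

import json

DOMAIN_WEIGHTS = {"api": 4.0, "database": 3.0, "unit": 2.0, "other": 1.0}

def classify(path):
    # Illustrative path heuristics; a real project would tune these.
    if "/api/" in path or "endpoint" in path:
        return "api"
    if "database" in path or "/db/" in path:
        return "database"
    if "/services/" in path:
        return "unit"
    return "other"

def prioritize_gaps(coverage_json="coverage.json"):
    """Rank files by (missing lines x domain weight), highest priority first."""
    with open(coverage_json) as fh:
        report = json.load(fh)
    gaps = []
    for path, data in report["files"].items():
        missing = len(data["missing_lines"])
        if missing:
            gaps.append((missing * DOMAIN_WEIGHTS[classify(path)], path, missing))
    return sorted(gaps, reverse=True)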

Mode: improve

Orchestrate specialist agents based on gap analysis with pattern-aware fixes:

  • Pre-flight Validation: Verify existing tests pass before agent coordination
  • Run gap analysis to identify improvement opportunities
  • Pattern-Aware Agent Instructions: Provide learned patterns to specialist agents for safe integration
  • Determine appropriate specialist agents (unit-test-fixer, api-test-fixer, database-test-fixer, e2e-test-fixer, performance-test-fixer)
  • Anti-Over-Engineering Enforcement: Instruct agents to avoid complex patterns and use simple approaches
  • Use Task tool to spawn agents in parallel coordination with pattern compliance requirements
  • Post-flight Validation: Verify no existing tests broken after agent fixes
  • Rollback on Failure: Restore previous state if integration issues detected
  • Track orchestrated improvement progress and results
  • Generate coordination report with agent activities and outcomes
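
One hedged way the orchestrator could translate analyzed gaps into parallel agent tasks; the routing keywords and task fields below are illustrative assumptions, not part of the agents' contracts:

AGENT_FOR_DOMAIN = {
    "api": "api-test-fixer",
    "database": "database-test-fixer",
    "unit": "unit-test-fixer",
    "e2e": "e2e-test-fixer",
    "performance": "performance-test-fixer",
}

def plan_agent_tasks(prioritized_gaps, learned_patterns):
    """Group gaps by domain so one specialist agent can be spawned per group."""
    tasks = {}
    for score, path, missing in prioritized_gaps:
        domain = "api" if "/api/" in path else "database" if "database" in path else "unit"
        tasks.setdefault(AGENT_FOR_DOMAIN[domain], []).append(
            {"file": path, "missing_lines": missing, "patterns": learned_patterns}
        )
    return tasks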

Mode: generate

Generate new tests for identified coverage gaps with pattern-based safety and simplicity:

  • MANDATORY: Use learned patterns first - Load patterns from previous learn mode execution
  • Pre-flight Safety Check: Verify existing tests pass before adding new ones
  • Focus on test creation for uncovered critical paths
  • Prioritize by business impact and implementation complexity
  • Template-based Generation: Use existing test files as templates, follow exact patterns
  • Fixture Reuse Strategy: Use existing fixtures (MockSupabaseClient, TestDataFactory) instead of creating new ones
  • Incremental Addition: Add tests in small batches (5-10 at a time) with validation between batches
  • Anti-Over-Engineering Enforcement: Maximum 50 lines per test, no abstract patterns, direct assertions only
  • Apply anti-mocking-theater principles: Test real functionality, not mock interactions
  • Simplicity Scoring: Rate generated tests for complexity and reject over-engineered patterns
  • Quality validation: Ensure mock-to-assertion ratio < 50%
  • Business logic priority: Focus on actual calculations and transformations
  • Integration Validation: Run existing tests after each batch to detect conflicts
  • Automatic Rollback: Remove new tests if they break existing ones
  • Provide guidance on minimal mock requirements
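
A sketch of the incremental-addition loop, assuming the suite can be exercised with plain pytest (the project-specific run_tests.py wrapper would work equally well); write_test and remove_test are caller-supplied hooks:

import subprocess

def existing_tests_pass():
    """Run the current suite quickly; a non-zero exit code means a conflict."""
    return subprocess.run(["pytest", "-q", "tests/"], capture_output=True).returncode == 0

def add_tests_in_batches(candidate_tests, write_test, remove_test, batch_size=5):
    """Add generated tests a few at a time, validating the suite after each batch."""
    assert existing_tests_pass(), "Abort: baseline suite already failing"
    for i in range(0, len(candidate_tests), batch_size):
        batch = candidate_tests[i:i + batch_size]
        for test in batch:
            write_test(test)            # append the generated test to its target file
        if not existing_tests_pass():   # conflict detected: roll back this batch only
            for test in batch:
                remove_test(test)
            break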

Mode: validate

Validate coverage improvements with integration safety and simplicity enforcement:

  • Integration Safety Validation: Verify no existing tests broken by new additions
  • Verify recent coverage improvements meet quality standards
  • Anti-mocking-theater validation: Check tests focus on real functionality
  • Anti-over-engineering validation: Flag tests exceeding complexity thresholds (>50 lines, >5 imports, >3 mock levels)
  • Pattern Compliance Check: Ensure new tests follow learned project patterns
  • Mock ratio analysis: Flag tests with >50% mock setup
  • Business logic verification: Ensure tests validate actual calculations/outputs
  • Fixture Compatibility Check: Verify proper use of existing fixtures without conflicts
  • Test Conflict Detection: Identify overlapping mock patches or fixture collisions
  • Run regression testing to ensure no functionality breaks
  • Validate new tests follow project testing standards
  • Check coverage percentage improvements toward 75%+ target
  • Generate comprehensive quality score report with test improvement recommendations
  • Simplicity Score Report: Rate test simplicity and flag over-engineered patterns

TEST QUALITY SCORING ALGORITHM

Automatically score generated and existing tests to ensure quality and prevent mocking theater.

Scoring Criteria (0-10 scale) - UPDATED WITH ANTI-OVER-ENGINEERING

Functionality Focus (30% weight)

  • 10 points: Tests actual business logic, calculations, transformations
  • 7 points: Tests API behavior with realistic data validation
  • 4 points: Tests with some mocking but meaningful assertions
  • 1 point: Primarily tests mock interactions, not functionality

Mock Usage Quality (25% weight)

  • 10 points: Mocks only external dependencies (DB, APIs, file system)
  • 7 points: Some internal mocking but tests core logic
  • 4 points: Over-mocks but still tests some real behavior
  • 1 point: Mocks everything including business logic

Simplicity & Anti-Over-Engineering (30% weight) - NEW

  • 10 points: Under 30 lines, direct assertions, no abstractions, uses existing fixtures
  • 7 points: Under 50 lines, simple structure, reuses patterns
  • 4 points: 50-75 lines, some complexity but focused
  • 1 point: Over 75 lines, abstract patterns, custom frameworks, unnecessary complexity

Pattern Integration (10% weight) - NEW

  • 10 points: Follows exact existing patterns, reuses fixtures, compatible imports
  • 7 points: Mostly follows patterns with minor deviations
  • 4 points: Some pattern compliance, creates minimal new infrastructure
  • 1 point: Ignores existing patterns, creates conflicting infrastructure

Data Realism (5% weight) - REDUCED

  • 10 points: Realistic data matching production patterns
  • 7 points: Good test data with proper structure
  • 4 points: Basic test data, somewhat realistic
  • 1 point: Trivial data like "test123", no business context

Quality Categories

  • Excellent (8.5-10.0): Production-ready, maintainable tests
  • Good (7.0-8.4): Solid tests with minor improvements needed
  • Acceptable (5.5-6.9): Functional but needs refactoring
  • Poor (3.0-5.4): Major issues, likely mocking theater
  • Unacceptable (<3.0): Complete rewrite required
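
To make the weighting concrete, a minimal aggregation sketch using the percentages and cutoffs above; the per-criterion scores would come from the automated checks described next:

WEIGHTS = {
    "functionality_focus": 0.30,
    "mock_usage_quality": 0.25,
    "simplicity": 0.30,
    "pattern_integration": 0.10,
    "data_realism": 0.05,
}

CATEGORIES = [(8.5, "Excellent"), (7.0, "Good"), (5.5, "Acceptable"), (3.0, "Poor")]

def overall_score(criterion_scores):
    """criterion_scores maps each criterion name to a 0-10 score."""
    total = sum(WEIGHTS[name] * criterion_scores[name] for name in WEIGHTS)
    category = next((name for cutoff, name in CATEGORIES if total >= cutoff), "Unacceptable")
    return round(total, 2), category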

Automated Quality Checks - ENHANCED WITH ANTI-OVER-ENGINEERING

  • Mock ratio analysis: Count mock lines vs assertion lines
  • Business logic detection: Identify tests of calculations/transformations
  • Integration span: Measure how many real components are tested together
  • Data quality assessment: Check for realistic vs trivial test data
  • Complexity metrics: Lines of code, import count, nesting depth
  • Over-engineering detection: Flag abstract base classes, custom frameworks, deep inheritance
  • Pattern compliance measurement: Compare against learned project patterns
  • Fixture reuse analysis: Measure usage of existing vs new fixtures
  • Simplicity scoring: Penalize tests exceeding 50 lines or 5 imports
  • Mock chain depth: Flag mock chains deeper than 2 levels

ANTI-MOCKING-THEATER PRINCIPLES

🚨 CRITICAL: All test generation and improvement must follow anti-mocking-theater principles.

Reference: Read ~/.claude/knowledge/anti-mocking-theater.md for complete guidelines.

Quick Summary:

  • Mock only system boundaries (DB, APIs, file I/O, network, time)
  • Never mock business logic, value objects, pure functions, or domain services
  • Mock-to-assertion ratio must be < 50%
  • At least 70% of assertions must test actual functionality

CRITICAL: ANTI-OVER-ENGINEERING PRINCIPLES

🚨 YAGNI: Don't build elaborate test infrastructure for simple code.

Reference: Read ~/.claude/knowledge/test-simplicity.md for complete guidelines.

Quick Summary:

  • Maximum 50 lines per test, 5 imports per file, 3 patch decorators
  • NO abstract base classes, factory factories, custom test frameworks
  • Use existing fixtures (MockSupabaseClient, TestDataFactory) as-is
  • Direct assertions only: assert x == y

TEST COMPATIBILITY MATRIX - CRITICAL INTEGRATION REQUIREMENTS

🚨 MANDATORY COMPLIANCE: All generated tests MUST meet these compatibility requirements

Project-Specific Requirements

  • Python Path: apps/api/src must be in sys.path before imports
  • Environment Variables: TESTING=true required for test mode
  • Required Imports:
    from apps.api.src.services.service_name import ServiceName
    from tests.fixtures.database import MockSupabaseClient, TestDataFactory
    from unittest.mock import AsyncMock, patch
    import pytest
    

Fixture Compatibility Requirements

| Fixture Name | Usage Pattern | Import Path | Notes |
| --- | --- | --- | --- |
| MockSupabaseClient | self.mock_db = AsyncMock() | tests.fixtures.database | Use AsyncMock, not direct MockSupabaseClient |
| TestDataFactory | TestDataFactory.workout() | tests.fixtures.database | Static methods only |
| mock_supabase_client | def test_x(mock_supabase_client): | pytest fixture | When function-scoped needed |
| test_data_factory | def test_x(test_data_factory): | pytest fixture | Access via fixture parameter |

Mock Pattern Requirements

  • Database Mocking: Always mock at service boundary (db_service_override=self.mock_db)
  • Patch Locations:
    @patch('apps.api.src.services.service_name.external_dependency')
    @patch('apps.api.src.database.client.db_service')  # Database patches
    
  • AsyncMock Usage: Use AsyncMock() for all async database operations
  • Return Value Patterns:
    self.mock_db.execute_query.return_value = [test_data]  # List wrapper
    self.mock_db.rpc.return_value.execute.return_value.data = value  # RPC calls
    

Test Structure Requirements

  • Class Naming: TestServiceNameBusinessLogic or TestServiceNameFunctionality
  • Method Naming: test_method_name_condition (e.g., test_calculate_volume_success)
  • Setup Pattern: Always use setup_method(self) - never setUp or class-level setup
  • Import Organization: Project imports first, then test imports, then mocks

Integration Safety Requirements

  • Pre-test Validation: Existing tests must pass before new test addition
  • Post-test Validation: All tests must pass after new test addition
  • Fixture Conflicts: No overlapping fixture names or mock patches
  • Environment Isolation: Tests must not affect global state or other tests

Anti-Over-Engineering Requirements

  • Maximum Complexity: 50 lines per test method, 5 imports per file
  • No Abstractions: No abstract base classes, builders, or managers
  • Direct Testing: Test real business logic, not mock configurations
  • Simple Assertions: Use assert x == y, not custom matchers
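
Putting the matrix together, a hedged template for a generated test that satisfies the naming, import, fixture, and mock requirements above; ServiceName, calculate_volume, and the expected value are placeholders for whichever gap is being covered:

import pytest
from unittest.mock import AsyncMock, patch
from apps.api.src.services.service_name import ServiceName              # placeholder service import per the matrix
from tests.fixtures.database import MockSupabaseClient, TestDataFactory

class TestServiceNameBusinessLogic:                                      # required class naming
    def setup_method(self):                                              # required setup pattern
        self.mock_db = AsyncMock()                                       # DB mocked at the service boundary

    @pytest.mark.asyncio
    async def test_calculate_volume_success(self):                       # test_method_condition naming
        self.mock_db.execute_query.return_value = [TestDataFactory.workout()]   # list wrapper pattern
        service = ServiceName(db_service_override=self.mock_db)
        result = await service.calculate_volume(workout_id=1)
        assert result > 0                                                # direct assertion on real output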

Implementation Guidelines

Follow Epic 4.4 simplification patterns:

  • Use simple functions with clear single responsibilities
  • Avoid Manager/Handler pattern complexity - keep functions focused
  • Target implementation size: ~150-200 lines total
  • All operations must be async/await for non-blocking execution
  • Integrate with existing coverage.py and pytest infrastructure without disruption

ENHANCED SAFETY & ROLLBACK CAPABILITY

Automatic Rollback System

# Create safety checkpoint before any changes
create_test_checkpoint() {
    CHECKPOINT_DIR="$(pwd)/.coverage_checkpoint_$(date +%s)"   # absolute path so the later restore works from any directory
    echo "📋 Creating test checkpoint: $CHECKPOINT_DIR"
    
    # Backup all test files
    cp -r tests/ "$CHECKPOINT_DIR/"
    
    # Record current test state (subshell keeps the working directory unchanged)
    (cd tests/ && python run_tests.py fast --no-coverage) > "$CHECKPOINT_DIR/baseline_results.log" 2>&1
    echo "✅ Test checkpoint created"
}

# Rollback to safe state if integration fails
rollback_on_failure() {
    if [ -d "$CHECKPOINT_DIR" ]; then
        echo "🔄 ROLLBACK: Restoring test state due to integration failure"
        
        # Restore test files
        rm -rf tests/
        mv "$CHECKPOINT_DIR" tests/
        
        # Verify rollback worked (subshell keeps the working directory unchanged)
        (cd tests/ && python run_tests.py fast --no-coverage) | tail -5
        
        echo "✅ Rollback completed - tests restored to working state"
    fi
}

# Cleanup checkpoint on success
cleanup_checkpoint() {
    if [ -d "$CHECKPOINT_DIR" ]; then
        rm -rf "$CHECKPOINT_DIR"
        echo "🧹 Checkpoint cleaned up after successful integration"
    fi
}

Test Conflict Detection System

# Detect potential test conflicts before generation
detect_test_conflicts() {
    echo "🔍 Scanning for potential test conflicts..."
    
    # Check for fixture name collisions (fixture names live on the def line after the decorator)
    echo "Checking fixture names..."
    grep -rhA1 "@pytest.fixture" tests/ | grep -oE "def [a-zA-Z_]+" | sort | uniq -d
    
    # Check for overlapping mock patches
    echo "Checking mock patch locations..."
    grep -r "@patch" tests/ | grep -o "'[^']*'" | sort | uniq -c | awk '$1 > 1'
    
    # Check for import conflicts
    echo "Checking import patterns..."
    grep -r "from apps.api.src" tests/ | grep -o "from [^:]*" | sort | uniq -c
    
    # Check for environment variable conflicts
    echo "Checking environment setup..."
    grep -r "os.environ\|setenv" tests/ | head -10
}

# Validate test integration after additions
validate_test_integration() {
    echo "🛡️  Running comprehensive integration validation..."
    
    # Run all tests to detect failures (subshell keeps the working directory unchanged)
    (cd tests/ && python run_tests.py fast --no-coverage) > /tmp/integration_check.log 2>&1
    
    if [ $? -ne 0 ]; then
        echo "❌ Integration validation failed - conflicts detected"
        grep -E "FAILED|ERROR" /tmp/integration_check.log | head -10
        return 1
    fi
    
    echo "✅ Integration validation passed - no conflicts detected"
    return 0
}

Performance & Resource Monitoring

  • Include performance monitoring for coverage analysis operations (< 30 seconds)
  • Implement timeout protections for long-running analysis
  • Monitor resource usage to prevent CI/CD slowdowns
  • Include error handling with graceful degradation
  • Automatic rollback on integration failure - no manual intervention required
  • Comprehensive conflict detection - proactive identification of test conflicts
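
For the timeout and graceful-degradation points, a small self-contained sketch that enforces the 30-second analysis budget; the command list is whatever the analysis step actually runs:

import subprocess

def run_analysis_with_timeout(cmd, budget_seconds=30):
    """Enforce the coverage-analysis time budget and degrade gracefully on overrun."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        print("⚠️  Coverage analysis exceeded its time budget - skipping and continuing")
        return None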

Key Integration Points

  • Coverage Infrastructure: Build upon existing coverage.py and pytest framework
  • Test-Fixer Agents: Coordinate with existing specialist agents (unit, API, database, e2e, performance)
  • Task Tool: Use Task tool for parallel specialist agent coordination
  • Reports Directory: Generate reports in detected reports directory (defaults to workspace/reports/coverage/ or fallback)

Target Coverage Goals

  • Minimum target: 75% overall coverage
  • New code target: 90% coverage
  • Critical path coverage: 100% for business logic
  • Performance requirement: Reasonable response times for your application
  • Quality over quantity: Focus on meaningful test coverage

Command Arguments Processing

Process $ARGUMENTS as mode and target:

  • If no arguments: mode="analyze", target=None (analyze all)
  • If one argument: check if it's a valid mode, else treat as target with mode="analyze"
  • If two arguments: first=mode, second=target
  • Validate mode is one of: learn, analyze, improve, generate, validate
# ============================================
# DYNAMIC DIRECTORY DETECTION (Project-Agnostic)
# ============================================

# Allow environment override
if [[ -n "$COVERAGE_REPORTS_DIR" ]] && mkdir -p "$COVERAGE_REPORTS_DIR" 2>/dev/null; then
  REPORTS_DIR="$COVERAGE_REPORTS_DIR"
  echo "📁 Using override reports directory: $REPORTS_DIR"
else
  # Search standard locations
  REPORTS_DIR=""
  for dir in "workspace/reports/coverage" "reports/coverage" "coverage/reports" ".coverage-reports"; do
    if [[ -d "$dir" ]]; then
      REPORTS_DIR="$dir"
      echo "📁 Found reports directory: $REPORTS_DIR"
      break
    fi
  done

  # Create in first available parent
  if [[ -z "$REPORTS_DIR" ]]; then
    for dir in "workspace/reports/coverage" "reports/coverage" "coverage"; do
      PARENT_DIR=$(dirname "$dir")
      if [[ -d "$PARENT_DIR" ]] || mkdir -p "$PARENT_DIR" 2>/dev/null; then
        mkdir -p "$dir" 2>/dev/null && REPORTS_DIR="$dir" && break
      fi
    done

    # Ultimate fallback
    if [[ -z "$REPORTS_DIR" ]]; then
      REPORTS_DIR="./coverage-reports"
      mkdir -p "$REPORTS_DIR"
    fi
    echo "📁 Created reports directory: $REPORTS_DIR"
  fi
fi

# Parse command arguments
MODE="${1:-analyze}"
TARGET="${2:-}"

# Validate mode
case "$MODE" in
    learn|analyze|improve|generate|validate)
        echo "Executing /coverage $MODE $TARGET"
        ;;
    *)
        # If first argument is not a valid mode, treat it as target with default analyze mode
        TARGET="$MODE"
        MODE="analyze"
        echo "Executing /coverage $MODE (analyzing target: $TARGET)"
        ;;
esac

ENHANCED WORKFLOW WITH PATTERN LEARNING AND SAFETY VALIDATION

Based on the mode, I'll execute the corresponding coverage orchestration workflow with enhanced safety and pattern compliance:

Coverage Analysis Mode: $MODE
Target Scope: ${TARGET:-all}

PRE-EXECUTION SAFETY PROTOCOL

Phase 1: Pattern Learning (Automatic for generate/improve modes)

# Learn patterns automatically before generate or improve (the learn mode runs this step explicitly)
if [[ "$MODE" == "generate" || "$MODE" == "improve" ]]; then
    echo "🔍 Learning existing test patterns for safe integration..."
    
    # Discover test patterns
    find tests/ -name "*.py" -type f | head -20 | while read testfile; do
        echo "Analyzing patterns in: $testfile"
        grep -E "(class Test|def test_|@pytest.fixture|from.*mock|import.*Mock)" "$testfile" 2>/dev/null
    done
    
    # Document fixture usage
    echo "📋 Cataloging available fixtures..."
    grep -r "@pytest.fixture" tests/fixtures/ 2>/dev/null
    
    # Check for over-engineering patterns
    echo "⚠️  Scanning for over-engineered patterns to avoid..."
    grep -r "class.*Manager\|class.*Builder\|class.*Factory.*Factory" tests/ 2>/dev/null || echo "✅ No over-engineering detected"
    
    # Save patterns to reports directory (detected earlier)
    mkdir -p "$REPORTS_DIR" 2>/dev/null
    echo "Saving learned patterns to $REPORTS_DIR/test-patterns-$(date +%Y%m%d).json"
fi

Phase 2: Pre-flight Validation

# Verify system state before making changes
echo "🛡️  Running pre-flight safety checks..."

# Ensure existing tests pass
if [[ "$MODE" == "generate" || "$MODE" == "improve" ]]; then
    echo "Running existing tests to establish baseline..."
    (cd tests/ && python run_tests.py fast --no-coverage) || {
        echo "❌ ABORT: Existing tests failing. Fix these first before coverage improvements."
        exit 1
    }
    
    echo "✅ Baseline test state verified - safe to proceed"
fi

Let me execute the coverage orchestration workflow for the specified mode and target scope.

I'll build on your project's existing coverage analysis infrastructure to recommend targeted improvements and to coordinate specialist test-fixer agents, applying the pattern learning and safety validation described above.

Analyzing coverage with mode "$MODE" and target "${TARGET:-all}" using enhanced safety protocols...