Error Prevention System

Mistake Tracking and Prevention for Claude Code

The Error Prevention System lets Claude Code record past mistakes in a structured catalog, check new operations against that history, and warn or intervene before a similar error recurs.

Error Catalog and Learning Framework

Comprehensive Error Documentation

error_entry:
  identification:
    id: "{uuid}"
    timestamp: "2024-01-15T14:30:00Z"
    severity: "critical|high|medium|low"
    category: "security|performance|logic|integration|deployment"
    error_signature: "unique_fingerprint_for_similar_errors"
    
  error_details:
    description: "Database connection pool exhaustion causing 503 errors"
    symptoms: 
      - "HTTP 503 Service Unavailable responses"
      - "Database connection timeout errors in logs"
      - "Application hanging on database queries"
      - "Memory usage steadily increasing"
    impact:
      - user_experience: "Complete service unavailability"
      - business_impact: "Revenue loss during downtime"
      - technical_debt: "Required emergency hotfix"
      - team_impact: "Weekend emergency response required"
    affected_components: 
      - "Database connection pool"
      - "API endpoints"
      - "User authentication service"
      - "Payment processing"
      
  context_information:
    project_phase: "production"
    technology_stack: ["nodejs", "postgresql", "docker", "kubernetes"]
    project_characteristics:
      size: "large"
      complexity: "high"
      team_size: "8"
      load_profile: "high_traffic"
    environmental_factors:
      - "Black Friday traffic spike"
      - "Recent deployment of new features"
      - "Database maintenance window completed day before"
    claude_code_context:
      files_involved: ["src/database/pool.js", "config/database.js"]
      tools_used_before_error: ["Edit", "Bash", "Write"]
      recent_changes: ["Increased connection timeout", "Added retry logic"]
      
  root_cause_analysis:
    immediate_cause: "Connection pool size insufficient for traffic spike"
    contributing_factors:
      - "Default pool size never adjusted for production load"
      - "No connection pool monitoring in place"
      - "Load testing didn't simulate realistic user behavior"
      - "Connection leak in error handling paths"
    root_cause: "Inadequate capacity planning and monitoring for database connections"
    analysis_method: "5 whys analysis + performance profiling"
    investigation_tools: ["APM traces", "Database logs", "Container metrics"]
    
  prevention_strategy:
    detection_rules:
      - rule: "Monitor connection pool utilization"
        trigger: "when pool_utilization > 80%"
        action: "Alert DevOps team immediately"
        automation_possible: true
        
      - rule: "Watch for connection timeout patterns"
        trigger: "when connection_timeouts > 5 in 1 minute"
        action: "Scale pool size automatically"
        automation_possible: true
        
      - rule: "Track connection pool growth rate"
        trigger: "when pool_size increases > 20% in 5 minutes"
        action: "Check for connection leaks"
        automation_possible: false
        
    prevention_steps:
      - step: "Implement connection pool monitoring"
        when: "during development phase"
        responsibility: "platform-engineer"
        tools_involved: ["monitoring setup", "alerting configuration"]
        effort_estimate: "4 hours"
        
      - step: "Add connection pool size auto-scaling"
        when: "before production deployment"
        responsibility: "dev"
        tools_involved: ["database configuration", "scaling logic"]
        effort_estimate: "8 hours"
        
      - step: "Implement proper connection cleanup"
        when: "during code review"
        responsibility: "dev"
        tools_involved: ["code review", "static analysis"]
        effort_estimate: "2 hours"
        
    validation_checks:
      - check: "Load test with connection pool monitoring"
        automation: "ci_cd_pipeline"
        frequency: "before_each_production_deployment"
        
      - check: "Review database connection usage patterns"
        automation: "static_analysis_tool"
        frequency: "with_each_code_change"
        
      - check: "Validate connection cleanup in error paths"
        automation: "integration_tests"
        frequency: "continuous"
        
  recovery_procedures:
    immediate_response:
      - "Scale database connection pool size"
      - "Restart application instances to clear stale connections"
      - "Enable database connection throttling"
      - "Redirect traffic to secondary regions if available"
      
    short_term_fixes:
      - "Implement connection pool monitoring dashboard"
      - "Add automated scaling for connection pool"
      - "Fix connection leaks in error handling"
      
    long_term_improvements:
      - "Implement comprehensive database capacity planning"
      - "Add chaos engineering tests for database failures"
      - "Create runbooks for database scaling scenarios"
      
  lessons_learned:
    - "Connection pool sizing must account for traffic spikes"
    - "Monitoring is essential for database resource management"
    - "Load testing scenarios should include realistic user patterns"
    - "Error handling paths need careful connection management"
    - "Automated scaling can prevent manual intervention delays"
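
The `error_signature` field above is described only as a fingerprint for similar errors; the source does not say how it is computed. One possible approach, sketched here under that assumption, is to normalise the description, mask volatile numbers, and hash the result together with the category and affected components. The function name, the normalisation rules, and the 16-character truncation are all hypothetical.

```python
import hashlib
import re


def error_signature(category: str, description: str, components: list[str]) -> str:
    """Return a stable fingerprint so similar errors share one signature.

    Hypothetical scheme: lowercase the text, mask numbers (status codes,
    counts, durations), collapse whitespace, sort the components, then hash.
    """
    normalised = re.sub(r"\d+", "<n>", description.lower())
    normalised = re.sub(r"\s+", " ", normalised).strip()
    parts = [category.lower(), normalised, *sorted(c.lower() for c in components)]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    # Shortened for readability; keep the full digest if collisions matter.
    return digest[:16]


sig_a = error_signature(
    "performance",
    "Database connection pool exhaustion causing 503 errors",
    ["API endpoints", "Database connection pool"],
)
sig_b = error_signature(
    "performance",
    "database connection pool exhaustion   causing 504 errors",
    ["Database connection pool", "API endpoints"],
)
print(sig_a == sig_b)  # True: case, spacing, numbers, and ordering are ignored
```

Masking numbers means a 503 and a 504 variant of the same incident collapse to one entry. Whether that is desirable depends on how coarse the catalog should be.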

Proactive Error Detection for Claude Code

Claude Code Tool Integration for Error Prevention

from datetime import datetime


async def prevent_errors_in_claude_operations(operation_type, operation_context):
    """
    Prevent errors before Claude Code tool execution
    """
    # Get operation-specific error patterns
    relevant_errors = await get_relevant_error_patterns(
        operation_type,
        operation_context
    )
    
    error_prevention_result = {
        'operation_safe': True,
        'warnings': [],
        'preventive_actions': [],
        'risk_factors': []
    }
    
    # Analyze each relevant error pattern
    for error_pattern in relevant_errors:
        risk_assessment = assess_error_risk(
            error_pattern,
            operation_context
        )
        
        if risk_assessment.risk_level > 0.3:  # 30% risk threshold
            error_prevention_result['operation_safe'] = False
            error_prevention_result['warnings'].append({
                'error_type': error_pattern['category'],
                'description': error_pattern['description'],
                'risk_level': risk_assessment.risk_level,
                'similar_past_cases': risk_assessment.similar_cases
            })
            
            # Generate preventive actions
            preventive_actions = generate_preventive_actions(
                error_pattern,
                operation_context
            )
            error_prevention_result['preventive_actions'].extend(preventive_actions)
    
    return error_prevention_result

async def error_aware_file_edit(file_path, edit_content, current_context):
    """
    Edit files with error prevention based on historical patterns
    """
    # Pre-edit error analysis
    edit_risks = await analyze_edit_risks(file_path, edit_content, current_context)
    
    if edit_risks.has_high_risk_patterns:
        # Present warnings and suggest safer alternatives
        risk_warnings = []
        
        for risk in edit_risks.high_risk_patterns:
            warning = {
                'risk_type': risk.pattern_type,
                'description': risk.description,
                'historical_failures': risk.past_failures,
                'suggested_alternatives': risk.safer_alternatives
            }
            risk_warnings.append(warning)
        
        # Get user confirmation or apply safer alternatives
        prevention_response = await handle_edit_risk_warnings(
            risk_warnings,
            file_path,
            edit_content
        )
        
        if prevention_response.action == 'cancel':
            return {'status': 'cancelled', 'reason': 'high_risk_prevented'}
        elif prevention_response.action == 'modify':
            edit_content = prevention_response.safer_content
    
    # Execute edit with monitoring
    edit_result = await claude_code_edit(file_path, edit_content)
    
    # Post-edit validation
    post_edit_validation = await validate_edit_success(
        file_path,
        edit_content,
        edit_result,
        edit_risks
    )
    
    # Learn from edit outcome
    await learn_from_edit_outcome(
        file_path,
        edit_content,
        edit_result,
        post_edit_validation,
        current_context
    )
    
    return {
        'edit_result': edit_result,
        'risk_prevention': edit_risks,
        'validation': post_edit_validation
    }

async def error_aware_bash_execution(command, current_context):
    """
    Execute bash commands with error prevention
    """
    # Analyze command for known dangerous patterns
    command_risks = await analyze_command_risks(command, current_context)
    
    if command_risks.has_dangerous_patterns:
        # Check against error history
        similar_failures = await find_similar_command_failures(
            command,
            current_context
        )
        
        if similar_failures:
            # Provide warnings and safer alternatives
            safety_recommendations = generate_command_safety_recommendations(
                command,
                similar_failures,
                current_context
            )
            
            safer_command = await suggest_safer_command_alternative(
                command,
                safety_recommendations
            )
            
            if safer_command:
                command = safer_command
    
    # Execute with error monitoring
    execution_start = datetime.utcnow()
    
    try:
        result = await claude_code_bash(command)
        execution_duration = (datetime.utcnow() - execution_start).total_seconds()
        
        # Learn from successful execution
        await record_successful_command_execution(
            command,
            result,
            execution_duration,
            current_context
        )
        
        return result
        
    except Exception as e:
        execution_duration = (datetime.utcnow() - execution_start).total_seconds()
        
        # Learn from failed execution
        await record_failed_command_execution(
            command,
            str(e),
            execution_duration,
            current_context
        )
        
        # Try to provide recovery suggestions
        recovery_suggestions = await generate_recovery_suggestions(
            command,
            str(e),
            current_context
        )
        
        # Chain the original exception so its type and traceback are preserved
        raise RuntimeError(f"Command failed: {e}\nRecovery suggestions: {recovery_suggestions}") from e
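
The functions above rely on an `analyze_command_risks` helper that the source never defines. As a rough illustration of what the pattern check might look like, the sketch below matches a command string against a small table of regular expressions. The pattern table, the `CommandRisks` class, and the synchronous `scan_command_for_risks` function are hypothetical stand-ins, not part of any Claude Code API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern table; a real system would load these from the error catalog.
DANGEROUS_PATTERNS = {
    "recursive_force_delete": re.compile(
        r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-[a-zA-Z]*f[a-zA-Z]*r)\b"
    ),
    "world_writable_chmod": re.compile(r"\bchmod\s+(-R\s+)?777\b"),
    "pipe_to_shell": re.compile(r"\b(curl|wget)\b.*\|\s*(ba)?sh\b"),
}


@dataclass
class CommandRisks:
    command: str
    matched_patterns: list[str] = field(default_factory=list)

    @property
    def has_dangerous_patterns(self) -> bool:
        return bool(self.matched_patterns)


def scan_command_for_risks(command: str) -> CommandRisks:
    """Flag a shell command that matches any known dangerous pattern."""
    matched = [name for name, rx in DANGEROUS_PATTERNS.items() if rx.search(command)]
    return CommandRisks(command=command, matched_patterns=matched)


risky = scan_command_for_risks("rm -rf node_modules")
safe = scan_command_for_risks("ls -la src/")
print(risky.has_dangerous_patterns, risky.matched_patterns)
print(safe.has_dangerous_patterns)
```

Regex matching of this kind is only a first filter. It cannot see shell expansion, aliases, or variables, so a match is better treated as a prompt for review than as proof of danger.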

Pattern-Based Error Prevention

Automatic Error Pattern Detection

async def detect_error_patterns_in_codebase(project_path):
    """
    Detect potential error patterns in codebase using Claude Code tools
    """
    # Use Glob to find all relevant files under the project root
    code_files = await claude_code_glob(f"{project_path}/**/*.{{js,ts,py,java,go,rb}}")
    
    detected_patterns = {
        'high_risk': [],
        'medium_risk': [],
        'low_risk': []
    }
    
    # Load known error patterns
    error_patterns = await load_error_pattern_library()
    
    # Analyze each file for error patterns
    for file_path in code_files:
        file_content = await claude_code_read(file_path)
        
        for pattern in error_patterns:
            # Use Grep to find pattern matches
            pattern_matches = await claude_code_grep(pattern.search_regex, file_path)
            
            if pattern_matches.matches:
                for match in pattern_matches.matches:
                    risk_assessment = assess_pattern_risk(
                        pattern,
                        match,
                        file_content,
                        file_path
                    )
                    
                    detected_pattern = {
                        'pattern_name': pattern.name,
                        'file_path': file_path,
                        'line_number': match.line_number,
                        'match_text': match.text,
                        'risk_level': risk_assessment.risk_level,
                        'potential_issues': risk_assessment.potential_issues,
                        'recommendations': risk_assessment.recommendations
                    }
                    
                    if risk_assessment.risk_level >= 0.7:
                        detected_patterns['high_risk'].append(detected_pattern)
                    elif risk_assessment.risk_level >= 0.4:
                        detected_patterns['medium_risk'].append(detected_pattern)
                    else:
                        detected_patterns['low_risk'].append(detected_pattern)
    
    # Generate prevention recommendations
    prevention_plan = await generate_pattern_prevention_plan(detected_patterns)
    
    return {
        'detected_patterns': detected_patterns,
        'prevention_plan': prevention_plan,
        'risk_summary': {
            'high_risk_count': len(detected_patterns['high_risk']),
            'medium_risk_count': len(detected_patterns['medium_risk']),
            'low_risk_count': len(detected_patterns['low_risk'])
        }
    }

async def implement_error_prevention_fixes(prevention_plan, project_context):
    """
    Implement error prevention fixes using Claude Code tools
    """
    implementation_results = []
    
    for fix in prevention_plan.recommended_fixes:
        try:
            if fix.fix_type == 'code_modification':
                # Use Edit tool to apply code fixes
                fix_result = await apply_code_fix(fix, project_context)
                
            elif fix.fix_type == 'configuration_change':
                # Use Write tool to update configuration
                fix_result = await apply_configuration_fix(fix, project_context)
                
            elif fix.fix_type == 'dependency_update':
                # Use Bash tool to update dependencies
                fix_result = await apply_dependency_fix(fix, project_context)
                
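            elif fix.fix_type == 'test_addition':
                # Use Write tool to add preventive tests
                fix_result = await add_preventive_tests(fix, project_context)

            else:
                # Fail loudly on unknown fix types instead of reusing a stale fix_result
                raise ValueError(f"Unsupported fix type: {fix.fix_type}")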
            
            implementation_results.append({
                'fix_id': fix.id,
                'status': 'success',
                'result': fix_result
            })
            
        except Exception as e:
            implementation_results.append({
                'fix_id': fix.id,
                'status': 'failed',
                'error': str(e)
            })
    
    # Validate fixes were applied correctly
    validation_results = await validate_prevention_fixes(
        implementation_results,
        project_context
    )
    
    return {
        'implementation_results': implementation_results,
        'validation_results': validation_results,
        'overall_success': all(r['status'] == 'success' for r in implementation_results)
    }
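
The scan above depends on Claude Code tool wrappers and a pattern library that are not shown. To make the bucketing logic concrete, here is a minimal self-contained sketch that scans an in-memory source string with the standard `re` module. The `ErrorPattern` class, its `base_risk` field, and the two example patterns are assumptions for illustration. Only the 0.7 and 0.4 thresholds come from the code above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorPattern:
    name: str
    search_regex: str
    base_risk: float  # hypothetical score in [0, 1]; the source does not define risk scoring


def risk_bucket(risk_level: float) -> str:
    """Map a risk score to a bucket (>= 0.7 high, >= 0.4 medium, else low)."""
    if risk_level >= 0.7:
        return "high_risk"
    if risk_level >= 0.4:
        return "medium_risk"
    return "low_risk"


def scan_source(file_path: str, source: str, patterns: list[ErrorPattern]) -> dict[str, list[dict]]:
    """Return line-level matches for each pattern, grouped by risk bucket."""
    detected: dict[str, list[dict]] = {"high_risk": [], "medium_risk": [], "low_risk": []}
    for pattern in patterns:
        compiled = re.compile(pattern.search_regex)
        for line_number, line in enumerate(source.splitlines(), start=1):
            if compiled.search(line):
                detected[risk_bucket(pattern.base_risk)].append({
                    "pattern_name": pattern.name,
                    "file_path": file_path,
                    "line_number": line_number,
                    "match_text": line.strip(),
                })
    return detected


patterns = [
    ErrorPattern("bare_except", r"^\s*except\s*:", 0.5),
    ErrorPattern("eval_call", r"\beval\(", 0.8),
]
sample = "try:\n    value = eval(user_input)\nexcept:\n    pass\n"
report = scan_source("example.py", sample, patterns)
print({bucket: len(items) for bucket, items in report.items()})
# {'high_risk': 1, 'medium_risk': 1, 'low_risk': 0}
```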

Real-time Error Monitoring and Learning

Continuous Learning from Claude Code Operations

import asyncio
from datetime import datetime


async def monitor_claude_code_operations():
    """
    Continuously monitor Claude Code operations for error patterns and learning opportunities
    """
    operation_monitor = {
        'tool_usage_monitor': ToolUsageMonitor(),
        'error_detection_monitor': ErrorDetectionMonitor(),
        'performance_monitor': PerformanceMonitor(),
        'success_pattern_monitor': SuccessPatternMonitor()
    }
    
    async def monitoring_loop():
        while True:
            # Collect operation data
            operation_data = await collect_operation_data(operation_monitor)
            
            # Analyze for error patterns
            error_analysis = await analyze_for_error_patterns(operation_data)
            
            if error_analysis.new_patterns_detected:
                # Learn new error patterns
                await learn_new_error_patterns(error_analysis.new_patterns)
                
                # Update prevention rules
                await update_prevention_rules(error_analysis.new_patterns)
            
            # Analyze for success patterns
            success_analysis = await analyze_for_success_patterns(operation_data)
            
            if success_analysis.new_patterns_detected:
                # Learn new success patterns
                await learn_new_success_patterns(success_analysis.new_patterns)
                
                # Update recommendation engine
                await update_recommendation_engine(success_analysis.new_patterns)
            
            # Update error prevention database
            await update_error_prevention_database(
                error_analysis,
                success_analysis,
                operation_data
            )
            
            await asyncio.sleep(5)  # Monitor every 5 seconds
    
    # Start monitoring
    await monitoring_loop()

async def learn_from_error_occurrence(error_details, context):
    """
    Learn from actual error occurrences to improve prevention
    """
    # Create error entry
    error_entry = {
        'id': generate_uuid(),
        'timestamp': datetime.utcnow().isoformat(),
        'error_details': error_details,
        'context': context,
        'severity': classify_error_severity(error_details),
        'category': classify_error_category(error_details)
    }
    
    # Perform root cause analysis
    root_cause_analysis = await perform_root_cause_analysis(
        error_details,
        context
    )
    error_entry['root_cause_analysis'] = root_cause_analysis
    
    # Generate prevention strategies
    prevention_strategies = await generate_prevention_strategies(
        error_entry,
        root_cause_analysis
    )
    error_entry['prevention_strategy'] = prevention_strategies
    
    # Store error entry
    await store_error_entry(error_entry)
    
    # Update prevention rules
    await update_prevention_rules_from_error(error_entry)
    
    # Notify relevant personas about new error pattern
    await notify_personas_of_new_error_pattern(error_entry)
    
    return {
        'error_learned': True,
        'prevention_strategies_generated': len(prevention_strategies['prevention_steps']),
        'detection_rules_created': len(prevention_strategies['detection_rules'])
    }
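
The monitoring loop above runs forever and would stop at the first unhandled exception in any analysis step. One way this could be hardened, shown here as a sketch rather than the author's design, is to catch per-cycle failures and to use an `asyncio.Event` so the loop can be shut down cleanly. The names `run_monitoring_loop`, `run_cycle`, and `stop_event` are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("error_prevention.monitor")


async def run_monitoring_loop(
    run_cycle: Callable[[], Awaitable[None]],
    stop_event: asyncio.Event,
    interval_seconds: float = 5.0,
) -> int:
    """Call run_cycle until stop_event is set; return the number of cycles attempted."""
    cycles = 0
    while not stop_event.is_set():
        cycles += 1
        try:
            await run_cycle()
        except Exception:
            # Log and continue so a single failed cycle does not end monitoring.
            logger.exception("monitoring cycle %d failed", cycles)
        try:
            # Sleep for the interval, but wake immediately if a stop is requested.
            await asyncio.wait_for(stop_event.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass
    return cycles


async def demo() -> tuple[int, list[int]]:
    stop_event = asyncio.Event()
    seen: list[int] = []

    async def cycle() -> None:
        seen.append(len(seen) + 1)
        if len(seen) == 2:
            raise RuntimeError("simulated analysis failure")
        if len(seen) == 3:
            stop_event.set()

    cycles = await run_monitoring_loop(cycle, stop_event, interval_seconds=0.01)
    return cycles, seen


cycles, seen = asyncio.run(demo())
print(cycles, seen)  # 3 [1, 2, 3]
```

In the demo the second cycle raises, yet the third still runs. The loop then exits as soon as the stop event is set, without waiting out the interval.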

Error Prevention Dashboard and Reporting

Comprehensive Error Prevention Analytics

error_prevention_metrics:
  prevention_effectiveness:
    errors_prevented: "Count of errors caught before execution"
    false_positives: "Warnings that didn't lead to actual errors"
    false_negatives: "Errors that weren't caught by prevention"
    prevention_accuracy: "Percentage of accurate error predictions"
    
  learning_progress:
    new_patterns_learned: "Number of new error patterns identified"
    pattern_accuracy_improvement: "How pattern recognition has improved"
    prevention_rule_effectiveness: "Success rate of prevention rules"
    
  system_reliability:
    mean_time_between_errors: "MTBE for different error categories"
    error_severity_distribution: "Breakdown of error types caught"
    recovery_time_improvement: "How quickly errors are resolved"
    
  development_impact:
    development_velocity_impact: "How prevention affects speed"
    code_quality_improvement: "Measurable quality gains"
    developer_confidence: "Survey results on prevention helpfulness"
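
The `prevention_effectiveness` counters above map naturally onto precision and recall, although the source does not give a formula for `prevention_accuracy`. The sketch below assumes that `errors_prevented` counts true positives. The class and function names are hypothetical, and the example counts are made up purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreventionCounts:
    errors_prevented: int  # warnings that matched a real problem (true positives)
    false_positives: int   # warnings that did not lead to an actual error
    false_negatives: int   # errors that prevention did not catch


def prevention_precision(counts: PreventionCounts) -> float:
    """Share of raised warnings that were justified; 0.0 if nothing was flagged."""
    flagged = counts.errors_prevented + counts.false_positives
    return counts.errors_prevented / flagged if flagged else 0.0


def prevention_recall(counts: PreventionCounts) -> float:
    """Share of real errors that were caught; 0.0 if no errors occurred."""
    actual = counts.errors_prevented + counts.false_negatives
    return counts.errors_prevented / actual if actual else 0.0


# Illustrative numbers only, not measurements from any real system.
counts = PreventionCounts(errors_prevented=42, false_positives=8, false_negatives=6)
print(f"precision={prevention_precision(counts):.1%} recall={prevention_recall(counts):.1%}")
# precision=84.0% recall=87.5%
```

Reporting both figures separately makes the trade-off visible: a lower risk threshold tends to raise recall at the cost of more false positives.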

Claude Code Integration Commands

# Error prevention and analysis
bmad prevent --analyze-risks --operation "database-migration"
bmad prevent --scan-patterns --project-path "src/"
bmad prevent --check-command "rm -rf node_modules" --suggest-safer

# Error learning and pattern management
bmad errors learn --from-incident "incident-report.md"
bmad errors patterns --list --category "security"
bmad errors rules --update --based-on-recent-failures

# Prevention implementation
bmad prevent implement --fixes-for "high-risk-patterns"
bmad prevent validate --applied-fixes --test-effectiveness
bmad prevent monitor --real-time --alert-on-risks

# Error prevention reporting
bmad prevent report --effectiveness --time-period "last-month"
bmad prevent dashboard --show-trends --error-categories
bmad prevent export --prevention-rules --format "yaml"

Together, the error catalog, pre-execution risk checks, pattern scanning, and continuous monitoring let Claude Code turn each recorded mistake into detection rules and prevention steps that guard later operations.
