Advanced Error Handling & Recovery System

Current Problems

  1. No error detection - The system cannot identify when an agent produces poor output
  2. No recovery mechanisms - If one agent fails, the entire workflow stops
  3. No rollback capability - The workflow cannot revert to a previous working state
  4. No alternative paths - A single point of failure cascades through the system

Proposed Error Handling Architecture

1. Output Quality Detection

quality_checkers:
  analyst_output:
    checks:
      - completeness: "all required sections present"
      - coherence: "problem/solution alignment score > 0.8"
      - specificity: "avoid vague terms like 'modern', 'scalable'"
      - market_validation: "specific metrics or research cited"
    auto_retry: true
    max_retries: 2
    
  pm_output:
    checks:
      - user_story_format: "all stories follow As-a/I-want/So-that format"
      - acceptance_criteria: "all stories have testable criteria"
      - priority_ranking: "clear priority levels assigned"
      - requirements_traceability: "all analyst requirements addressed"
    auto_retry: true
    max_retries: 3
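
The check-and-retry behaviour configured above could be sketched in Python roughly as follows. This is a minimal illustration, not the project's implementation: `run_with_quality_checks`, `QualityResult`, `is_specific`, and `has_sections` are hypothetical names, and passing the failed check names back into the next attempt is an assumption about how a retry might be steered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical check signature: takes the agent's output text, returns True on pass.
Check = Callable[[str], bool]

# The two example terms named in the "specificity" check.
VAGUE_TERMS = ("modern", "scalable")


def is_specific(output: str) -> bool:
    """Pass only if none of the listed vague terms appear (substring match)."""
    lowered = output.lower()
    return not any(term in lowered for term in VAGUE_TERMS)


def has_sections(required: List[str]) -> Check:
    """Build a completeness check for a given list of section titles."""
    return lambda output: all(title in output for title in required)


@dataclass
class QualityResult:
    output: str
    failed_checks: List[str]
    attempts: int

    @property
    def passed(self) -> bool:
        return not self.failed_checks


def run_with_quality_checks(
    produce: Callable[[List[str]], str],
    checks: Dict[str, Check],
    auto_retry: bool = True,
    max_retries: int = 2,
) -> QualityResult:
    """Call produce(), re-running it while checks fail and retries remain.

    produce receives the names of the checks that failed on the previous
    attempt (assumption: the caller feeds them back into the agent prompt).
    """
    allowed_attempts = 1 + (max_retries if auto_retry else 0)
    failed: List[str] = []
    output = ""
    for attempt in range(1, allowed_attempts + 1):
        output = produce(failed)
        failed = [name for name, check in checks.items() if not check(output)]
        if not failed:
            return QualityResult(output, [], attempt)
    # Retries exhausted: return the last output together with what still fails.
    return QualityResult(output, failed, allowed_attempts)
```

With `max_retries: 2` the agent runs at most three times in this sketch (one initial attempt plus two retries). A result whose `passed` is false would then be handed to whatever degradation or rollback logic applies.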

2. Graceful Degradation Strategies

degradation_strategies:
  agent_failure:
    analyst_fails:
      fallback: "use_simplified_template"
      alternative: "pm_takes_analyst_role"
      quality_impact: "medium"
      
    pm_fails:
      fallback: "analyst_creates_basic_requirements"  
      alternative: "architect_infers_from_brief"
      quality_impact: "high"
      
    architect_fails:
      fallback: "use_standard_tech_stack"
      alternative: "developer_chooses_architecture"
      quality_impact: "medium"
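
One way to read the table above is as an ordered chain: run the primary agent, then the `fallback`, then the `alternative`. The sketch below assumes that ordering, which the configuration does not state explicitly. `run_with_degradation`, `Strategy`, `Degraded`, and the `handlers` mapping are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, NamedTuple, Optional


class Strategy(NamedTuple):
    fallback: str
    alternative: str
    quality_impact: str


# Values copied from degradation_strategies.agent_failure.
STRATEGIES: Dict[str, Strategy] = {
    "analyst": Strategy("use_simplified_template", "pm_takes_analyst_role", "medium"),
    "pm": Strategy("analyst_creates_basic_requirements", "architect_infers_from_brief", "high"),
    "architect": Strategy("use_standard_tech_stack", "developer_chooses_architecture", "medium"),
}


class Degraded(NamedTuple):
    output: str
    used: str            # "primary", or the name of the strategy that produced the output
    quality_impact: str  # "none" when the primary agent succeeded


Handler = Callable[[], str]


def run_with_degradation(agent: str, primary: Handler, handlers: Dict[str, Handler]) -> Degraded:
    """Try the primary agent, then its fallback, then its alternative.

    Assumption: any exception raised by a handler counts as a failure.
    """
    try:
        return Degraded(primary(), "primary", "none")
    except Exception:
        pass  # fall through to the degraded strategies
    strategy = STRATEGIES[agent]
    last_error: Optional[Exception] = None
    for name in (strategy.fallback, strategy.alternative):
        handler = handlers.get(name)
        if handler is None:
            continue  # strategy not implemented by the caller; skip it
        try:
            return Degraded(handler(), name, strategy.quality_impact)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"{agent}: primary, fallback and alternative all failed") from last_error
```

Returning `quality_impact` with the output lets downstream steps record that they are working from degraded input instead of treating it as a normal result.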

3. Context Recovery System

recovery_mechanisms:
  checkpoint_system:
    frequency: "after_each_agent"
    storage: "context/checkpoints/"
    retention: "5_versions"
    
  rollback_triggers:
    - "quality_score < 6.0"
    - "validation_failures > 2"
    - agent_execution_timeout
    - user_manual_request
    
  recovery_actions:
    rollback_one_step:
      action: "revert_to_previous_checkpoint"
      retry_with: "enhanced_instructions"
      
    rollback_to_branch_point:
      action: "return_to_last_quality_gate"
      retry_with: "alternative_workflow_path"
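
The checkpoint and rollback settings above might translate into something like the following sketch. It assumes checkpoints are JSON files named with a zero-padded sequence number, which the configuration does not specify. `CheckpointStore` and `should_rollback` are hypothetical names, and the thresholds in `should_rollback` are taken from `rollback_triggers`.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


class CheckpointStore:
    """Keeps the newest `retention` checkpoints as JSON files under `root`."""

    def __init__(self, root: Path, retention: int = 5) -> None:
        self.root = root
        self.retention = retention  # assumed to be >= 1
        self.root.mkdir(parents=True, exist_ok=True)

    def _files(self) -> List[Path]:
        # Zero-padded sequence prefixes make lexicographic order match save order.
        return sorted(self.root.glob("*.json"))

    def save(self, agent: str, context: Dict[str, Any]) -> Path:
        """Write a checkpoint after an agent finishes, then prune old ones."""
        files = self._files()
        next_seq = int(files[-1].name.split("_")[0]) + 1 if files else 1
        path = self.root / f"{next_seq:06d}_{agent}.json"
        path.write_text(json.dumps(context))
        for stale in self._files()[:-self.retention]:
            stale.unlink()
        return path

    def rollback_one_step(self) -> Dict[str, Any]:
        """Discard the newest checkpoint and return the one before it."""
        files = self._files()
        if len(files) < 2:
            raise LookupError("no previous checkpoint to revert to")
        files[-1].unlink()
        return json.loads(files[-2].read_text())


def should_rollback(
    quality_score: float,
    validation_failures: int,
    timed_out: bool = False,
    user_requested: bool = False,
) -> bool:
    """Any single trigger from rollback_triggers is enough to roll back."""
    return quality_score < 6.0 or validation_failures > 2 or timed_out or user_requested
```

A workflow runner would call `save` after each agent (matching `frequency: "after_each_agent"`) and call `rollback_one_step` when `should_rollback` returns true, then retry the failed agent with enhanced instructions.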

4. Alternative Workflow Paths

workflow_alternatives:
  primary_path_failure:
    condition: "architect_and_developer_both_fail"
    alternative: "minimal_viable_architecture"
    steps:
      - simplified_architecture_template
      - basic_implementation_only
      - reduced_feature_set
      
  quality_gate_failure:
    condition: "multiple_validation_failures"
    alternative: "expert_review_mode"
    steps:
      - pause_workflow
      - request_human_expert_review
      - incorporate_feedback
      - resume_with_corrections
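
Selecting between these alternative paths could be sketched as below. Two points are assumptions rather than anything the configuration states: "multiple" validation failures is read as more than two (reusing the `validation_failures > 2` rollback trigger), and the architect/developer condition is checked first. `choose_alternative` and `AlternativePath` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Set


class AlternativePath(NamedTuple):
    name: str
    steps: List[str]


# Names and steps copied from workflow_alternatives.
MINIMAL_VIABLE_ARCHITECTURE = AlternativePath(
    "minimal_viable_architecture",
    ["simplified_architecture_template", "basic_implementation_only", "reduced_feature_set"],
)
EXPERT_REVIEW_MODE = AlternativePath(
    "expert_review_mode",
    [
        "pause_workflow",
        "request_human_expert_review",
        "incorporate_feedback",
        "resume_with_corrections",
    ],
)


def choose_alternative(
    failed_agents: Set[str],
    validation_failures: int,
    failure_threshold: int = 2,  # assumption: "multiple" means more than two
) -> Optional[AlternativePath]:
    """Return the alternative path to switch to, or None to stay on the primary path."""
    # primary_path_failure: both the architect and the developer have failed.
    if {"architect", "developer"} <= failed_agents:
        return MINIMAL_VIABLE_ARCHITECTURE
    # quality_gate_failure: too many validation failures at a quality gate.
    if validation_failures > failure_threshold:
        return EXPERT_REVIEW_MODE
    return None
```

For example, `choose_alternative({"architect", "developer"}, 0)` selects the minimal viable architecture path, while a single failed agent with three validation failures selects expert review.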