
Root Cause Analysis Task

Purpose

To conduct comprehensive root cause analysis of complex technical issues, using systematic methodologies to identify underlying causes and develop effective prevention strategies across all technology stacks.

Task Overview

This task provides a structured approach to in-depth analysis of technical problems. It ensures a thorough investigation of root causes and the development of comprehensive solutions that address underlying issues rather than just symptoms.

Inputs for this Task

- Incident description and timeline
- System logs and diagnostic data
- Performance metrics and monitoring data
- Environmental configuration details
- Stakeholder interviews and observations
- Previous incident history and patterns

Task Execution Instructions

Phase 1: Incident Reconstruction and Data Collection

1.1 Timeline Reconstruction

Chronological Analysis:
- Create a detailed timeline of events leading to the incident
- Identify trigger events and contributing factors
- Map system changes and deployments to timeline
- Correlate user actions with system behaviors
Data Point Collection:
- Gather all relevant logs from affected systems
- Collect performance metrics before, during, and after the incident
- Document configuration changes and system modifications
- Compile user reports and stakeholder observations
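Timeline reconstruction and change mapping can be sketched in a few lines of Python. This is a minimal illustration, not part of BMAD: it assumes every source has already been parsed into a hypothetical `Event` record with timestamps in a single time zone, merges the sources chronologically, and lists the changes (deployments, configuration edits) that landed shortly before the incident began. The names `Event`, `build_timeline`, and `changes_before` and the sample data are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    timestamp: datetime  # assumed: all sources normalized to one time zone
    source: str          # illustrative labels, e.g. "app-log", "deploy", "user-report"
    kind: str            # "change" for deployments/config edits, otherwise "observation"
    message: str


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from several sources into one chronological timeline."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.timestamp)


def changes_before(timeline: list[Event], incident_start: datetime,
                   window: timedelta) -> list[Event]:
    """Return change events that occurred within `window` before the incident started."""
    return [
        e for e in timeline
        if e.kind == "change" and incident_start - window <= e.timestamp <= incident_start
    ]


# Hypothetical data: one application log line and two deployments.
t0 = datetime(2024, 1, 1, 12, 0)
logs = [Event(t0 + timedelta(minutes=5), "app-log", "observation", "error rate rising")]
deploys = [
    Event(t0 - timedelta(days=2), "deploy", "change", "release v1 rolled out"),
    Event(t0, "deploy", "change", "release v2 rolled out"),
]
timeline = build_timeline(logs, deploys)
suspects = changes_before(timeline, incident_start=t0 + timedelta(minutes=5),
                          window=timedelta(hours=1))
```

Here only the v2 release falls inside the one-hour window, so it becomes the first candidate to correlate with the rising error rate. The window size is a judgement call for each incident.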

1.2 System State Analysis

Pre-Incident State:
- Analyze system health and performance baselines
- Review recent changes and deployments
- Identify any warning signs or anomalies
- Document normal operational parameters
Incident State:
- Capture system behavior during the incident
- Document error conditions and failure modes
- Analyze resource utilization and constraints
- Record user impact and business consequences
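One simple way to compare the incident state with the pre-incident baseline is a z-score test: samples that sit far outside the baseline's spread get flagged. The sketch below assumes a metric series with at least two baseline samples and uses a threshold of three standard deviations. Both the threshold and the `flag_anomalies` helper are illustrative choices, not a prescribed method.

```python
from statistics import mean, stdev


def flag_anomalies(baseline: list[float], observed: list[float],
                   threshold: float = 3.0) -> list[int]:
    """Return indices of observed samples more than `threshold` standard
    deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 baseline points
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is anomalous.
        return [i for i, x in enumerate(observed) if x != mu]
    return [i for i, x in enumerate(observed) if abs(x - mu) / sigma > threshold]


# Hypothetical latency samples (ms): normal operation vs. the incident window.
baseline_latency_ms = [100, 102, 98, 101, 99, 100]
incident_latency_ms = [101, 180, 250, 99]
anomalies = flag_anomalies(baseline_latency_ms, incident_latency_ms)
```

With this data the 180 ms and 250 ms samples (indices 1 and 2) are flagged. Metrics with strong daily or weekly seasonality usually need a baseline taken from the same time window on comparable days.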

Phase 2: Systematic Root Cause Investigation

2.1 Five Whys Analysis

Iterative Questioning:
- Start with the immediate problem statement
- Ask "Why did this happen?" for each identified cause
- Continue questioning until the fundamental root cause is reached
- Document each level of analysis with supporting evidence
Evidence Validation:
- Support each "why" with concrete evidence
- Verify assumptions with data and testing
- Eliminate speculation and focus on facts
- Cross-reference findings with multiple data sources
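Recording each "why" together with its evidence makes the evidence-validation step mechanical: a level that has no evidence is still speculation. The sketch below is hypothetical. The `Why` record, the `unsupported_levels` helper, and the checkout scenario are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    question: str
    answer: str
    # Pointers to supporting data: log excerpts, metric graphs, test results.
    evidence: list[str] = field(default_factory=list)


def unsupported_levels(chain: list[Why]) -> list[int]:
    """Return the 1-based levels whose answer has no supporting evidence yet."""
    return [level for level, why in enumerate(chain, start=1) if not why.evidence]


# Hypothetical chain for a checkout outage.
chain = [
    Why("Why did checkout fail?", "The payment service timed out.",
        ["p99 latency graph, 12:05"]),
    Why("Why did the payment service time out?",
        "Its database connection pool was exhausted.", ["pool utilisation metrics"]),
    Why("Why was the pool exhausted?",
        "A new query held connections open.", []),  # not yet verified
]
gaps = unsupported_levels(chain)
```

In this example level 3 is reported as unsupported, so the chain should not go any deeper until that answer is confirmed, for example by reproducing the query under load.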

2.2 Fishbone Diagram Analysis

Category-Based Investigation:
- People: Human factors, training, procedures, communication
- Process: Workflows, procedures, policies, standards
- Technology: Hardware, software, infrastructure, tools
- Environment: External factors, dependencies, constraints
Comprehensive Cause Mapping:
- Identify all potential contributing factors in each category
- Analyze interactions between different categories
- Prioritize causes based on impact and evidence
- Validate cause relationships with data and testing
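A fishbone diagram is easy to keep as structured data: one list of causes per category, ordered by priority. The sketch below uses the four categories listed above. It assumes a hypothetical 1-to-5 scoring for impact and strength of evidence and ranks causes by the product of the two. The scoring scheme, the `Cause` record, and `build_fishbone` are illustrative assumptions, not part of the method.

```python
from dataclasses import dataclass

CATEGORIES = ("People", "Process", "Technology", "Environment")


@dataclass
class Cause:
    category: str
    description: str
    impact: int    # assumed scale: 1 (low) .. 5 (high)
    evidence: int  # assumed scale: 1 (speculative) .. 5 (confirmed by data or testing)


def build_fishbone(causes: list[Cause]) -> dict[str, list[Cause]]:
    """Group causes by category; each bone is sorted by impact x evidence, highest first."""
    bones: dict[str, list[Cause]] = {category: [] for category in CATEGORIES}
    for cause in causes:
        if cause.category not in bones:
            raise ValueError(f"unknown category: {cause.category}")
        bones[cause.category].append(cause)
    for items in bones.values():
        items.sort(key=lambda c: c.impact * c.evidence, reverse=True)
    return bones


# Hypothetical causes gathered during the investigation.
fishbone = build_fishbone([
    Cause("Technology", "No alert on connection-pool saturation", impact=4, evidence=5),
    Cause("Technology", "Slow query introduced in release", impact=5, evidence=3),
    Cause("Process", "Load test skipped for hotfix", impact=3, evidence=4),
])
```

An empty category, such as "People" here, is itself a prompt to check whether that bone was actually investigated.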

2.3 Fault Tree Analysis

Top-Down Analysis:
- Start with the top-level failure event
- Systematically break down into contributing events
- Use logical gates (AND, OR) to show relationships
- Continue decomposition until basic events are reached
Probability Assessment:
- Assign probability estimates to basic events
- Calculate the probability of the top-level failure event
- Identify critical paths and high-impact factors
- Prioritize mitigation efforts based on risk analysis
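The probability step can be sketched as a recursive evaluation of the tree. The sketch assumes all basic events are statistically independent. Under that assumption an AND gate multiplies its input probabilities, and an OR gate gives 1 minus the product of the complements. If the same basic event appears in several branches, the assumption breaks and minimal cut sets are needed instead. The tree and its probabilities are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Basic:
    name: str
    p: float  # estimated probability of this basic event


@dataclass
class Gate:
    kind: str       # "AND" or "OR"
    children: list  # Basic events and/or nested Gates


def probability(node) -> float:
    """Top-event probability, assuming all basic events are independent."""
    if isinstance(node, Basic):
        return node.p
    ps = [probability(child) for child in node.children]
    if node.kind == "AND":
        return prod(ps)                     # every input event must occur
    if node.kind == "OR":
        return 1 - prod(1 - p for p in ps)  # at least one input event occurs
    raise ValueError(f"unknown gate: {node.kind}")


# Hypothetical tree: outage if (primary DB fails AND failover fails)
# OR a bad configuration is deployed.
tree = Gate("OR", [
    Gate("AND", [Basic("primary DB failure", 0.01), Basic("failover failure", 0.1)]),
    Basic("bad config deployed", 0.02),
])
top = probability(tree)  # 1 - (1 - 0.001) * (1 - 0.02) = 0.02098
```

In this example the bad-configuration branch contributes far more than the redundant database pair, which points mitigation effort at change management rather than at database hardware.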

Phase 3: Contributing Factor Analysis

3.1 Technical Contributing Factors

System Design Issues:
- Architecture limitations and design flaws
- Scalability constraints and bottlenecks
- Integration weaknesses and dependencies
- Performance limitations and resource constraints
Implementation Problems:
- Code defects and logic errors
- Configuration mistakes and inconsistencies
- Deployment issues and environment differences
- Testing gaps and validation failures

3.2 Process Contributing Factors

Operational Processes:
- Monitoring and alerting gaps
- Incident response procedures
- Change management processes
- Capacity planning and resource management
Development Processes:
- Code review and quality assurance
- Testing strategies and coverage
- Deployment and release procedures
- Documentation and knowledge management

3.3 Human Contributing Factors

Knowledge and Training:
- Skill gaps and training needs
- Knowledge transfer and documentation
- Experience levels and expertise
- Communication and collaboration
Decision Making:
- Risk assessment and management
- Priority setting and resource allocation
- Escalation procedures and authority
- Information availability and quality

Phase 4: Solution Development and Prevention Strategy

4.1 Immediate Corrective Actions

Symptom Resolution:
- Address immediate symptoms and restore service
- Implement temporary workarounds if needed
- Ensure system stability and user access
- Monitor for recurrence or side effects
Data Preservation:
- Preserve evidence for further analysis
- Backup system states and configurations
- Document all corrective actions taken
- Maintain audit trail for compliance
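For data preservation, one lightweight approach is to copy the relevant configuration files and logs into an evidence folder and record a SHA-256 hash of each copy, so that later tampering or accidental edits can be detected. The sketch below is a hypothetical helper set built on the Python standard library. The folder layout and manifest format are assumptions, and it also assumes the source files have distinct names.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def preserve_evidence(files: list[Path], dest: Path) -> Path:
    """Copy files into `dest` and write a manifest of SHA-256 hashes for the audit trail."""
    dest.mkdir(parents=True, exist_ok=True)
    entries = []
    for src in files:
        target = dest / src.name  # assumes source file names are unique
        shutil.copy2(src, target)  # copy2 also preserves timestamps where supported
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        entries.append({"source": str(src), "copy": target.name, "sha256": digest})
    manifest = dest / "manifest.json"
    manifest.write_text(json.dumps({
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "files": entries,
    }, indent=2))
    return manifest


def verify_evidence(manifest: Path) -> list[str]:
    """Return the names of preserved copies whose hash no longer matches the manifest."""
    data = json.loads(manifest.read_text())
    return [
        entry["copy"] for entry in data["files"]
        if hashlib.sha256((manifest.parent / entry["copy"]).read_bytes()).hexdigest()
        != entry["sha256"]
    ]
```

Running `verify_evidence` before relying on the preserved copies during later analysis confirms that the evidence is still what was captured at incident time.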

4.2 Root Cause Remediation

Fundamental Fixes:
- Address identified root causes directly
- Implement systematic solutions rather than patches
- Consider long-term sustainability and maintainability
- Plan for comprehensive testing and validation
System Improvements:
- Enhance system design and architecture
- Improve monitoring and observability
- Strengthen error handling and resilience
- Optimize performance and scalability

4.3 Prevention Strategy Development

Proactive Measures:
- Implement monitoring and alerting for early detection
- Develop automated testing and validation procedures
- Create preventive maintenance and health checks
- Establish capacity planning and resource management
Process Improvements:
- Enhance change management and deployment procedures
- Improve incident response and escalation processes
- Strengthen quality assurance and testing practices
- Develop training and knowledge sharing programs
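For early-detection alerting, a common pattern is to fire only when a metric breaches its threshold on several consecutive samples. This filters out one-off spikes but still catches sustained degradation before it becomes an outage. The sketch below is a hypothetical rule in plain Python. The threshold and the default of three samples are illustrative, and real monitoring platforms express the same idea in their own rule syntax.

```python
def should_alert(samples: list[float], threshold: float, consecutive: int = 3) -> bool:
    """Fire only when the last `consecutive` samples all exceed `threshold`.

    Requiring several consecutive breaches ignores single spikes while
    still flagging a sustained degradation early.
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    recent = samples[-consecutive:]
    return len(recent) == consecutive and all(s > threshold for s in recent)


# Hypothetical connection-pool utilisation samples (percent), oldest first.
spike_only = [55, 96, 60, 58]
sustained = [55, 60, 92, 94, 97]
```

With a threshold of 90, `should_alert(spike_only, 90)` stays quiet because of the isolated spike, while `should_alert(sustained, 90)` fires on the last three samples.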

Quality Validation

Analysis Quality Checks

- Root cause analysis is evidence-based and thorough
- Multiple analysis methodologies were applied appropriately
- All contributing factors were identified and validated
- Cause relationships are logical and well-supported
- Analysis depth reaches fundamental root causes

Solution Quality Checks

- Solutions address root causes, not just symptoms
- Prevention strategies are comprehensive and practical
- Implementation plans are detailed and realistic
- Risk assessment and mitigation are included
- Success criteria and metrics are defined

Documentation Quality Checks

- Analysis process and findings are clearly documented
- Evidence and supporting data are properly referenced
- Recommendations are actionable and prioritized
- Lessons learned are captured and shareable
- Knowledge base is updated with findings

Integration Points

BMAD Method Integration

- Integration with troubleshooting and problem resolution workflows
- Cross-persona collaboration for complex multi-domain analysis
- Integration with quality validation and improvement processes
- Support for organizational learning and knowledge management

Tool and Process Integration

- Integration with incident management and ticketing systems
- Support for monitoring and observability platforms
- Compatibility with quality assurance and testing frameworks
- Integration with change management and deployment processes

Success Metrics

Analysis Effectiveness

- Root cause identification accuracy
- Analysis completeness and thoroughness
- Time to root cause identification
- Stakeholder satisfaction with analysis quality

Solution Effectiveness

- Problem recurrence rate
- Solution implementation success rate
- Prevention strategy effectiveness
- System reliability improvement
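Two of the metrics above, time to root cause identification and problem recurrence rate, can be computed directly from an incident history. The sketch below assumes a hypothetical `Incident` record that carries a normalized root-cause label and three timestamps. It defines recurrence as a new incident with the same root cause opened after that cause's first fix shipped. The record shape and that definition are assumptions for illustration, not fields of any particular ticketing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    incident_id: str
    root_cause: str                       # normalized label agreed during the RCA
    opened: datetime
    root_cause_found: Optional[datetime]  # None while the analysis is still open
    fix_deployed: Optional[datetime]      # None until the root-cause fix ships


def mean_time_to_root_cause(incidents: list[Incident]) -> Optional[timedelta]:
    """Average time from incident open to root-cause identification."""
    durations = [i.root_cause_found - i.opened for i in incidents if i.root_cause_found]
    return sum(durations, timedelta()) / len(durations) if durations else None


def recurrence_rate(incidents: list[Incident]) -> float:
    """Share of fixed root causes that caused a new incident after their first fix shipped."""
    first_fix: dict[str, datetime] = {}
    for i in incidents:
        if i.fix_deployed and (i.root_cause not in first_fix
                               or i.fix_deployed < first_fix[i.root_cause]):
            first_fix[i.root_cause] = i.fix_deployed
    recurred = {i.root_cause for i in incidents
                if i.root_cause in first_fix and i.opened > first_fix[i.root_cause]}
    return len(recurred) / len(first_fix) if first_fix else 0.0


# Hypothetical incident history.
t = datetime(2024, 1, 1)
history = [
    Incident("INC-1", "pool-exhaustion", t, t + timedelta(hours=4), t + timedelta(days=1)),
    Incident("INC-2", "expired-cert", t + timedelta(days=3),
             t + timedelta(days=3, hours=2), t + timedelta(days=4)),
    Incident("INC-3", "pool-exhaustion", t + timedelta(days=10),
             t + timedelta(days=10, hours=6), None),
]
mttrc = mean_time_to_root_cause(history)  # (4 h + 2 h + 6 h) / 3 = 4 h
rate = recurrence_rate(history)           # pool exhaustion recurred after its fix: 1 of 2
```

A non-zero recurrence rate for a root cause that was marked as fixed is a strong signal that the remediation addressed a symptom rather than the underlying cause.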

Organizational Learning

- Knowledge base contribution and utilization
- Process improvement implementation rate
- Team skill development and knowledge transfer
- Improvement in incident prevention and early detection

Deliverables

Primary Deliverables

- Root Cause Analysis Report: Comprehensive analysis with findings and evidence
- Corrective Action Plan: Detailed plan for addressing root causes
- Prevention Strategy: Comprehensive approach to preventing recurrence
- Implementation Roadmap: Prioritized plan for solution implementation

Supporting Deliverables

- Timeline Reconstruction: Detailed chronology of events and factors
- Contributing Factor Analysis: Comprehensive analysis of all contributing elements
- Risk Assessment: Analysis of risks and mitigation strategies
- Lessons Learned Document: Insights and recommendations for organizational improvement

Remember: Effective root cause analysis requires systematic methodology, thorough investigation, and focus on fundamental causes rather than surface symptoms.
