mirror of https://github.com/bmad-code-org/BMAD-METHOD.git

Advanced Troubleshooting Analysis Task

Purpose

To provide comprehensive troubleshooting analysis for complex technical issues across React, TypeScript, Node.js, ASP.NET, and Python technology stacks, utilizing systematic debugging methodologies and root cause analysis techniques.

Task Overview

This task guides the Advanced Troubleshooting Specialist through a structured approach to diagnosing and resolving sophisticated technical problems, ensuring thorough analysis, effective solutions, and comprehensive documentation.

Inputs for this Task

- Problem description and symptoms
- System logs and error messages
- Performance metrics and monitoring data
- Environment configuration details
- Reproduction steps and conditions
- Impact assessment and urgency level

Task Execution Instructions

Phase 1: Problem Assessment and Information Gathering

1.1 Initial Problem Analysis

Problem Definition:
- Clearly define the issue, symptoms, and observable behaviors
- Identify affected systems, components, and user groups
- Assess business impact and urgency level
- Determine problem scope and boundaries
Information Collection:
- Gather system logs, error messages, and stack traces
- Collect performance metrics and monitoring data
- Document environment configuration and recent changes
- Obtain reproduction steps and conditions
- Interview stakeholders and affected users

1.2 Environmental Assessment

System Health Check:
- Verify system resource utilization (CPU, memory, disk, network)
- Check service status and connectivity
- Validate configuration settings and dependencies
- Review recent deployments and changes
Technology Stack Analysis:
- Identify all components in the technology stack
- Verify version compatibility and dependencies
- Check for known issues or vulnerabilities
- Assess integration points and data flows
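
As an illustration of the System Health Check items above, the following is a minimal sketch that uses only the Python standard library. The function names, the 90% disk threshold, and the endpoint map are assumptions made for this example and are not part of the BMAD task definition.

```python
import shutil
import socket


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of disk space used on the volume holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_report(path, endpoints, disk_limit_percent=90.0):
    """Summarize disk usage and endpoint reachability as a plain dict.

    endpoints maps a hypothetical service name to a (host, port) pair.
    The 90% default limit is an illustrative assumption, not a recommendation.
    """
    used = disk_usage_percent(path)
    return {
        "disk_used_percent": round(used, 1),
        "disk_ok": used < disk_limit_percent,
        "endpoints": {
            name: can_connect(host, port)
            for name, (host, port) in endpoints.items()
        },
    }
```

A call such as `health_report("/", {"db": ("db.internal", 5432)})` would return one snapshot suitable for attaching to the problem record; the host name here is a placeholder. CPU and memory figures need a platform-specific source and are left out of this sketch.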

Phase 2: Systematic Analysis and Root Cause Investigation

2.1 Log Analysis and Pattern Recognition

Log Examination:
- Analyze application logs for error patterns and anomalies
- Examine system logs for infrastructure issues
- Review security logs for potential security incidents
- Correlate logs across multiple systems and timeframes
Error Pattern Analysis:
- Identify recurring error patterns and frequencies
- Analyze error correlation with system events
- Map errors to specific components or operations
- Determine error propagation paths
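
The Error Pattern Analysis items above can be sketched as a small frequency count. The log line format (`<timestamp> <LEVEL> <component>: <message>`) is an assumption for this example; real logs will need their own pattern.

```python
import re
from collections import Counter

# Assumed line format: "<timestamp> <LEVEL> <component>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<component>[\w.-]+):\s+(?P<msg>.*)$"
)


def normalize(message: str) -> str:
    """Replace runs of digits with a placeholder so similar errors group together."""
    return re.sub(r"\d+", "<n>", message)


def error_patterns(lines):
    """Count (component, normalized message) pairs for ERROR-level lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            key = (match.group("component"), normalize(match.group("msg")))
            counts[key] += 1
    return counts
```

Calling `error_patterns(lines).most_common(5)` lists the most frequent patterns together with the component each one maps to, which gives a starting point for tracing propagation paths.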

2.2 Performance Analysis

Metrics Evaluation:
- Analyze response times, throughput, and latency metrics
- Examine resource utilization patterns and trends
- Identify performance bottlenecks and constraints
- Assess scalability and capacity issues
Profiling and Tracing:
- Conduct application profiling for performance hotspots
- Implement distributed tracing for request flows
- Analyze database query performance and identify optimization opportunities
- Examine memory usage patterns and garbage collection
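
For the Metrics Evaluation items above, tail latency is usually more informative than the average. The sketch below computes nearest-rank percentiles over a list of latency samples; the function names and the choice of p50/p95/p99 are assumptions for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. p must be in the range (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize response times (in milliseconds) for a troubleshooting report."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }
```

Comparing a summary taken before the incident with one taken during it shows whether the slowdown affects every request or only the tail.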

2.3 Root Cause Analysis

Hypothesis Formation:
- Develop multiple hypotheses for potential root causes
- Prioritize hypotheses based on evidence and probability
- Design tests to validate or eliminate hypotheses
- Consider both technical and process-related causes
Systematic Investigation:
- Apply 5 Whys methodology for deep analysis
- Use fishbone diagrams for comprehensive cause mapping
- Implement fault tree analysis for complex systems
- Conduct timeline reconstruction for incident analysis
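
Timeline reconstruction, the last item above, can be sketched as a merge of events from several sources followed by a look at what happened shortly before the incident began. The `Event` type and the helper names are hypothetical and exist only for this example.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    at: datetime        # when the event happened (timezone-aware)
    source: str         # e.g. "deploy", "alert", "app-log" (illustrative labels)
    description: str


def build_timeline(*sources):
    """Merge event lists from several sources into one chronological list."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event.at)


def events_before(timeline, incident_start, window):
    """Return events that occurred within `window` before the incident start."""
    earliest = incident_start - window
    return [e for e in timeline if earliest <= e.at < incident_start]
```

Events that fall in the window before the first symptom, such as a deployment or a configuration change, are candidate causes to feed into hypothesis formation. They are leads to test, not conclusions.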

Phase 3: Solution Development and Strategy Planning

3.1 Solution Strategy Development

Multiple Approach Development:
- Design immediate workarounds for urgent issues
- Develop short-term fixes for quick resolution
- Plan long-term solutions for permanent resolution
- Consider preventive measures and improvements
Risk Assessment:
- Evaluate risks associated with each solution approach
- Assess potential side effects and system impacts
- Determine rollback procedures and contingency plans
- Consider resource requirements and timelines

3.2 Implementation Planning

Solution Prioritization:
- Rank solutions by effectiveness and feasibility
- Consider implementation complexity and resource requirements
- Assess business impact and user experience implications
- Plan phased implementation for complex solutions
Testing Strategy:
- Design comprehensive testing procedures
- Plan validation criteria and success metrics
- Implement monitoring and alerting for solution effectiveness
- Prepare rollback procedures and emergency responses
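
One way to make the Solution Prioritization ranking above repeatable is a simple weighted score. This is a sketch under stated assumptions: the 1 to 5 rating scales, the weights, and the `Solution` type are invented for illustration, and the task itself does not prescribe a scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    name: str
    effectiveness: int  # 1 (low) to 5 (high), assumed scale
    feasibility: int    # 1 (hard to deliver) to 5 (easy to deliver)
    complexity: int     # 1 (simple) to 5 (complex)


def score(solution, weights=(0.5, 0.3, 0.2)):
    """Weighted score; higher is better. Complexity is inverted so that
    simpler solutions score higher. The default weights are illustrative."""
    w_eff, w_feas, w_cx = weights
    return (
        w_eff * solution.effectiveness
        + w_feas * solution.feasibility
        + w_cx * (6 - solution.complexity)
    )


def rank(solutions):
    """Return solutions ordered from highest to lowest score."""
    return sorted(solutions, key=score, reverse=True)
```

The score is only an aid for discussion. Business impact and user experience implications still need a human judgment that a single number does not capture.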

Phase 4: Implementation, Validation, and Documentation

4.1 Solution Implementation

Controlled Deployment:
- Implement solutions in controlled environments first
- Monitor system behavior and performance during implementation
- Validate solution effectiveness against defined criteria
- Ensure proper backup and rollback capabilities
Monitoring and Validation:
- Implement comprehensive monitoring for solution effectiveness
- Track key performance indicators and success metrics
- Monitor for side effects or unintended consequences
- Validate user experience and business impact improvements
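
Validating a solution "against defined criteria" is easier to audit when the criteria are written down as data. The sketch below checks measured metrics against thresholds; the metric names and limits shown in the usage note are hypothetical.

```python
import operator

# Supported comparisons for a criterion (an assumption of this sketch).
_COMPARISONS = {"<=": operator.le, ">=": operator.ge}


def validate(metrics, criteria):
    """Check measured metrics against success criteria.

    criteria maps a metric name to (comparison, threshold).
    Returns a list of failure descriptions; an empty list means all passed.
    """
    failures = []
    for name, (comparison, threshold) in criteria.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: no measurement")
        elif not _COMPARISONS[comparison](value, threshold):
            failures.append(f"{name}: {value} violates {comparison} {threshold}")
    return failures
```

For example, with criteria `{"p95_latency_ms": ("<=", 300), "error_rate": ("<=", 0.01)}`, any non-empty result after deployment would be a signal to consider the rollback procedure.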

4.2 Documentation and Knowledge Sharing

Comprehensive Documentation:
- Document problem description, analysis, and root cause
- Record solution implementation steps and procedures
- Create troubleshooting runbooks for similar issues
- Document lessons learned and improvement recommendations
Knowledge Base Integration:
- Add findings to organizational knowledge base
- Create searchable documentation for future reference
- Share insights with relevant teams and stakeholders
- Update procedures and best practices based on learnings

Quality Validation

Technical Quality Checks

- Root cause analysis is thorough and evidence-based
- Solutions address underlying causes, not just symptoms
- Implementation includes proper testing and validation
- Monitoring and alerting are implemented for ongoing detection
- Documentation is comprehensive and actionable

Process Quality Checks

- Systematic troubleshooting methodology was followed
- Multiple solution approaches were considered
- Risk assessment and mitigation planning were conducted
- Stakeholder communication was maintained throughout
- Knowledge sharing and documentation were completed

Outcome Quality Checks

- Problem resolution meets defined success criteria
- Solution implementation does not introduce new issues
- System performance and stability are maintained or improved
- User experience and business impact are positively affected
- Prevention strategies are implemented to avoid recurrence

Integration Points

BMAD Method Integration

- Seamless integration with BMAD orchestrator for task management
- Cross-persona collaboration for complex multi-domain issues
- Integration with quality validation frameworks and standards
- Support for automated workflow and documentation generation

Tool and Platform Integration

- Integration with monitoring and observability platforms
- Support for log aggregation and analysis tools
- Compatibility with debugging and profiling tools
- Integration with incident management and ticketing systems

Success Metrics

Resolution Effectiveness

- Mean time to resolution (MTTR)
- First-call resolution rate
- Problem recurrence rate
- Solution effectiveness score
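
Two of the metrics above can be computed directly from incident records. The sketch assumes each incident has an opened and a resolved timestamp and a root-cause identifier. The definition of recurrence used here (an incident whose root cause was already seen earlier in the list) is an assumption, since the task does not define it.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """MTTR over (opened, resolved) datetime pairs, returned as a timedelta."""
    durations = [resolved - opened for opened, resolved in incidents]
    if not durations:
        raise ValueError("at least one incident is required")
    return sum(durations, timedelta()) / len(durations)


def recurrence_rate(root_cause_ids):
    """Fraction of incidents whose root cause already appeared earlier.

    root_cause_ids is a chronological list of hypothetical identifiers.
    """
    if not root_cause_ids:
        raise ValueError("at least one incident is required")
    seen = set()
    repeats = 0
    for cause in root_cause_ids:
        if cause in seen:
            repeats += 1
        else:
            seen.add(cause)
    return repeats / len(root_cause_ids)
```

Tracking both values per quarter, for instance, shows whether fixes are getting faster and whether they actually prevent repeat incidents.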

Process Efficiency

- Troubleshooting methodology adherence
- Documentation completeness and quality
- Knowledge base contribution and utilization
- Team skill development and knowledge transfer

System Improvement

- Incident reduction rate
- Proactive issue identification and prevention
- Monitoring and alerting coverage improvement
- Overall system reliability and performance enhancement

Deliverables

Primary Deliverables

- Troubleshooting Analysis Report: Comprehensive analysis of the problem, root cause, and solution
- Solution Implementation Guide: Step-by-step procedures for implementing the solution
- Monitoring and Alerting Configuration: Setup for ongoing detection and prevention
- Troubleshooting Runbook: Reusable procedures for similar issues

Supporting Deliverables

- Root Cause Analysis Documentation: Detailed analysis of underlying causes
- Risk Assessment and Mitigation Plan: Comprehensive risk analysis and mitigation strategies
- Knowledge Base Entries: Searchable documentation for organizational learning
- Process Improvement Recommendations: Suggestions for preventing similar issues

Remember: This task ensures systematic, thorough troubleshooting that not only resolves immediate issues but also builds organizational knowledge and prevents future problems.
