Advanced Troubleshooting Specialist Quality Checklist

Document Information

  • Checklist Version: 1.0
  • Last Updated: [Current Date]
  • Applicable To: Advanced Troubleshooting Specialist deliverables
  • Review Type: [Self-Assessment/Peer Review/Quality Assurance]

Section 1: Problem Analysis and Assessment

1.1 Problem Definition Quality

  • Clear Problem Statement: Issue is clearly and concisely defined
  • Symptom Documentation: All observable symptoms are documented with specifics
  • Scope Definition: Problem boundaries and affected systems are clearly identified
  • Impact Assessment: Business and technical impact is quantified and documented
  • Urgency Classification: Priority level is appropriate and justified
  • Stakeholder Identification: All affected parties and decision-makers are identified

1.2 Information Gathering Completeness

  • Log Collection: Relevant logs from all affected systems are collected
  • Metrics Analysis: Performance and health metrics are gathered and analyzed
  • Configuration Review: System configurations and recent changes are documented
  • Environmental Context: Infrastructure and deployment details are captured
  • Timeline Construction: Chronological sequence of events is established
  • Stakeholder Input: Relevant stakeholder interviews and observations are documented
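A minimal sketch of the Timeline Construction item, assuming events from each affected system have already been parsed into timestamped records. The `Event` type, the `build_timeline` helper, and the sample sources are hypothetical names used only for illustration; real log formats will need their own parsing step.

```python
import heapq
from datetime import datetime
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """One observed event (hypothetical record shape)."""
    timestamp: datetime
    source: str
    message: str


def build_timeline(*sources: Iterable[Event]) -> list[Event]:
    """Merge events from several systems into one chronological list."""
    # heapq.merge requires each input to be sorted already, so sort defensively.
    ordered = [sorted(src, key=lambda e: e.timestamp) for src in sources]
    return list(heapq.merge(*ordered, key=lambda e: e.timestamp))


# Illustrative data only; not taken from any real incident.
app_events = [
    Event(datetime(2024, 1, 1, 10, 5), "app", "error rate rises"),
    Event(datetime(2024, 1, 1, 10, 0), "app", "deploy finished"),
]
db_events = [
    Event(datetime(2024, 1, 1, 10, 3), "db", "connection pool exhausted"),
]

timeline = build_timeline(app_events, db_events)
for event in timeline:
    print(event.timestamp.isoformat(), event.source, event.message)
```

Keeping the source name on every event makes it easier to see which system reported a symptom first once the streams are interleaved.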

1.3 Initial Assessment Quality

  • System Health Check: A comprehensive health assessment of all relevant systems is performed
  • Resource Analysis: CPU, memory, disk, and network utilization are evaluated
  • Dependency Mapping: System dependencies and integration points are identified
  • Change Correlation: Recent changes are correlated with incident timeline
  • Pattern Recognition: Historical patterns and similar incidents are identified
  • Risk Assessment: Potential risks and escalation scenarios are evaluated
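The Change Correlation item can be approximated by listing the changes applied shortly before the incident began. The sketch below is one possible approach, not a prescribed method: the `Change` record, the `changes_before_incident` helper, and the 24-hour default lookback are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """A recorded change (hypothetical record shape)."""
    description: str
    applied_at: datetime


def changes_before_incident(
    changes: list[Change],
    incident_start: datetime,
    lookback: timedelta = timedelta(hours=24),  # assumed default window
) -> list[Change]:
    """Return changes applied within `lookback` before the incident, newest first."""
    window_start = incident_start - lookback
    candidates = [
        c for c in changes if window_start <= c.applied_at <= incident_start
    ]
    return sorted(candidates, key=lambda c: c.applied_at, reverse=True)


# Illustrative data only.
incident = datetime(2024, 1, 2, 12, 0)
history = [
    Change("config flag toggled", datetime(2024, 1, 2, 11, 30)),
    Change("library upgrade", datetime(2024, 1, 1, 15, 0)),
    Change("schema migration", datetime(2023, 12, 20, 9, 0)),
    Change("hotfix after incident", datetime(2024, 1, 2, 13, 0)),
]

suspects = changes_before_incident(history, incident)
for change in suspects:
    print(change.applied_at.isoformat(), change.description)
```

Proximity in time only suggests candidates for investigation; each one still needs to be validated against evidence before it is treated as a cause.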

Section 2: Systematic Analysis and Investigation

2.1 Troubleshooting Methodology

  • Systematic Approach: Structured troubleshooting methodology is followed
  • Hypothesis Formation: Multiple hypotheses are developed and prioritized
  • Evidence-Based Analysis: All conclusions are supported by concrete evidence
  • Isolation Techniques: Problem isolation and component testing are performed
  • Reproducibility Testing: Issue reproduction steps are validated and documented
  • Cross-Platform Analysis: Multi-technology stack considerations are addressed

2.2 Root Cause Analysis Quality

  • 5 Whys Application: The 5 Whys methodology is applied, with evidence recorded for each answer
  • Fishbone Analysis: Causes are mapped comprehensively across all relevant categories
  • Fault Tree Analysis: Failure modes are logically decomposed (when applicable)
  • Contributing Factors: All contributing factors are identified and validated
  • Cause Validation: Root causes are validated through testing and evidence
  • Depth of Analysis: Analysis reaches fundamental causes, not just symptoms
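One way to make the "applied with evidence" requirement of the 5 Whys item checkable is to record each why/answer pair alongside its supporting evidence and flag the steps that have none. This is a minimal sketch; the `WhyStep` structure and the `unsupported_steps` helper are hypothetical, and the sample chain is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WhyStep:
    """One step in a 5 Whys chain (hypothetical structure)."""
    question: str
    answer: str
    evidence: list[str] = field(default_factory=list)


def unsupported_steps(chain: list[WhyStep]) -> list[int]:
    """Return the 1-based positions of steps that cite no evidence."""
    return [i for i, step in enumerate(chain, start=1) if not step.evidence]


# Illustrative chain only.
chain = [
    WhyStep("Why did requests fail?", "The service ran out of connections",
            evidence=["pool exhaustion errors in service log"]),
    WhyStep("Why did it run out of connections?", "Connections were not released",
            evidence=[]),
    WhyStep("Why were they not released?", "An error path skipped cleanup",
            evidence=["code review of the error handler"]),
]

gaps = unsupported_steps(chain)
print("Steps lacking evidence:", gaps)
```

A non-empty result indicates answers that are still hypotheses and should be validated before the analysis is considered complete.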

2.3 Technical Investigation Excellence

  • Log Analysis Expertise: Logs are analyzed thoroughly, including pattern recognition
  • Performance Analysis: Performance metrics are evaluated comprehensively
  • Code Review: Relevant code is reviewed for defects and logic errors
  • Configuration Analysis: System and application configurations are reviewed thoroughly
  • Network Analysis: Network connectivity and performance are evaluated
  • Security Assessment: Security implications and vulnerabilities are considered

Section 3: Solution Development and Strategy

3.1 Solution Strategy Quality

  • Multiple Approaches: Multiple solution strategies are developed and evaluated
  • Risk Assessment: Risks and benefits of each approach are analyzed
  • Feasibility Analysis: Implementation feasibility and resource requirements are assessed
  • Timeline Planning: Realistic timelines for implementation are established
  • Rollback Planning: Comprehensive rollback procedures are developed
  • Success Criteria: Clear success metrics and validation criteria are defined

3.2 Implementation Planning Excellence

  • Step-by-Step Procedures: Detailed implementation steps are documented
  • Testing Strategy: Comprehensive testing approach is planned and documented
  • Monitoring Plan: Monitoring and validation procedures are established
  • Communication Plan: Stakeholder communication strategy is developed
  • Resource Planning: Required resources and dependencies are identified
  • Contingency Planning: Alternative approaches and emergency procedures are prepared

3.3 Prevention Strategy Development

  • Proactive Measures: Preventive measures and early warning systems are designed
  • Monitoring Enhancement: Improved monitoring and alerting are planned
  • Process Improvements: Process and procedure enhancements are identified
  • Training Needs: Knowledge gaps and training requirements are addressed
  • Automation Opportunities: Automation possibilities are identified and planned
  • Long-term Strategy: Strategic improvements for system resilience are planned

Section 4: Documentation and Communication

4.1 Documentation Quality Standards

  • Comprehensive Coverage: All aspects of analysis and solution are documented
  • Clear Structure: Documentation follows logical structure and is easy to navigate
  • Technical Accuracy: All technical details are accurate and validated
  • Actionable Content: Documentation provides clear, actionable guidance
  • Evidence Support: All conclusions are supported by evidence and references
  • Version Control: Proper version control and change tracking are maintained

4.2 Communication Excellence

  • Stakeholder Alignment: Communication is tailored to different stakeholder needs
  • Clarity and Precision: Technical concepts are explained clearly and precisely
  • Timely Updates: Regular progress updates are provided to relevant parties
  • Executive Summary: High-level summary is provided for executive stakeholders
  • Technical Details: Sufficient technical detail is provided for implementation teams
  • Follow-up Planning: Clear next steps and follow-up procedures are established

4.3 Knowledge Sharing and Transfer

  • Knowledge Base Updates: Relevant knowledge base articles are created or updated
  • Runbook Creation: Troubleshooting runbooks are created for similar issues
  • Best Practices: Best practices and lessons learned are documented and shared
  • Team Training: Knowledge transfer and training needs are addressed
  • Cross-Team Sharing: Insights are shared with relevant teams and stakeholders
  • Continuous Improvement: Feedback and improvement opportunities are captured

Section 5: Quality Validation and Testing

5.1 Solution Validation

  • Functional Testing: Solution functionality is thoroughly tested and validated
  • Performance Testing: Performance impact and improvements are validated
  • Integration Testing: Integration points and dependencies are tested
  • Regression Testing: Potential regressions and side effects are tested
  • User Acceptance: User experience and satisfaction are validated
  • Monitoring Validation: The effectiveness of monitoring and alerting is confirmed

5.2 Implementation Quality Assurance

  • Deployment Validation: Deployment procedures are tested and validated
  • Rollback Testing: Rollback procedures are tested and confirmed functional
  • Security Validation: Security implications and protections are validated
  • Compliance Check: Regulatory and compliance requirements are met
  • Performance Baseline: New performance baselines are established and documented
  • Success Metrics: Success criteria are met and validated

5.3 Continuous Monitoring and Improvement

  • Monitoring Implementation: Enhanced monitoring is implemented and functional
  • Alert Configuration: Appropriate alerts and thresholds are configured
  • Dashboard Creation: Relevant dashboards and visualizations are created
  • Trend Analysis: Baseline trends and patterns are established
  • Feedback Loop: Feedback mechanisms for continuous improvement are established
  • Review Schedule: Regular review and assessment schedules are established

Section 6: Cross-Persona Integration and Collaboration

6.1 BMAD Method Integration

  • Orchestrator Compatibility: Full integration with BMAD Method orchestrator
  • Template Utilization: Proper use of BMAD troubleshooting templates
  • Quality Standards: Adherence to BMAD quality standards and frameworks
  • Workflow Integration: Seamless integration with BMAD workflows and processes
  • Documentation Standards: Compliance with BMAD documentation standards
  • Cross-Persona Coordination: Effective collaboration with other BMAD personas

6.2 Technology Stack Coverage

  • React/TypeScript Expertise: Comprehensive frontend troubleshooting capabilities
  • Node.js Proficiency: Backend troubleshooting and optimization expertise
  • Python Competency: Python application troubleshooting and analysis
  • .NET Knowledge: .NET application troubleshooting and performance analysis
  • Database Expertise: Database troubleshooting and optimization capabilities
  • Infrastructure Understanding: Infrastructure and deployment troubleshooting

6.3 Collaboration Excellence

  • Performance Specialist Integration: Effective collaboration on performance issues
  • Security Specialist Coordination: Proper coordination on security-related problems
  • Architecture Consultant Alignment: Alignment with architectural considerations
  • Development Team Support: Effective support and guidance for development teams
  • Operations Team Coordination: Proper coordination with operations and DevOps teams
  • Stakeholder Management: Effective communication and coordination with all stakeholders

Section 7: Success Metrics and Outcomes

7.1 Resolution Effectiveness

  • Problem Resolution: Issue is completely resolved with validated solution
  • Root Cause Elimination: Underlying root causes are addressed and eliminated
  • Prevention Implementation: Effective prevention measures are implemented
  • Recurrence Prevention: Measures to prevent recurrence are validated and effective
  • System Improvement: Overall system reliability and performance are improved
  • User Satisfaction: User experience and satisfaction are restored or improved

7.2 Process and Knowledge Improvement

  • Methodology Enhancement: Troubleshooting methodologies are improved and refined
  • Knowledge Capture: Valuable knowledge and insights are captured and shared
  • Process Optimization: Troubleshooting processes are optimized and streamlined
  • Team Capability: Team troubleshooting capabilities are enhanced
  • Tool Improvement: Troubleshooting tools and techniques are improved
  • Organizational Learning: Organizational learning and improvement are achieved

Checklist Completion Summary

Overall Quality Assessment

  • Total Items: [Total number of checklist items]
  • Items Completed: [Number of items marked as complete]
  • Completion Percentage: [Percentage of completion]
  • Critical Items Status: [Status of all critical/high-priority items]

Quality Score Calculation

  • Excellent (90-100%): All critical items complete, minimal gaps
  • Good (80-89%): Most items complete, minor improvements needed
  • Satisfactory (70-79%): Adequate completion, some improvements required
  • Needs Improvement (<70%): Significant gaps, major improvements required
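The completion percentage and rating bands above can be computed as in the following sketch. Two points are assumptions rather than anything the checklist states: the lower bound of each band is treated as inclusive (so 89.5% rates as "Good"), and a score of 90% or more with incomplete critical items is rated "Good" rather than "Excellent". The function names are hypothetical.

```python
def completion_percentage(items_completed: int, total_items: int) -> float:
    """Completion Percentage = Items Completed / Total Items * 100."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= items_completed <= total_items:
        raise ValueError("items_completed must be between 0 and total_items")
    return 100.0 * items_completed / total_items


def quality_rating(percentage: float, all_critical_complete: bool) -> str:
    """Map a completion percentage to one of the four rating bands."""
    # Assumption: "Excellent" also requires every critical item to be complete.
    if percentage >= 90 and all_critical_complete:
        return "Excellent"
    # Assumption: band lower bounds are inclusive.
    if percentage >= 80:
        return "Good"
    if percentage >= 70:
        return "Satisfactory"
    return "Needs Improvement"


# Sections 1 through 7 above list 120 items in total (20 subsections of 6).
pct = completion_percentage(102, 120)
print(f"{pct:.1f}% -> {quality_rating(pct, all_critical_complete=True)}")
# prints "85.0% -> Good"
```

Passing the critical-item status separately keeps the percentage a plain ratio while still honoring the requirement that critical items gate the top rating.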

Action Items for Improvement

  1. [Specific action item for improvement]
  2. [Specific action item for improvement]
  3. [Specific action item for improvement]

Reviewer Information

  • Reviewer Name: [Name of person conducting review]
  • Review Date: [Date of review completion]
  • Review Type: [Self-Assessment/Peer Review/Quality Assurance]
  • Next Review Date: [Scheduled date for next review]

Approval and Sign-off

  • Quality Approved: [Yes/No/Conditional]
  • Approver Name: [Name of approving authority]
  • Approval Date: [Date of approval]
  • Conditions/Notes: [Any conditions or additional notes]

Usage Instructions:

  1. Complete this checklist for all Advanced Troubleshooting Specialist deliverables
  2. Mark each item as complete only when fully satisfied
  3. Document any gaps or improvement areas in the action items section
  4. Ensure all critical items are completed before final approval
  5. Use this checklist for continuous improvement of troubleshooting quality

Remember: This checklist ensures comprehensive, systematic troubleshooting that not only resolves immediate issues but also builds organizational knowledge and prevents future problems.

