diff --git a/AGENT.md b/AGENT.md deleted file mode 100644 index ceeee113..00000000 --- a/AGENT.md +++ /dev/null @@ -1,28 +0,0 @@ -# BMad-Method Agent Guide - -## Build Commands -- `npm run build` - Build all agents and teams -- `npm run build:agents` - Build only agent bundles -- `npm run build:teams` - Build only team bundles -- `npm run validate` - Validate configuration and files -- `npm run format` - Format all Markdown files with Prettier -- `node tools/cli.js list:agents` - List available agents - -## Test Commands -- No formal test suite - validation via `npm run validate` -- Manual testing via building agents/teams and checking outputs - -## Architecture -- **Core**: `bmad-core/` - Agent definitions, templates, workflows, user guide -- **Tools**: `tools/` - CLI build system, installers, web builders -- **Expansion Packs**: `expansion-packs/` - Domain-specific agent collections -- **Distribution**: `dist/` - Built agent/team bundles for web deployment -- **Config**: `bmad-core/core-config.yaml` - Sharding, paths, markdown settings - -## Code Style -- **Modules**: CommonJS (`require`/`module.exports`), some ES modules via dynamic import -- **Classes**: PascalCase (WebBuilder), methods camelCase (buildAgents) -- **Files**: kebab-case (web-builder.js), constants UPPER_CASE -- **Error Handling**: Try-catch with graceful fallback, async/await patterns -- **Imports**: Node built-ins, fs-extra, chalk, commander, js-yaml -- **Paths**: Always use `path.join()`, absolute paths via `path.resolve()` diff --git a/README.md b/README.md index 2f116983..3b00bcd0 100644 --- a/README.md +++ b/README.md @@ -25,29 +25,6 @@ This two-phase approach eliminates both **planning inconsistency** and **context **πŸ“– [See the complete workflow in the User Guide](bmad-core/user-guide.md)** - Planning phase, development cycle, and all agent roles -## πŸ†• Claude Code Integration (Alpha at best) - -**NEW:** A contribution is attempting to integrate BMad-Method with [Claude Code's new subagents released 7/24](https://docs.anthropic.com/en/docs/claude-code/sub-agents)! Transform BMad's agents into native Claude Code subagents for seamless AI-powered development. - -**⚠️ This is an alpha feature, and may not work as expected.** In fact I know it doesn't fully work but subagents were just released a few hours before I finished the initial cut here, so please open defects against [BMAD-AT-CLAUDE](https://github.com/24601/BMAD-AT-CLAUDE/issues). 
- -There are a few enhancements I have attempted to make to make the flow/DX of using BMAD-METHOD with Claude Code subagents more seamless: - -- Shared scratchpad for handoffs -- Use the `description` facility to provide semantic meaning to claude for auto-call agents appropriately -- Memory priming -- Data sourcing helper - -```bash -# Generate Claude Code subagents -npm run build:claude - -# Start Claude Code -claude -``` - -**[πŸ“– Complete Claude Integration Guide](docs/claude-integration.md)** - Setup, usage, and workflows - ## Quick Navigation ### Understanding the BMad Workflow diff --git a/bmad-core/agents/analyst.md b/bmad-core/agents/analyst.md index 06751f36..3597e988 100644 --- a/bmad-core/agents/analyst.md +++ b/bmad-core/agents/analyst.md @@ -39,14 +39,7 @@ persona: style: Analytical, inquisitive, creative, facilitative, objective, data-informed identity: Strategic analyst specializing in brainstorming, market research, competitive analysis, and project briefing focus: Research planning, ideation facilitation, strategic analysis, actionable insights - methodology: | - HYPOTHESIS-DRIVEN ANALYSIS FRAMEWORK: - Step 1: Formulate Hypotheses - Start every analysis by stating 2-3 testable hypotheses about the market/problem - Step 2: Gather Evidence - Collect quantitative data and qualitative sources to validate/refute each hypothesis - Step 3: Validate & Score - Rate each hypothesis (High/Medium/Low confidence) based on evidence strength - Step 4: Synthesize Insights - Transform validated hypotheses into actionable strategic recommendations core_principles: - - Hypothesis-First Approach - Begin analysis with explicit testable assumptions - Curiosity-Driven Inquiry - Ask probing "why" questions to uncover underlying truths - Objective & Evidence-Based Analysis - Ground findings in verifiable data and credible sources - Strategic Contextualization - Frame all work within broader strategic context diff --git a/bmad-core/data/competitive-benchmarks.csv b/bmad-core/data/competitive-benchmarks.csv deleted file mode 100644 index 8938486f..00000000 --- a/bmad-core/data/competitive-benchmarks.csv +++ /dev/null @@ -1,11 +0,0 @@ -Company,Tool,Users_Millions,Revenue_Millions_USD,AI_Features,Market_Share_Percent,Founded -Atlassian,Jira,200,3000,Intelligence,15.2,2002 -Monday.com,Monday,180,900,AI Assistant,12.8,2012 -Asana,Asana,145,455,Intelligence,10.1,2008 -Microsoft,Project,120,2100,Copilot,8.7,1984 -Smartsheet,Smartsheet,82,740,DataMesh,5.9,2005 -Notion,Notion,35,275,AI Writing,2.5,2016 -ClickUp,ClickUp,25,156,ClickUp AI,1.8,2017 -Linear,Linear,8,45,Predictive,0.6,2019 -Airtable,Airtable,65,735,Apps,4.7,2012 -Basecamp,Basecamp,15,99,Limited,1.1,1999 diff --git a/bmad-core/data/fintech-compliance.md b/bmad-core/data/fintech-compliance.md deleted file mode 100644 index ef8153d7..00000000 --- a/bmad-core/data/fintech-compliance.md +++ /dev/null @@ -1,90 +0,0 @@ -# Fintech Compliance and Regulatory Guidelines - -## PCI DSS Compliance - -### Level 1 Requirements (>6M transactions/year) -- **Network Security**: Firewall, network segmentation -- **Data Protection**: Encrypt cardholder data, mask PAN -- **Access Control**: Unique IDs, two-factor authentication -- **Monitoring**: Log access, file integrity monitoring -- **Testing**: Vulnerability scanning, penetration testing -- **Policies**: Information security policy, incident response - -### Implementation Checklist -- [ ] Tokenize card data, never store full PAN -- [ ] Use validated payment processors (Stripe, Square) -- [ ] Implement 
Point-to-Point Encryption (P2PE) -- [ ] Regular security assessments and audits -- [ ] Staff training on data handling procedures - -## SOX Compliance (Public Companies) - -### Key Controls -- **ITGC**: IT General Controls for financial systems -- **Change Management**: Documented approval processes -- **Access Reviews**: Quarterly user access audits -- **Segregation of Duties**: Separate authorization/recording -- **Documentation**: Maintain audit trails and evidence - -## GDPR/Privacy Regulations - -### Data Processing Requirements -- **Lawful Basis**: Consent, contract, legitimate interest -- **Data Minimization**: Collect only necessary data -- **Purpose Limitation**: Use data only for stated purposes -- **Retention Limits**: Delete data when no longer needed -- **Data Subject Rights**: Access, rectification, erasure, portability - -### Technical Safeguards -- **Privacy by Design**: Build privacy into system architecture -- **Encryption**: End-to-end encryption for personal data -- **Pseudonymization**: Replace identifiers with artificial ones -- **Data Loss Prevention**: Monitor and prevent unauthorized access - -## Banking Regulations - -### Open Banking (PSD2) -- **Strong Customer Authentication**: Multi-factor authentication -- **API Security**: OAuth 2.0, mutual TLS, certificate validation -- **Data Sharing**: Consent management, scope limitation -- **Fraud Prevention**: Real-time monitoring, risk scoring - -### Anti-Money Laundering (AML) -- **Customer Due Diligence**: Identity verification, risk assessment -- **Transaction Monitoring**: Unusual pattern detection -- **Suspicious Activity Reporting**: Automated SAR generation -- **Record Keeping**: 5-year transaction history retention - -## Testing Requirements - -### Compliance Testing -- **Penetration Testing**: Annual external security assessments -- **Vulnerability Scanning**: Quarterly automated scans -- **Code Reviews**: Security-focused static analysis -- **Red Team Exercises**: Simulated attack scenarios - -### Audit Preparation -- **Documentation**: Policies, procedures, evidence collection -- **Control Testing**: Validate effectiveness of security controls -- **Gap Analysis**: Identify compliance deficiencies -- **Remediation Planning**: Prioritize and track fixes - -## Regional Considerations - -### United States -- **CCPA**: California Consumer Privacy Act requirements -- **GLBA**: Gramm-Leach-Bliley Act for financial institutions -- **FFIEC**: Federal guidance for IT risk management -- **State Regulations**: Additional requirements by state - -### European Union -- **PSD2**: Payment Services Directive -- **GDPR**: General Data Protection Regulation -- **MiFID II**: Markets in Financial Instruments Directive -- **EBA Guidelines**: European Banking Authority standards - -### Asia-Pacific -- **PDPA**: Personal Data Protection Acts (Singapore, Thailand) -- **Privacy Act**: Australia's privacy legislation -- **PIPEDA**: Canada's Personal Information Protection -- **Local Banking**: Country-specific financial regulations diff --git a/bmad-core/data/market-sizes.csv b/bmad-core/data/market-sizes.csv deleted file mode 100644 index da6c544d..00000000 --- a/bmad-core/data/market-sizes.csv +++ /dev/null @@ -1,11 +0,0 @@ -Market,Size_USD_Billions,Growth_Rate_CAGR,Year,Source -Project Management Software,7.8,10.1%,2023,Grand View Research -AI-Powered PM Tools,1.2,24.3%,2023,TechNavio -Agile Development Tools,2.1,15.7%,2023,Mordor Intelligence -Customer Support Software,24.5,12.2%,2023,Fortune Business Insights -Collaboration 
Software,31.2,9.5%,2023,Allied Market Research -DevOps Tools,8.9,18.4%,2023,Global Market Insights -SaaS Project Management,4.5,11.8%,2023,ResearchAndMarkets -Enterprise PM Solutions,6.2,8.9%,2023,MarketsandMarkets -Mobile PM Apps,1.8,16.2%,2023,IBISWorld -Cloud PM Platforms,5.4,13.1%,2023,Verified Market Research diff --git a/bmad-core/data/security-patterns.md b/bmad-core/data/security-patterns.md deleted file mode 100644 index 3f1cc129..00000000 --- a/bmad-core/data/security-patterns.md +++ /dev/null @@ -1,62 +0,0 @@ -# Security Patterns and Best Practices - -## Authentication & Authorization - -### JWT Best Practices -- **Expiry**: Access tokens 15-30 minutes, refresh tokens 7-30 days -- **Algorithm**: Use RS256 for public/private key signing -- **Claims**: Include minimal necessary data (user_id, roles, exp) -- **Storage**: HttpOnly cookies for web, secure storage for mobile -- **Validation**: Always verify signature, expiry, and issuer - -### OAuth 2.0 Implementation -- **PKCE**: Required for all public clients (SPAs, mobile) -- **State Parameter**: Prevent CSRF attacks -- **Scope Limitation**: Request minimal necessary permissions -- **Redirect URI**: Exact match validation, no wildcards - -## Data Protection - -### Encryption Standards -- **At Rest**: AES-256-GCM for data, RSA-4096 for keys -- **In Transit**: TLS 1.3 minimum, certificate pinning for mobile -- **Database**: Column-level encryption for PII -- **Backups**: Encrypted with separate key management - -### Input Validation -- **Sanitization**: Use parameterized queries, escape HTML -- **File Uploads**: MIME type validation, virus scanning, size limits -- **Rate Limiting**: Per-IP, per-user, per-endpoint limits -- **Schema Validation**: JSON Schema or similar for API inputs - -## API Security - -### Common Vulnerabilities -1. **Injection**: SQL, NoSQL, Command, LDAP injection -2. **Broken Authentication**: Weak passwords, exposed credentials -3. **Sensitive Data Exposure**: Logs, error messages, debug info -4. **XML External Entities**: XXE attacks in XML processing -5. **Broken Access Control**: Privilege escalation, IDOR - -### Security Headers -``` -Content-Security-Policy: default-src 'self' -X-Frame-Options: DENY -X-Content-Type-Options: nosniff -Strict-Transport-Security: max-age=31536000 -Referrer-Policy: strict-origin-when-cross-origin -``` - -## Monitoring & Incident Response - -### Security Logging -- **Authentication Events**: Login attempts, failures, lockouts -- **Authorization**: Access grants/denials, privilege changes -- **Data Access**: PII access, export operations -- **System Changes**: Configuration updates, user modifications - -### Threat Detection -- **Anomaly Detection**: Unusual access patterns, location changes -- **Automated Response**: Account lockout, IP blocking -- **Alert Thresholds**: Failed login attempts, API rate violations -- **SIEM Integration**: Centralized log analysis and correlation diff --git a/docs/claude-integration.md b/docs/claude-integration.md deleted file mode 100644 index 00a6af92..00000000 --- a/docs/claude-integration.md +++ /dev/null @@ -1,263 +0,0 @@ -# BMAD-Method Claude Code Integration - -This document describes the Claude Code subagents integration for BMAD-Method, allowing you to use BMAD's specialized agents within Claude Code's new subagent system. - -## Overview - -The Claude Code integration transforms BMAD's collaborative agent framework into Claude Code subagents while maintaining clean separation from the original codebase. 
This enables: - -- **Native Claude Code Experience**: Use BMAD agents directly within Claude Code -- **Context Management**: Each agent maintains its own context window -- **Tool Integration**: Leverage Claude Code's built-in tools (Read, Grep, codebase_search, etc.) -- **Workflow Preservation**: Maintain BMAD's proven agent collaboration patterns - -## Quick Setup - -### 1. Prerequisites - -- Node.js 20+ -- Claude Code installed ([claude.ai/code](https://claude.ai/code)) -- Existing BMAD-Method project - -### 2. Generate Claude Subagents - -```bash -# From your BMAD project root -npm run build:claude -``` - -This creates `.claude/agents/` with six specialized subagents: -- **Analyst** (Mary) - Market research, competitive analysis, project briefs -- **Architect** - System design, technical architecture -- **PM** - Project management, planning, coordination -- **Dev** - Development, implementation, coding -- **QA** - Quality assurance, testing, validation -- **Scrum Master** - Agile process management - -### 3. Start Claude Code - -```bash -# In your project root (where .claude/ directory exists) -claude -``` - -## Usage Patterns - -### Explicit Agent Invocation - -Request specific agents for specialized tasks: - -``` -# Market research and analysis -> Use the analyst subagent to help me create a competitive analysis - -# Architecture planning -> Ask the architect subagent to design a microservices architecture - -# Implementation -> Have the dev subagent implement the user authentication system - -# Quality assurance -> Use the qa subagent to create comprehensive test cases -``` - -### Automatic Agent Selection - -Claude Code automatically selects appropriate agents based on context: - -``` -# Analyst will likely be chosen -> I need to research the market for AI-powered project management tools - -# Architect will likely be chosen -> How should I structure the database schema for this multi-tenant SaaS? 
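# QA will likely be chosen (illustrative addition)
> Draft test cases for the password reset flow, including edge cases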
- -# Dev will likely be chosen -> Implement the JWT authentication middleware -``` - -## Agent Capabilities - -### Analyst (Mary) πŸ“Š -- Market research and competitive analysis -- Project briefs and discovery documentation -- Brainstorming and ideation facilitation -- Strategic analysis and insights - -**Key Commands**: create-project-brief, perform-market-research, create-competitor-analysis, brainstorm - -### Architect πŸ—οΈ -- System architecture and design -- Technical solution planning -- Integration patterns and approaches -- Scalability and performance considerations - -### PM πŸ“‹ -- Project planning and coordination -- Stakeholder management -- Risk assessment and mitigation -- Resource allocation and timeline management - -### Dev πŸ‘¨β€πŸ’» -- Code implementation and development -- Technical problem solving -- Code review and optimization -- Integration and deployment - -### QA πŸ” -- Test planning and execution -- Quality assurance processes -- Bug identification and validation -- Acceptance criteria definition - -### Scrum Master 🎯 -- Sprint planning and management -- Agile process facilitation -- Team coordination and communication -- Impediment resolution - -## Workflow Integration - -### BMAD Story-Driven Development - -Agents can access and work with BMAD story files: - -``` -> Use the dev subagent to implement the user story in stories/user-auth.story.md -``` - -### Task and Template Access - -Agents can read BMAD dependencies: - -``` -> Have the analyst use the project-brief template to document our new feature -``` - -### Cross-Agent Collaboration - -Chain agents for complex workflows: - -``` -> First use the analyst to research the market, then have the architect design the solution, and finally ask the pm to create a project plan -``` - -## Technical Architecture - -### Directory Structure - -``` -./ -β”œβ”€β”€ bmad-core/ # Original BMAD (untouched) -β”œβ”€β”€ integration/claude/ # Claude integration source -└── .claude/ # Generated Claude subagents - β”œβ”€β”€ agents/ # Subagent definitions - β”‚ β”œβ”€β”€ analyst.md - β”‚ β”œβ”€β”€ architect.md - β”‚ └── ... - └── memory/ # Agent context memory -``` - -### Context Management - -- **Lightweight Start**: Each agent begins with minimal context (~2-4KB) -- **On-Demand Loading**: Agents use tools to read files when needed -- **Memory Files**: Rolling memory maintains conversation context -- **Tool Integration**: Access BMAD files via Read, Grep, codebase_search - -### Tool Permissions - -Each agent has access to: -- `Read` - File reading and content access -- `Grep` - Text search within files -- `glob` - File pattern matching -- `codebase_search_agent` - Semantic code search -- `list_directory` - Directory exploration - -## Advanced Usage - -### Custom Agent Development - -To add new agents: - -1. Create agent definition in `bmad-core/agents/new-agent.md` -2. Add agent ID to `integration/claude/src/build-claude.js` -3. 
Rebuild: `npm run build:claude` - -### Memory Management - -Agents maintain context in `.claude/memory/{agent}.md`: -- Automatically created on first use -- Stores key decisions and context -- Truncated when exceeding limits -- Can be manually edited if needed - -### Integration with CI/CD - -```yaml -# .github/workflows/claude-agents.yml -name: Update Claude Agents -on: - push: - paths: ['bmad-core/agents/**'] -jobs: - build-claude: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - run: npm run build:claude - - # Commit updated .claude/ directory -``` - -## Best Practices - -### Agent Selection - -- **Analyst**: Early project phases, research, market analysis -- **Architect**: System design, technical planning, solution architecture -- **PM**: Project coordination, planning, stakeholder management -- **Dev**: Implementation, coding, technical execution -- **QA**: Testing, validation, quality assurance -- **Scrum Master**: Process management, team coordination - -### Context Optimization - -- Start conversations with clear agent requests -- Reference specific BMAD files by path when needed -- Use agent memory files for important decisions -- Keep agent contexts focused on their specialization - -### Workflow Efficiency - -- Use explicit agent invocation for specialized tasks -- Chain agents for multi-phase work -- Leverage BMAD story files for development context -- Maintain conversation history in agent memory - -## Troubleshooting - -### Agent Not Found -```bash -# Rebuild agents -npm run build:claude - -# Verify generation -ls .claude/agents/ -``` - -### Memory Issues -```bash -# Clear agent memory -rm .claude/memory/*.md -``` - -### Context Problems -- Keep agent prompts focused -- Use tools to load files on-demand -- Reference specific sections rather than entire documents - -## Support - -- **BMAD Community**: [Discord](https://discord.gg/gk8jAdXWmj) -- **Issues**: [GitHub Issues](https://github.com/24601/BMAD-AT-CLAUDE/issues) -- **Claude Code Docs**: [docs.anthropic.com/claude-code](https://docs.anthropic.com/en/docs/claude-code/overview) diff --git a/integration/claude/.gitignore b/integration/claude/.gitignore deleted file mode 100644 index 948957c7..00000000 --- a/integration/claude/.gitignore +++ /dev/null @@ -1,13 +0,0 @@ -# Dependencies -node_modules/ -npm-debug.log* -yarn-debug.log* -yarn-error.log* - -# Runtime data -*.pid -*.seed -*.log - -# Generated files should be in root .claude/, not here -.claude/ diff --git a/integration/claude/README.md b/integration/claude/README.md deleted file mode 100644 index 5595c811..00000000 --- a/integration/claude/README.md +++ /dev/null @@ -1,105 +0,0 @@ -# BMAD-Method Claude Code Integration - -This directory contains the integration layer that ports BMAD-Method agents to Claude Code's subagent system. - -## Quick Start - -```bash -# Build Claude Code subagents from BMAD definitions -npm run build - -# Start Claude Code in the repo root -cd ../../ -claude -``` - -## What This Does - -This integration transforms BMAD-Method's specialized agents into Claude Code subagents: - -- **Analyst (Mary)** - Market research, brainstorming, competitive analysis, project briefs -- **Architect** - System design, technical architecture, solution planning -- **PM** - Project management, planning, coordination -- **Dev** - Development, implementation, coding -- **QA** - Quality assurance, testing, validation -- **Scrum Master** - Agile process management, team coordination - -## How It Works - -1. 
**Agent Parsing**: Reads BMAD agent definitions from `bmad-core/agents/` -2. **Template Generation**: Uses Mustache templates to create Claude subagent files -3. **Context Management**: Creates lightweight memory files for each agent -4. **Tool Assignment**: Grants appropriate tools (Read, Grep, codebase_search, etc.) - -## Generated Structure - -``` -.claude/ -β”œβ”€β”€ agents/ # Generated subagent definitions -β”‚ β”œβ”€β”€ analyst.md -β”‚ β”œβ”€β”€ architect.md -β”‚ β”œβ”€β”€ dev.md -β”‚ β”œβ”€β”€ pm.md -β”‚ β”œβ”€β”€ qa.md -β”‚ └── sm.md -└── memory/ # Context memory for each agent - β”œβ”€β”€ analyst.md - └── ... -``` - -## Usage in Claude Code - -Once built, you can use subagents in Claude Code: - -``` -# Explicit invocation -> Use the analyst subagent to help me create a project brief - -# Or let Claude choose automatically -> I need help with market research and competitive analysis -``` - -## Architecture Principles - -- **Zero Pollution**: No changes to original BMAD structure -- **One-Way Generation**: Claude agents generated from BMAD sources -- **Context Light**: Each agent starts with minimal context, loads more on-demand -- **Tool Focused**: Uses Claude Code's built-in tools for file access - -## Development - -### Building - -```bash -npm run build # Build all agents -npm run clean # Remove generated .claude directory -npm run validate # Validate agent definitions -``` - -### Templates - -Agent templates are in `src/templates/agent.mustache` and use the following data: - -- `agent.*` - Agent metadata (name, title, icon, etc.) -- `persona.*` - Role definition and principles -- `commands` - Available BMAD commands -- `dependencies.*` - Task, template, and data dependencies - -### Adding New Agents - -1. Add agent ID to `CORE_AGENTS` array in `build-claude.js` -2. Ensure corresponding `.md` file exists in `bmad-core/agents/` -3. Run `npm run build` - -## Integration with Original BMAD - -This integration is designed to coexist with the original BMAD system: - -- Original BMAD web bundles continue to work unchanged -- Claude integration is completely optional -- No modification to core BMAD files required -- Can be used alongside existing BMAD workflows - -## License - -MIT - Same as BMAD-Method diff --git a/integration/claude/TESTING.md b/integration/claude/TESTING.md deleted file mode 100644 index a8f8f4cb..00000000 --- a/integration/claude/TESTING.md +++ /dev/null @@ -1,437 +0,0 @@ -# End-to-End Testing Guide for BMAD Claude Integration - -This guide provides comprehensive testing scenarios to validate the Claude Code subagents integration. - -## Test Environment Setup - -### 1. Create Fresh Test Project - -```bash -# Create new test directory -mkdir ~/bmad-claude-test -cd ~/bmad-claude-test - -# Initialize basic project structure -mkdir -p src docs tests -echo "# Test Project for BMAD Claude Integration" > README.md - -# Clone BMAD method (or copy existing) -git clone https://github.com/24601/BMAD-AT-CLAUDE.git -cd BMAD-AT-CLAUDE - -# Install dependencies and build Claude agents -npm install -npm run build:claude -``` - -### 2. Verify Claude Code Installation - -```bash -# Check Claude Code is available -claude --version - -# Verify we're in the right directory with .claude/agents/ -ls -la .claude/agents/ -``` - -### 3. 
Start Claude Code Session - -```bash -# Start Claude Code in project root -claude - -# Should show available subagents -/agents -``` - -## Core Agent Testing - -### Test 1: Analyst Agent - Market Research - -**Prompt:** -``` -Use the analyst subagent to help me research the market for AI-powered project management tools. I want to understand the competitive landscape and identify key market gaps. -``` - -**Expected Behavior:** -- Agent introduces itself as Mary, Business Analyst -- Offers to use market research templates -- Accesses BMAD dependencies using Read tool -- Provides structured analysis approach - -**Validation:** -- [ ] Agent stays in character as Mary -- [ ] References BMAD templates/tasks appropriately -- [ ] Uses numbered lists for options -- [ ] Accesses files via Read tool when needed - -### Test 2: Architect Agent - System Design - -**Prompt:** -``` -Ask the architect subagent to design a microservices architecture for a multi-tenant SaaS platform with user authentication, billing, and analytics. -``` - -**Expected Behavior:** -- Agent focuses on technical architecture -- Considers scalability and system boundaries -- May reference BMAD architecture templates -- Provides detailed technical recommendations - -**Validation:** -- [ ] Technical depth appropriate for architect role -- [ ] System thinking and architectural patterns -- [ ] References to BMAD resources when relevant - -### Test 3: Dev Agent - Implementation - -**Prompt:** -``` -Have the dev subagent implement a JWT authentication middleware in Node.js with proper error handling and logging. -``` - -**Expected Behavior:** -- Focuses on practical implementation -- Writes actual code -- Considers best practices and error handling -- May suggest testing approaches - -**Validation:** -- [ ] Produces working code -- [ ] Follows security best practices -- [ ] Includes proper error handling - -## BMAD Integration Testing - -### Test 4: Story File Workflow - -**Setup:** -```bash -# Create a sample story file -mkdir -p stories -cat > stories/user-auth.story.md << 'EOF' -# User Authentication Story - -## Overview -Implement secure user authentication system with JWT tokens. - -## Acceptance Criteria -- [ ] User can register with email/password -- [ ] User can login and receive JWT token -- [ ] Protected routes require valid token -- [ ] Token refresh mechanism - -## Technical Notes -- Use bcrypt for password hashing -- JWT expiry: 15 minutes -- Refresh token expiry: 7 days -EOF -``` - -**Prompt:** -``` -Use the dev subagent to implement the user authentication story in stories/user-auth.story.md. Follow the acceptance criteria exactly. -``` - -**Expected Behavior:** -- Agent reads the story file using Read tool -- Implements according to acceptance criteria -- References story context throughout implementation - -**Validation:** -- [ ] Agent reads story file correctly -- [ ] Implementation matches acceptance criteria -- [ ] Maintains story context during conversation - -### Test 5: BMAD Template Usage - -**Prompt:** -``` -Use the analyst subagent to create a project brief using the BMAD project-brief template for an AI-powered customer support chatbot. 
-``` - -**Expected Behavior:** -- Agent accesses BMAD templates using Read tool -- Uses project-brief-tmpl.yaml structure -- Guides user through template completion -- Follows BMAD workflow patterns - -**Validation:** -- [ ] Accesses correct template file -- [ ] Follows template structure -- [ ] Maintains BMAD methodology - -## Agent Collaboration Testing - -### Test 6: Multi-Agent Workflow - -**Prompt:** -``` -I want to build a new feature for real-time notifications. First use the analyst to research notification patterns, then have the architect design the system, and finally ask the pm to create a project plan. -``` - -**Expected Behavior:** -- Sequential agent handoffs -- Each agent maintains context from previous work -- Cross-references between agent outputs -- Coherent end-to-end workflow - -**Validation:** -- [ ] Smooth agent transitions -- [ ] Context preservation across agents -- [ ] Workflow coherence -- [ ] Each agent stays in character - -### Test 7: Agent Memory Persistence - -**Setup:** -```bash -# Start conversation with analyst -# Make some decisions and progress -# Exit and restart Claude Code session -``` - -**Test:** -1. Have conversation with analyst about market research -2. Exit Claude Code -3. Restart Claude Code -4. Continue conversation - check if context preserved - -**Expected Behavior:** -- Agent memory files store key decisions -- Context partially preserved across sessions -- Agent references previous conversation appropriately - -## Error Handling and Edge Cases - -### Test 8: Invalid File Access - -**Prompt:** -``` -Use the analyst subagent to read the file bmad-core/nonexistent-file.md -``` - -**Expected Behavior:** -- Graceful error handling -- Suggests alternative files or approaches -- Maintains agent persona during error - -**Validation:** -- [ ] No crashes or errors -- [ ] Helpful error messages -- [ ] Agent stays in character - -### Test 9: Tool Permission Testing - -**Prompt:** -``` -Use the dev subagent to create a new file in the src/ directory with a sample API endpoint. -``` - -**Expected Behavior:** -- Agent attempts to use available tools -- If create_file not available, suggests alternatives -- Provides code that could be manually created - -**Validation:** -- [ ] Respects tool limitations -- [ ] Provides alternatives when tools unavailable -- [ ] Clear about what actions are possible - -### Test 10: Context Window Management - -**Setup:** -```bash -# Create large content files to test context limits -mkdir -p test-content -for i in {1..50}; do - echo "This is test content line $i with enough text to make it substantial and test context window management capabilities. Adding more text to make each line longer and test how agents handle large content volumes." >> test-content/large-file.md -done -``` - -**Prompt:** -``` -Use the analyst subagent to analyze all the content in the test-content/ directory and summarize the key insights. 
-``` - -**Expected Behavior:** -- Agent uses tools to access content incrementally -- Doesn't load everything into context at once -- Provides meaningful analysis despite size constraints - -**Validation:** -- [ ] Efficient tool usage -- [ ] No context overflow errors -- [ ] Meaningful output despite constraints - -## Performance and Usability Testing - -### Test 11: Response Time - -**Test Multiple Prompts:** -- Time each agent invocation -- Measure response quality vs speed -- Test with different complexity levels - -**Metrics:** -- [ ] Initial agent load time < 10 seconds -- [ ] Subsequent responses < 30 seconds -- [ ] Quality maintained across response times - -### Test 12: User Experience - -**Prompts to Test:** -``` -# Ambiguous request -> Help me with my project - -# Complex multi-step request -> I need to build a complete authentication system from scratch - -# Domain-specific request -> Create unit tests for my React components -``` - -**Expected Behavior:** -- Appropriate agent selection or clarification requests -- Clear guidance on next steps -- Professional communication - -**Validation:** -- [ ] Appropriate agent routing -- [ ] Clear communication -- [ ] Helpful responses to ambiguous requests - -## Validation Checklist - -### Agent Behavior βœ… -- [ ] Each agent maintains distinct persona -- [ ] Agents stay in character throughout conversations -- [ ] Appropriate expertise demonstrated -- [ ] BMAD methodology preserved - -### Tool Integration βœ… -- [ ] Read tool accesses BMAD files correctly -- [ ] Grep searches work across codebase -- [ ] codebase_search_agent provides relevant results -- [ ] File paths resolved correctly - -### Context Management βœ… -- [ ] Agents start with minimal context -- [ ] On-demand loading works properly -- [ ] Memory files created and maintained -- [ ] No context overflow errors - -### BMAD Integration βœ… -- [ ] Original BMAD workflows preserved -- [ ] Templates and tasks accessible -- [ ] Story-driven development supported -- [ ] Cross-agent collaboration maintained - -### Error Handling βœ… -- [ ] Graceful handling of missing files -- [ ] Clear error messages -- [ ] Recovery suggestions provided -- [ ] No system crashes - -## Automated Testing Script - -```bash -#!/bin/bash -# automated-test.sh - -echo "πŸš€ Starting BMAD Claude Integration Tests..." - -# Test 1: Build verification -echo "πŸ“‹ Test 1: Build verification" -npm run build:claude -if [ $? -eq 0 ]; then - echo "βœ… Build successful" -else - echo "❌ Build failed" - exit 1 -fi - -# Test 2: Agent file validation -echo "πŸ“‹ Test 2: Agent file validation" -cd integration/claude -npm run validate -if [ $? -eq 0 ]; then - echo "βœ… Validation successful" -else - echo "❌ Validation failed" - exit 1 -fi - -# Test 3: File structure verification -echo "πŸ“‹ Test 3: File structure verification" -cd ../.. -required_files=( - ".claude/agents/analyst.md" - ".claude/agents/architect.md" - ".claude/agents/dev.md" - ".claude/agents/pm.md" - ".claude/agents/qa.md" - ".claude/agents/sm.md" -) - -for file in "${required_files[@]}"; do - if [ -f "$file" ]; then - echo "βœ… $file exists" - else - echo "❌ $file missing" - exit 1 - fi -done - -echo "πŸŽ‰ All automated tests passed!" 
-echo "πŸ“ Manual testing required for agent conversations" -``` - -## Manual Test Report Template - -```markdown -# BMAD Claude Integration Test Report - -**Date:** ___________ -**Tester:** ___________ -**Claude Code Version:** ___________ - -## Test Results Summary -- [ ] All agents load successfully -- [ ] Agent personas maintained -- [ ] BMAD integration working -- [ ] Tool access functional -- [ ] Error handling appropriate - -## Detailed Results - -### Agent Tests -- [ ] Analyst: βœ…/❌ - Notes: ___________ -- [ ] Architect: βœ…/❌ - Notes: ___________ -- [ ] Dev: βœ…/❌ - Notes: ___________ -- [ ] PM: βœ…/❌ - Notes: ___________ -- [ ] QA: βœ…/❌ - Notes: ___________ -- [ ] SM: βœ…/❌ - Notes: ___________ - -### Integration Tests -- [ ] Story workflow: βœ…/❌ -- [ ] Template usage: βœ…/❌ -- [ ] Multi-agent flow: βœ…/❌ - -### Issues Found -1. ___________ -2. ___________ -3. ___________ - -## Recommendations -___________ -``` - -## Next Steps After Testing - -1. **Fix Issues**: Address any problems found during testing -2. **Performance Optimization**: Improve response times if needed -3. **Documentation Updates**: Clarify usage based on test learnings -4. **User Feedback**: Gather feedback from real users -5. **Iteration**: Refine agents based on testing results diff --git a/integration/claude/complete-test-framework.md b/integration/claude/complete-test-framework.md deleted file mode 100644 index 4ea408e9..00000000 --- a/integration/claude/complete-test-framework.md +++ /dev/null @@ -1,254 +0,0 @@ -# Complete End-to-End Testing Framework with o3 Judge - -Based on the Oracle's detailed evaluation, here's the comprehensive testing approach for validating the BMAD Claude integration. - -## Testing Strategy Overview - -1. **Manual Execution**: Run tests manually in Claude Code to avoid timeout issues -2. **Structured Collection**: Capture responses in standardized format -3. **o3 Evaluation**: Use Oracle tool for sophisticated analysis -4. **Iterative Improvement**: Apply recommendations to enhance integration - -## Test Suite - -### Core Agent Tests - -#### 1. Analyst Agent - Market Research -**Prompt:** -``` -Use the analyst subagent to help me research the competitive landscape for AI project management tools. -``` - -**Evaluation Criteria (from o3 analysis):** -- Subagent Persona (Mary, Business Analyst): 0-5 points -- Analytical Expertise/Market Research Method: 0-5 points -- BMAD Methodology Integration: 0-5 points -- Response Structure & Professionalism: 0-5 points -- User Engagement/Next-Step Clarity: 0-5 points - -**Expected Improvements (per o3 recommendations):** -- [ ] References specific BMAD artefacts (Opportunity Scorecard, Gap Matrix) -- [ ] Includes quantitative analysis with data sources -- [ ] Shows hypothesis-driven discovery approach -- [ ] Solicits clarification on scope and constraints - -#### 2. Dev Agent - Implementation Quality -**Prompt:** -``` -Have the dev subagent implement a secure file upload endpoint in Node.js with validation, virus scanning, and rate limiting. -``` - -**Evaluation Criteria:** -- Technical Implementation Quality: 0-5 points -- Security Best Practices: 0-5 points -- Code Structure and Documentation: 0-5 points -- Error Handling and Validation: 0-5 points -- BMAD Story Integration: 0-5 points - -#### 3. Architect Agent - System Design -**Prompt:** -``` -Ask the architect subagent to design a microservices architecture for a real-time collaboration platform with document editing, user presence, and conflict resolution. 
-``` - -**Evaluation Criteria:** -- System Architecture Expertise: 0-5 points -- Scalability and Performance Considerations: 0-5 points -- Real-time Architecture Patterns: 0-5 points -- Technical Detail and Accuracy: 0-5 points -- Integration with BMAD Architecture Templates: 0-5 points - -#### 4. PM Agent - Project Planning -**Prompt:** -``` -Use the pm subagent to create a project plan for launching a new AI-powered feature, including team coordination, risk management, and stakeholder communication. -``` - -**Evaluation Criteria:** -- Project Management Methodology: 0-5 points -- Risk Assessment and Mitigation: 0-5 points -- Timeline and Resource Planning: 0-5 points -- Stakeholder Management: 0-5 points -- BMAD Process Integration: 0-5 points - -#### 5. QA Agent - Testing Strategy -**Prompt:** -``` -Ask the qa subagent to design a comprehensive testing strategy for a fintech payment processing system, including security, compliance, and performance testing. -``` - -**Evaluation Criteria:** -- Testing Methodology Depth: 0-5 points -- Domain-Specific Considerations (Fintech): 0-5 points -- Test Automation and CI/CD Integration: 0-5 points -- Quality Assurance Best Practices: 0-5 points -- BMAD QA Template Usage: 0-5 points - -#### 6. Scrum Master Agent - Process Facilitation -**Prompt:** -``` -Use the sm subagent to help establish an agile workflow for a remote team, including sprint ceremonies, collaboration tools, and team dynamics. -``` - -**Evaluation Criteria:** -- Agile Methodology Expertise: 0-5 points -- Remote Team Considerations: 0-5 points -- Process Facilitation Skills: 0-5 points -- Tool and Workflow Recommendations: 0-5 points -- BMAD Agile Integration: 0-5 points - -### Advanced Integration Tests - -#### 7. BMAD Story Workflow -**Setup:** -```bash -# Create sample story file -cat > stories/payment-integration.story.md << 'EOF' -# Payment Integration Story - -## Overview -Integrate Stripe payment processing for subscription billing - -## Acceptance Criteria -- [ ] Secure payment form with validation -- [ ] Subscription creation and management -- [ ] Webhook handling for payment events -- [ ] Error handling and retry logic -- [ ] Compliance with PCI DSS requirements - -## Technical Notes -- Use Stripe SDK v3 -- Implement idempotency keys -- Log all payment events for audit -EOF -``` - -**Test Prompt:** -``` -Use the dev subagent to implement the payment integration story in stories/payment-integration.story.md -``` - -**Evaluation Focus:** -- Story comprehension and implementation -- Acceptance criteria coverage -- BMAD story-driven development adherence - -#### 8. Cross-Agent Collaboration -**Test Sequence:** -``` -1. "Use the analyst subagent to research payment processing competitors" -2. "Now ask the architect subagent to design a payment system based on the analysis" -3. "Have the pm subagent create an implementation plan for the payment system" -``` - -**Evaluation Focus:** -- Context handoff between agents -- Building on previous agent outputs -- Coherent multi-agent workflow - -## Testing Execution Process - -### Step 1: Manual Execution -```bash -# Build agents -npm run build:claude - -# Start Claude Code -claude - -# Run each test prompt and save responses -``` - -### Step 2: Response Collection -Create a structured record for each test: - -```json -{ - "testId": "analyst-market-research", - "timestamp": "2025-07-24T...", - "prompt": "Use the analyst subagent...", - "response": "Hello! 
I'm Mary...", - "executionNotes": "Agent responded immediately, showed subagent behavior", - "evidenceFound": [ - "Agent identified as Mary", - "Referenced BMAD template", - "Structured analysis approach" - ] -} -``` - -### Step 3: o3 Evaluation -For each response, use the Oracle tool with this evaluation template: - -``` -Evaluate this Claude Code subagent response using the detailed criteria framework established for BMAD integration testing. - -TEST: {testId} -ORIGINAL PROMPT: {prompt} -RESPONSE: {response} - -EVALUATION FRAMEWORK: -[Insert specific 5-point criteria for the agent type] - -Based on the previous detailed evaluation of the analyst agent, please provide: - -1. DETAILED SCORES: Rate each criterion 0-5 with justification -2. OVERALL PERCENTAGE: Calculate weighted average (max 100%) -3. STRENGTHS: What shows excellent subagent behavior? -4. IMPROVEMENT AREAS: What needs enhancement? -5. BMAD INTEGRATION LEVEL: none/basic/good/excellent -6. RECOMMENDATIONS: Specific improvements aligned with BMAD methodology -7. PASS/FAIL: Does this meet minimum subagent behavior threshold (70%)? - -Format as structured analysis similar to the previous detailed evaluation. -``` - -### Step 4: Report Generation - -#### Individual Test Reports -For each test, generate: -- Score breakdown by criteria -- Evidence of subagent behavior -- BMAD integration assessment -- Specific recommendations - -#### Aggregate Analysis -- Overall pass rate across all agents -- BMAD integration maturity assessment -- Common strengths and improvement areas -- Integration readiness evaluation - -## Success Criteria - -### Minimum Viable Integration (70% threshold) -- [ ] Agents demonstrate distinct personas -- [ ] Responses show appropriate domain expertise -- [ ] Basic BMAD methodology references -- [ ] Professional response structure -- [ ] Clear user engagement - -### Excellent Integration (85%+ threshold) -- [ ] Deep BMAD artifact integration -- [ ] Quantitative analysis with data sources -- [ ] Hypothesis-driven approach -- [ ] Sophisticated domain expertise -- [ ] Seamless cross-agent collaboration - -## Continuous Improvement Process - -1. **Run Full Test Suite** - Execute all 8 core tests -2. **Oracle Evaluation** - Get detailed o3 analysis for each -3. **Identify Patterns** - Find common improvement areas -4. **Update Agent Prompts** - Enhance based on recommendations -5. **Rebuild and Retest** - Verify improvements -6. **Document Learnings** - Update integration best practices - -## Automation Opportunities - -Once manual process is validated: -- Automated response collection via Claude API -- Batch o3 evaluation processing -- Regression testing on agent updates -- Performance benchmarking over time - -This framework provides the sophisticated evaluation approach demonstrated by the Oracle's analysis while remaining practical for ongoing validation and improvement of the BMAD Claude integration. diff --git a/integration/claude/manual-test-guide.md b/integration/claude/manual-test-guide.md deleted file mode 100644 index 0ce3d5b1..00000000 --- a/integration/claude/manual-test-guide.md +++ /dev/null @@ -1,115 +0,0 @@ -# Manual Testing Guide with o3 Judge - -Since automated Claude testing can be complex due to session management, here's a comprehensive manual testing approach with o3 evaluation. - -## Quick Manual Test Process - -### 1. Setup Test Environment - -```bash -# Ensure agents are built -npm run build:claude - -# Verify agent files exist -ls .claude/agents/ - -# Start Claude Code -claude -``` - -### 2. 
Test Each Agent Manually - -Run these prompts in Claude Code and copy the responses for evaluation: - -#### Test 1: Analyst Agent -**Prompt:** -``` -Use the analyst subagent to help me research the competitive landscape for AI project management tools. -``` - -**Expected Behaviors:** -- Agent identifies as Mary or Business Analyst -- Shows analytical methodology or structured approach -- References market research or competitive analysis expertise -- May mention BMAD templates or systematic workflow - -#### Test 2: Dev Agent -**Prompt:** -``` -Have the dev subagent implement a JWT authentication middleware with error handling. -``` - -**Expected Behaviors:** -- Provides actual code implementation -- Shows development expertise and best practices -- Includes proper error handling approach -- Demonstrates security awareness for JWT - -#### Test 3: Architect Agent -**Prompt:** -``` -Ask the architect subagent to design a microservices architecture for real-time notifications. -``` - -**Expected Behaviors:** -- Shows system architecture expertise -- Discusses microservices patterns and boundaries -- Considers real-time and scalability concerns -- Demonstrates technical depth appropriate for architect role - -### 3. Record Results - -For each test, record: -- Did the response come from the intended subagent? -- What evidence shows proper agent behavior? -- How well does it integrate BMAD methodology? -- Quality and relevance of the response - -## Example Test Result for o3 Evaluation - -Here's a sample response to demonstrate the evaluation process: - -**Test Response (Analyst Agent):** -``` -Hello! I'm Mary, your Business Analyst, and I'm excited to help you research the competitive landscape for AI project management tools. - -Let me approach this systematically using proven market research methodology: - -## Research Approach - -1. **Competitive Analysis Framework** - - Direct competitors (AI-first PM tools) - - Indirect competitors (traditional PM tools adding AI) - - Emerging players and startups - -2. **Key Analysis Dimensions** - - Feature sets and AI capabilities - - Target markets and pricing - - User experience and adoption - - Technology stack and integrations - -## Initial Competitive Landscape - -### Direct AI-First Competitors -- **Monday.com AI**: Native AI features for task automation -- **Asana Intelligence**: AI-powered insights and automation -- **ClickUp AI**: AI writing assistant and task optimization - -### Traditional Players with AI -- **Microsoft Project**: Adding Copilot integration -- **Atlassian (Jira)**: Intelligence features and automation -- **Notion**: AI writing and organization features - -Would you like me to use the BMAD market research template to create a more detailed competitive analysis? I can help you: - -1. Create a comprehensive competitor analysis document -2. Perform detailed feature mapping -3. Identify market gaps and opportunities -4. Research pricing and positioning strategies - -Which aspect would you like to explore first? 
-``` - -## Using o3 Judge for Evaluation - -I'll now use the Oracle (o3) to evaluate this sample response: diff --git a/integration/claude/package.json b/integration/claude/package.json deleted file mode 100644 index e4ab144f..00000000 --- a/integration/claude/package.json +++ /dev/null @@ -1,38 +0,0 @@ -{ - "name": "@bmad/claude-integration", - "version": "1.0.0", - "description": "Claude Code subagents integration for BMAD-Method", - "type": "module", - "scripts": { - "build": "node src/build-claude.js", - "build:agents": "node src/build-claude.js", - "clean": "rm -rf ../../.claude", - "validate": "node src/validate.js" - }, - "dependencies": { - "mustache": "^4.2.0", - "yaml": "^2.3.4", - "fs-extra": "^11.2.0" - }, - "devDependencies": { - "@types/node": "^20.0.0", - "typescript": "^5.0.0" - }, - "peerDependencies": { - "bmad-method": "*" - }, - "keywords": [ - "bmad", - "claude", - "ai-agents", - "subagents", - "anthropic" - ], - "author": "BMAD Community", - "license": "MIT", - "repository": { - "type": "git", - "url": "https://github.com/24601/BMAD-AT-CLAUDE.git", - "directory": "integration/claude" - } -} diff --git a/integration/claude/quick-start-test.sh b/integration/claude/quick-start-test.sh deleted file mode 100755 index ad941518..00000000 --- a/integration/claude/quick-start-test.sh +++ /dev/null @@ -1,147 +0,0 @@ -#!/bin/bash - -# Quick Start Test for BMAD Claude Integration -# Provides simple validation and setup for manual testing with o3 judge - -echo "πŸš€ BMAD Claude Integration - Quick Start Test" -echo "=============================================" - -# Colors -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -RED='\033[0;31m' -BLUE='\033[0;34m' -NC='\033[0m' # No Color - -# Change to repo root -cd "$(dirname "$0")/../.." - -echo -e "${BLUE}πŸ“‚ Working directory: $(pwd)${NC}" -echo "" - -# Check prerequisites -echo "πŸ” Checking prerequisites..." - -# Check Node.js -if command -v node &> /dev/null; then - NODE_VERSION=$(node --version) - echo -e "${GREEN}βœ… Node.js ${NODE_VERSION}${NC}" -else - echo -e "${RED}❌ Node.js not found${NC}" - exit 1 -fi - -# Check Claude Code -if command -v claude &> /dev/null; then - CLAUDE_VERSION=$(claude --version 2>&1 | head -1) - echo -e "${GREEN}βœ… Claude Code detected${NC}" -else - echo -e "${YELLOW}⚠️ Claude Code not found${NC}" - echo " Install from: https://claude.ai/code" -fi - -# Check if agents are built -if [ -d ".claude/agents" ]; then - AGENT_COUNT=$(ls .claude/agents/*.md 2>/dev/null | wc -l) - echo -e "${GREEN}βœ… Found ${AGENT_COUNT} agent files${NC}" -else - echo -e "${YELLOW}⚠️ No agents found - building them now...${NC}" - npm run build:claude - if [ $? -eq 0 ]; then - echo -e "${GREEN}βœ… Agents built successfully${NC}" - else - echo -e "${RED}❌ Failed to build agents${NC}" - exit 1 - fi -fi - -# Validate agent files -echo "" -echo "πŸ” Validating agent configurations..." -cd integration/claude -npm run validate > /dev/null 2>&1 -if [ $? -eq 0 ]; then - echo -e "${GREEN}βœ… All agent configurations valid${NC}" -else - echo -e "${YELLOW}⚠️ Agent validation warnings (check with: npm run validate)${NC}" -fi -cd ../.. 
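# Supplementary check (illustrative sketch, not part of the original script): confirm the
# memory directory generated alongside the agents; per the docs, memory files may only be
# created on first agent use, so absence is reported as a warning rather than a failure
if [ -d ".claude/memory" ]; then
    MEMORY_COUNT=$(ls .claude/memory/*.md 2>/dev/null | wc -l)
    echo -e "${GREEN}βœ… Found ${MEMORY_COUNT} agent memory files${NC}"
else
    echo -e "${YELLOW}⚠️ No .claude/memory directory yet (created on first agent use)${NC}"
fi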
- -# Show available agents -echo "" -echo "🎭 Available BMAD Agents:" -for agent in .claude/agents/*.md; do - if [ -f "$agent" ]; then - AGENT_NAME=$(basename "$agent" .md) - AGENT_TITLE=$(grep "^name:" "$agent" | cut -d: -f2- | sed 's/^ *//') - echo -e "${BLUE} πŸ“‹ ${AGENT_NAME}: ${AGENT_TITLE}${NC}" - fi -done - -# Create test commands -echo "" -echo "πŸ§ͺ Quick Test Commands:" -echo "======================" - -cat << 'EOF' - -1. Start Claude Code: - claude - -2. Test Analyst Agent: - Use the analyst subagent to help me research the competitive landscape for AI project management tools. - -3. Test Dev Agent: - Have the dev subagent implement a JWT authentication middleware with error handling. - -4. Test Architect Agent: - Ask the architect subagent to design a microservices architecture for real-time notifications. - -5. Check Available Agents: - /agents - -EOF - -# Provide next steps -echo "" -echo -e "${GREEN}🎯 Next Steps for Complete Testing:${NC}" -echo "1. Run the manual test commands above in Claude Code" -echo "2. Copy responses and use Oracle tool for o3 evaluation" -echo "3. See complete-test-framework.md for comprehensive testing" -echo "4. Use manual-test-guide.md for detailed evaluation criteria" - -# Check if we can run a basic file test -echo "" -echo "πŸ”¬ Basic File Structure Test:" -if [ -f ".claude/agents/analyst.md" ]; then - # Check if analyst file has expected content - if grep -q "Mary" ".claude/agents/analyst.md"; then - echo -e "${GREEN}βœ… Analyst agent properly configured${NC}" - else - echo -e "${YELLOW}⚠️ Analyst agent may need reconfiguration${NC}" - fi - - if grep -q "bmad-core" ".claude/agents/analyst.md"; then - echo -e "${GREEN}βœ… BMAD integration references present${NC}" - else - echo -e "${YELLOW}⚠️ Limited BMAD integration detected${NC}" - fi -else - echo -e "${RED}❌ Analyst agent file not found${NC}" -fi - -# Summary -echo "" -echo -e "${GREEN}πŸŽ‰ Setup Complete!${NC}" -echo "" -if command -v claude &> /dev/null; then - echo -e "${GREEN}Ready to test! Run: ${BLUE}claude${GREEN} to start testing.${NC}" -else - echo -e "${YELLOW}Install Claude Code first, then run: ${BLUE}claude${NC}" -fi - -echo "" -echo "πŸ“š Testing Resources:" -echo " πŸ“– integration/claude/complete-test-framework.md" -echo " πŸ“‹ integration/claude/manual-test-guide.md" -echo " πŸ”§ integration/claude/TESTING.md" diff --git a/integration/claude/quick-test.sh b/integration/claude/quick-test.sh deleted file mode 100755 index 1b97a315..00000000 --- a/integration/claude/quick-test.sh +++ /dev/null @@ -1,108 +0,0 @@ -#!/bin/bash - -# Quick End-to-End Test for BMAD Claude Integration -echo "πŸš€ BMAD Claude Integration - Quick Test" -echo "======================================" - -# Colors for output -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -NC='\033[0m' # No Color - -# Test counter -TESTS=0 -PASSED=0 - -run_test() { - local test_name="$1" - local test_command="$2" - - echo -e "\nπŸ“‹ Test $((++TESTS)): $test_name" - - if eval "$test_command"; then - echo -e "${GREEN}βœ… PASSED${NC}" - ((PASSED++)) - else - echo -e "${RED}❌ FAILED${NC}" - fi -} - -# Navigate to repo root -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -cd "$SCRIPT_DIR/../.." 
- -echo "Working directory: $(pwd)" -echo "Files in .claude/agents/:" -ls -la .claude/agents/ 2>/dev/null || echo "No .claude/agents directory found" -echo "" - -# Test 1: Dependencies check -run_test "Node.js version check" "node --version | grep -E 'v[2-9][0-9]|v1[89]|v[2-9][0-9]'" - -# Test 2: Build agents -run_test "Build Claude agents" "npm run build:claude > /dev/null 2>&1" - -# Test 3: Validate agent files exist -run_test "Agent files exist" "ls .claude/agents/analyst.md .claude/agents/architect.md .claude/agents/dev.md .claude/agents/pm.md .claude/agents/qa.md .claude/agents/sm.md > /dev/null 2>&1" - -# Test 4: Validate agent file structure -run_test "Agent file structure valid" "cd integration/claude && npm run validate > /dev/null 2>&1" - -# Test 5: Check YAML frontmatter -run_test "Analyst YAML frontmatter" "test -f .claude/agents/analyst.md && cat .claude/agents/analyst.md | grep -q 'name: Mary'" - -# Test 6: Check agent content -run_test "Agent persona content" "test -f .claude/agents/analyst.md && cat .claude/agents/analyst.md | grep -q 'You are Mary'" - -# Test 7: Check BMAD dependencies listed -run_test "BMAD dependencies listed" "test -f .claude/agents/analyst.md && cat .claude/agents/analyst.md | grep -q 'bmad-core'" - -# Test 8: Memory files created -run_test "Memory files created" "ls .claude/memory/*.md > /dev/null 2>&1" - -# Test 9: Claude Code available (optional) -if command -v claude &> /dev/null; then - run_test "Claude Code available" "claude --version > /dev/null 2>&1" - CLAUDE_AVAILABLE=true -else - echo -e "\n⚠️ Claude Code not installed - skipping CLI tests" - echo " Install from: https://claude.ai/code" - CLAUDE_AVAILABLE=false -fi - -# Summary -echo "" -echo "======================================" -echo -e "πŸ“Š Test Results: ${GREEN}$PASSED${NC}/$TESTS tests passed" - -if [ $PASSED -eq $TESTS ]; then - echo -e "${GREEN}πŸŽ‰ All tests passed!${NC}" - - if [ "$CLAUDE_AVAILABLE" = true ]; then - echo "" - echo "πŸš€ Ready for manual testing!" - echo "" - echo "Next steps:" - echo "1. Run: claude" - echo "2. Try: /agents" - echo "3. 
Test: 'Use the analyst subagent to help me create a project brief'" - echo "" - echo "See integration/claude/TESTING.md for comprehensive test scenarios" - else - echo "" - echo "⚠️ Install Claude Code to complete testing:" - echo " https://claude.ai/code" - fi - - exit 0 -else - echo -e "${RED}❌ Some tests failed${NC}" - echo "" - echo "Check the following:" - echo "- Node.js version >= 18" - echo "- npm dependencies installed" - echo "- BMAD core files present" - - exit 1 -fi diff --git a/integration/claude/run-judge-test.js b/integration/claude/run-judge-test.js deleted file mode 100755 index c3811821..00000000 --- a/integration/claude/run-judge-test.js +++ /dev/null @@ -1,223 +0,0 @@ -#!/usr/bin/env node - -/** - * Real o3 Judge Integration for Claude Subagent Testing - * This version integrates with Amp's Oracle tool for real o3 evaluation - */ - -import { execSync } from 'child_process'; -import fs from 'fs-extra'; -import path from 'path'; -import { fileURLToPath } from 'url'; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = path.dirname(__filename); -const REPO_ROOT = path.resolve(__dirname, '../..'); - -// Simplified test cases for real o3 evaluation -const CORE_TESTS = [ - { - id: 'analyst-basic-behavior', - prompt: 'Use the analyst subagent to help me research the competitive landscape for AI project management tools.', - expectedEvidence: [ - 'Agent identifies as Mary or Business Analyst', - 'Shows analytical methodology or structured approach', - 'References market research or competitive analysis expertise', - 'May mention BMAD templates or systematic workflow' - ] - }, - { - id: 'dev-implementation-test', - prompt: 'Have the dev subagent implement a JWT authentication middleware with error handling.', - expectedEvidence: [ - 'Provides actual code implementation', - 'Shows development expertise and best practices', - 'Includes proper error handling approach', - 'Demonstrates security awareness for JWT' - ] - }, - { - id: 'architect-system-design', - prompt: 'Ask the architect subagent to design a microservices architecture for real-time notifications.', - expectedEvidence: [ - 'Shows system architecture expertise', - 'Discusses microservices patterns and boundaries', - 'Considers real-time and scalability concerns', - 'Demonstrates technical depth appropriate for architect role' - ] - } -]; - -async function runSingleTest(testCase) { - console.log(`\nπŸ§ͺ Running: ${testCase.id}`); - console.log(`πŸ“ Prompt: ${testCase.prompt}`); - - try { - // Execute Claude in print mode - const command = `claude -p "${testCase.prompt.replace(/"/g, '\\"')}"`; - const startTime = Date.now(); - - const output = execSync(command, { - cwd: REPO_ROOT, - encoding: 'utf8', - timeout: 90000, // 90 second timeout - maxBuffer: 1024 * 1024 * 5 // 5MB buffer - }); - - const duration = Date.now() - startTime; - console.log(`βœ… Completed in ${(duration / 1000).toFixed(1)}s (${output.length} chars)`); - - return { - success: true, - output: output.trim(), - duration, - testCase - }; - - } catch (error) { - console.error(`❌ Failed: ${error.message}`); - return { - success: false, - error: error.message, - output: error.stdout || '', - duration: 0, - testCase - }; - } -} - -// This function would need to be called from the main Amp environment -// where the Oracle tool is available -async function evaluateWithRealO3(results) { - console.log('\nπŸ€– Preparing evaluation for o3 judge...'); - - const evaluationSummary = { - testResults: results, - overallAssessment: null, - 
recommendations: [] - }; - - // Create evaluation prompt for o3 - const evaluationPrompt = `Please evaluate these Claude Code subagent test results to determine if BMAD-Method agents have been successfully ported to Claude's subagent system. - -CONTEXT: We've ported BMAD-Method's specialized agents (Analyst, Architect, Dev, PM, QA, Scrum Master) to work as Claude Code subagents. Each agent should maintain its specialized persona and expertise while integrating with BMAD methodology. - -TEST RESULTS: -${results.map(r => ` -TEST: ${r.testCase.id} -PROMPT: ${r.testCase.prompt} -SUCCESS: ${r.success} -EXPECTED EVIDENCE: ${r.testCase.expectedEvidence.join(', ')} -ACTUAL RESPONSE: ${r.success ? r.output.substring(0, 800) + '...' : 'EXECUTION FAILED: ' + r.error} -`).join('\n---\n')} - -EVALUATION CRITERIA: -1. Subagent Specialization: Do responses show distinct agent personas with appropriate expertise? -2. BMAD Integration: Is there evidence of BMAD methodology integration? -3. Response Quality: Are responses helpful, relevant, and well-structured? -4. Technical Accuracy: Is the content technically sound? -5. Persona Consistency: Do agents stay in character? - -Please provide: -1. OVERALL_SCORE (0-100): Based on successful subagent behavior demonstration -2. INDIVIDUAL_SCORES: Score each test (0-100) -3. EVIDENCE_FOUND: What evidence shows proper subagent behavior? -4. MISSING_ELEMENTS: What expected behaviors are missing? -5. SUCCESS_ASSESSMENT: Is the BMADβ†’Claude port working? (YES/NO/PARTIAL) -6. RECOMMENDATIONS: How to improve the integration? - -Format as structured JSON for programmatic processing.`; - - // For demo, return a structured analysis prompt that could be used with Oracle - return { - evaluationPrompt, - needsOracleCall: true, - instruction: 'Call Oracle tool with the evaluationPrompt above to get o3 evaluation' - }; -} - -async function runQuickValidationTest() { - console.log('πŸš€ Claude Subagent Quick Validation Test'); - console.log('========================================='); - - // Check prerequisites - console.log('πŸ” Checking prerequisites...'); - - try { - execSync('claude --version', { stdio: 'ignore' }); - console.log('βœ… Claude Code available'); - } catch { - console.error('❌ Claude Code not found'); - return { success: false, error: 'Claude Code not installed' }; - } - - const agentsPath = path.join(REPO_ROOT, '.claude/agents'); - if (!await fs.pathExists(agentsPath)) { - console.error('❌ No .claude/agents directory found'); - return { success: false, error: 'Agents not built - run npm run build:claude' }; - } - - const agentFiles = await fs.readdir(agentsPath); - console.log(`βœ… Found ${agentFiles.length} agent files`); - - // Run core tests - console.log(`\nπŸ§ͺ Running ${CORE_TESTS.length} validation tests...`); - const results = []; - - for (const testCase of CORE_TESTS) { - const result = await runSingleTest(testCase); - results.push(result); - - // Brief pause between tests - await new Promise(resolve => setTimeout(resolve, 1000)); - } - - // Generate summary - const successful = results.filter(r => r.success).length; - const avgDuration = results.reduce((sum, r) => sum + r.duration, 0) / results.length; - - console.log('\nπŸ“Š Test Summary:'); - console.log(`βœ… Successful: ${successful}/${results.length}`); - console.log(`⏱️ Average duration: ${(avgDuration / 1000).toFixed(1)}s`); - - // Prepare for o3 evaluation - const evaluation = await evaluateWithRealO3(results); - - return { - success: successful === results.length, - results, - evaluation, - 
summary: { - totalTests: results.length, - successful, - averageDuration: avgDuration - } - }; -} - -// Export for use in main Amp environment -export { runQuickValidationTest, evaluateWithRealO3, CORE_TESTS }; - -// CLI usage -if (import.meta.url === `file://${process.argv[1]}`) { - runQuickValidationTest() - .then(result => { - console.log('\n🎯 Ready for o3 evaluation!'); - if (result.evaluation?.needsOracleCall) { - console.log('\nπŸ“‹ To complete evaluation with o3:'); - console.log('1. Copy the evaluation prompt below'); - console.log('2. Call Oracle tool with the prompt'); - console.log('3. Analyze o3\'s structured response'); - console.log('\nπŸ“ Evaluation Prompt:'); - console.log('---'); - console.log(result.evaluation.evaluationPrompt); - console.log('---'); - } - - process.exit(result.success ? 0 : 1); - }) - .catch(error => { - console.error(`❌ Test failed: ${error.message}`); - process.exit(1); - }); -} diff --git a/integration/claude/setup-test-project.sh b/integration/claude/setup-test-project.sh deleted file mode 100755 index fb386a22..00000000 --- a/integration/claude/setup-test-project.sh +++ /dev/null @@ -1,122 +0,0 @@ -#!/bin/bash - -# Setup Test Project for BMAD Claude Integration -echo "πŸ› οΈ Setting up test project for BMAD Claude integration..." - -# Get test directory from user or use default -TEST_DIR="${1:-$HOME/bmad-claude-test}" - -echo "πŸ“ Creating test project in: $TEST_DIR" - -# Create test project structure -mkdir -p "$TEST_DIR" -cd "$TEST_DIR" - -# Initialize basic project -echo "# BMAD Claude Integration Test Project - -This is a test project for validating BMAD-Method Claude Code integration. - -## Generated on: $(date) -" > README.md - -# Create sample project structure -mkdir -p {src,docs,tests,stories} - -# Create sample story file -cat > stories/sample-feature.story.md << 'EOF' -# Sample Feature Story - -## Overview -Implement a sample feature to test BMAD agent integration with Claude Code. - -## Acceptance Criteria -- [ ] Feature has proper error handling -- [ ] Feature includes unit tests -- [ ] Feature follows project conventions -- [ ] Documentation is updated - -## Technical Notes -- Use existing project patterns -- Ensure backwards compatibility -- Consider performance implications - -## Definition of Done -- [ ] Code implemented and reviewed -- [ ] Tests written and passing -- [ ] Documentation updated -- [ ] Feature deployed to staging -EOF - -# Create sample source file -mkdir -p src/utils -cat > src/utils/sample.js << 'EOF' -// Sample utility function for testing -function processData(input) { - if (!input) { - throw new Error('Input is required'); - } - - return { - processed: true, - data: input.toUpperCase(), - timestamp: new Date().toISOString() - }; -} - -module.exports = { processData }; -EOF - -# Copy BMAD method to test project -echo "πŸ“‹ Copying BMAD-Method to test project..." -cp -r "$(dirname "$0")/../.." "$TEST_DIR/BMAD-AT-CLAUDE" - -cd "$TEST_DIR/BMAD-AT-CLAUDE" - -# Install dependencies and build -echo "πŸ“¦ Installing dependencies..." -npm install - -echo "πŸ”¨ Building Claude agents..." -npm run build:claude - -# Create .gitignore for test project -cat > "$TEST_DIR/.gitignore" << 'EOF' -# Dependencies -node_modules/ -npm-debug.log* - -# Environment -.env -.env.local - -# IDE -.vscode/ -.idea/ - -# OS -.DS_Store -Thumbs.db - -# BMAD generated files are OK to track for testing -# .claude/ -EOF - -# Summary -echo "" -echo "βœ… Test project setup complete!" 
-echo "" -echo "πŸ“ Project location: $TEST_DIR" -echo "πŸ“‚ BMAD location: $TEST_DIR/BMAD-AT-CLAUDE" -echo "" -echo "πŸš€ Next steps:" -echo "1. cd $TEST_DIR/BMAD-AT-CLAUDE" -echo "2. claude" -echo "3. /agents" -echo "" -echo "πŸ’‘ Test scenarios:" -echo "β€’ Use the analyst subagent to analyze the sample story" -echo "β€’ Ask the dev subagent to implement the sample feature" -echo "β€’ Have the qa subagent create tests for the utility function" -echo "" -echo "πŸ“– Full testing guide: $TEST_DIR/BMAD-AT-CLAUDE/integration/claude/TESTING.md" diff --git a/integration/claude/src/build-claude.js b/integration/claude/src/build-claude.js deleted file mode 100644 index 7c11e36c..00000000 --- a/integration/claude/src/build-claude.js +++ /dev/null @@ -1,183 +0,0 @@ -#!/usr/bin/env node - -import fs from 'fs-extra'; -import path from 'path'; -import { fileURLToPath } from 'url'; -import Mustache from 'mustache'; -import yaml from 'yaml'; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = path.dirname(__filename); - -// Paths -const REPO_ROOT = path.resolve(__dirname, '../../..'); -const BMAD_AGENTS_DIR = path.join(REPO_ROOT, 'bmad-core/agents'); -const CLAUDE_AGENTS_DIR = path.join(REPO_ROOT, '.claude/agents'); -const CLAUDE_MEMORY_DIR = path.join(REPO_ROOT, '.claude/memory'); -const TEMPLATE_PATH = path.join(__dirname, 'templates/agent.mustache'); - -// Core agents to process (excluding orchestrator and master which aren't direct workflow agents) -const CORE_AGENTS = [ - 'analyst', - 'architect', - 'dev', - 'pm', - 'qa', - 'sm' // scrum master -]; - -function listBmadDirectory(dirName) { - const dirPath = path.join(REPO_ROOT, `bmad-core/${dirName}`); - try { - return fs.readdirSync(dirPath) - .filter(f => !f.startsWith('.') && (f.endsWith('.md') || f.endsWith('.yaml') || f.endsWith('.yml'))) - .sort(); - } catch (error) { - console.warn(`⚠️ Could not read bmad-core/${dirName}: ${error.message}`); - return []; - } -} - -async function parseAgentFile(agentPath) { - const content = await fs.readFile(agentPath, 'utf-8'); - - // Extract the YAML block between ```yaml and ``` - const yamlMatch = content.match(/```yaml\n([\s\S]*?)\n```/); - if (!yamlMatch) { - throw new Error(`No YAML block found in ${agentPath}`); - } - - const yamlContent = yamlMatch[1]; - const parsed = yaml.parse(yamlContent); - - // Process commands to extract main functionality - const processedCommands = []; - if (parsed.commands && Array.isArray(parsed.commands)) { - for (const command of parsed.commands) { - if (typeof command === 'string') { - const [name, ...rest] = command.split(':'); - const description = rest.join(':').trim(); - if (name !== 'help' && name !== 'exit' && name !== 'yolo' && name !== 'doc-out') { - processedCommands.push({ - name: name.trim(), - description: description || `Execute ${name.trim()}`, - isMainCommands: true - }); - } - } - } - } - - // Auto-inject real BMAD artifact lists - const realDependencies = { - tasks: listBmadDirectory('tasks'), - templates: listBmadDirectory('templates'), - data: listBmadDirectory('data') - }; - - return { - ...parsed, - commands: processedCommands, - dependencies: realDependencies - }; -} - -async function generateClaudeAgent(agentId) { - console.log(`Processing ${agentId}...`); - - const agentPath = path.join(BMAD_AGENTS_DIR, `${agentId}.md`); - - if (!await fs.pathExists(agentPath)) { - console.warn(`⚠️ Agent file not found: ${agentPath}`); - return; - } - - try { - const agentData = await parseAgentFile(agentPath); - const template = await 
fs.readFile(TEMPLATE_PATH, 'utf-8'); - - const rendered = Mustache.render(template, agentData); - - const outputPath = path.join(CLAUDE_AGENTS_DIR, `${agentId}.md`); - await fs.outputFile(outputPath, rendered); - - console.log(`βœ… Generated ${outputPath}`); - - // Create memory file placeholder - const memoryPath = path.join(CLAUDE_MEMORY_DIR, `${agentId}.md`); - if (!await fs.pathExists(memoryPath)) { - await fs.outputFile(memoryPath, `# ${agentData.agent?.name || agentId} Memory\n\nThis file stores contextual memory for the ${agentId} subagent.\n`); - } - - } catch (error) { - console.error(`❌ Error processing ${agentId}:`, error.message); - } -} - -async function createClaudeConfig() { - // Ensure .claude directory structure exists - await fs.ensureDir(CLAUDE_AGENTS_DIR); - await fs.ensureDir(CLAUDE_MEMORY_DIR); - - // Create handoff directory for cross-agent collaboration - const handoffDir = path.join(REPO_ROOT, '.claude/handoff'); - await fs.ensureDir(handoffDir); - - // Create initial handoff file - const handoffPath = path.join(handoffDir, 'current.md'); - if (!await fs.pathExists(handoffPath)) { - await fs.outputFile(handoffPath, `# Agent Handoff Log - -This file tracks context and key findings passed between BMAD agents during cross-agent workflows. - -## Usage -Each agent should append a structured summary when preparing context for another agent. - ---- - -`); - } - - // Create .gitignore for .claude directory - const gitignorePath = path.join(REPO_ROOT, '.claude/.gitignore'); - const gitignoreContent = `# Claude Code subagents - generated files -agents/ -memory/ -handoff/ -*.log -`; - await fs.outputFile(gitignorePath, gitignoreContent); -} - -async function main() { - console.log('πŸš€ Building Claude Code subagents from BMAD-Method...\n'); - - await createClaudeConfig(); - - for (const agentId of CORE_AGENTS) { - await generateClaudeAgent(agentId); - } - - console.log('\n✨ Claude Code subagents build complete!'); - console.log(`\nπŸ“ Generated agents in: ${CLAUDE_AGENTS_DIR}`); - console.log(`\n🎯 Usage:`); - console.log(` 1. Start Claude Code in this directory`); - console.log(` 2. Type: "Use the analyst subagent to help me create a project brief"`); - console.log(` 3. Or use /agents command to see all available subagents`); - - // Check if claude command is available - try { - const { execSync } = await import('child_process'); - execSync('claude --version', { stdio: 'ignore' }); - console.log(`\nπŸ’‘ Quick start: Run 'claude' in this directory to begin!`); - } catch { - console.log(`\nπŸ’‘ Install Claude Code to get started: https://claude.ai/code`); - } -} - -// Handle command line usage -if (import.meta.url === `file://${process.argv[1]}`) { - main().catch(console.error); -} - -export { generateClaudeAgent, parseAgentFile }; diff --git a/integration/claude/src/templates/agent.mustache b/integration/claude/src/templates/agent.mustache deleted file mode 100644 index 40363c01..00000000 --- a/integration/claude/src/templates/agent.mustache +++ /dev/null @@ -1,60 +0,0 @@ ---- -name: {{agent.name}} ({{agent.title}}) -description: {{persona.role}} - {{agent.whenToUse}}. -tools: - - Read - - Grep - - glob - - codebase_search_agent - - list_directory -memory: ./.claude/memory/{{agent.id}}.md ---- - -# {{agent.title}} - {{agent.name}} {{agent.icon}} - -## Role & Identity -{{persona.role}} with {{persona.style}} approach. 
-
-**Focus:** {{persona.focus}}
-
-## Core Principles
-{{#persona.core_principles}}
-- {{.}}
-{{/persona.core_principles}}
-
-## Available Commands
-{{#commands}}
-- **{{name}}**: {{description}}
-{{/commands}}
-
-### BMAD Commands
-- **use-template <template-name>**: Read and embed a BMAD template from templates/
-- **run-gap-matrix**: Guide user through competitive Gap Matrix analysis
-- **create-scorecard**: Produce Opportunity Scorecard using BMAD template
-- **render-template <template-name>**: Read template, replace placeholders, output final artifact
-
-## Working Mode
-You are {{agent.name}}, a {{agent.title}} operating within the BMAD-Method framework.
-
-**CRITICAL WORKFLOW RULES:**
-- When executing tasks from BMAD dependencies, follow task instructions exactly as written
-- Tasks with `elicit=true` require user interaction using exact specified format
-- Always present options as numbered lists for user selection
-- Use Read tool to access task files from bmad-core when needed
-- Stay in character as {{agent.name}} throughout the conversation
-- **MEMORY USAGE**: Store key insights, decisions, and analysis results in your memory file after producing major deliverables
-- After significant analysis, use your memory to persist important findings for future reference
-- **CROSS-AGENT HANDOFF**: When preparing work for another agent, append a structured summary to .claude/handoff/current.md with key findings, decisions, and context needed for the next agent
-
-## Key BMAD Dependencies
-**Tasks:** {{#dependencies.tasks}}{{.}}, {{/dependencies.tasks}}
-**Templates:** {{#dependencies.templates}}{{.}}, {{/dependencies.templates}}
-**Data:** {{#dependencies.data}}{{.}}, {{/dependencies.data}}
-
-## Usage
-Start conversations by greeting the user as {{agent.name}} and mentioning the `*help` command to see available options. Always use numbered lists when presenting choices to users.
- -Access BMAD dependencies using paths like: -- Tasks: `bmad-core/tasks/{filename}` -- Templates: `bmad-core/templates/{filename}` -- Data: `bmad-core/data/{filename}` diff --git a/integration/claude/src/validate.js b/integration/claude/src/validate.js deleted file mode 100644 index 49c61965..00000000 --- a/integration/claude/src/validate.js +++ /dev/null @@ -1,101 +0,0 @@ -#!/usr/bin/env node - -import fs from 'fs-extra'; -import path from 'path'; -import { fileURLToPath } from 'url'; -import yaml from 'yaml'; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = path.dirname(__filename); -const REPO_ROOT = path.resolve(__dirname, '../../..'); -const CLAUDE_AGENTS_DIR = path.join(REPO_ROOT, '.claude/agents'); - -async function validateAgentFile(agentPath) { - const content = await fs.readFile(agentPath, 'utf-8'); - const errors = []; - - // Check for required frontmatter - const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---/); - if (!frontmatterMatch) { - errors.push('Missing YAML frontmatter'); - return errors; - } - - try { - const frontmatter = yaml.parse(frontmatterMatch[1]); - - // Validate required fields - if (!frontmatter.name) errors.push('Missing "name" field'); - if (!frontmatter.description) errors.push('Missing "description" field'); - if (!frontmatter.tools || !Array.isArray(frontmatter.tools)) { - errors.push('Missing or invalid "tools" field'); - } - - // Validate tools are reasonable - const validTools = ['Read', 'Grep', 'glob', 'codebase_search_agent', 'list_directory', 'edit_file', 'create_file']; - const invalidTools = frontmatter.tools?.filter(tool => !validTools.includes(tool)) || []; - if (invalidTools.length > 0) { - errors.push(`Invalid tools: ${invalidTools.join(', ')}`); - } - - } catch (yamlError) { - errors.push(`Invalid YAML: ${yamlError.message}`); - } - - // Check content sections - if (!content.includes('## Role & Identity')) { - errors.push('Missing "Role & Identity" section'); - } - if (!content.includes('## Working Mode')) { - errors.push('Missing "Working Mode" section'); - } - - return errors; -} - -async function main() { - console.log('πŸ” Validating Claude Code subagents...\n'); - - if (!await fs.pathExists(CLAUDE_AGENTS_DIR)) { - console.error('❌ No .claude/agents directory found. 
Run "npm run build" first.'); - process.exit(1); - } - - const agentFiles = await fs.readdir(CLAUDE_AGENTS_DIR); - const mdFiles = agentFiles.filter(f => f.endsWith('.md')); - - if (mdFiles.length === 0) { - console.error('❌ No agent files found in .claude/agents/'); - process.exit(1); - } - - let totalErrors = 0; - - for (const file of mdFiles) { - const agentPath = path.join(CLAUDE_AGENTS_DIR, file); - const errors = await validateAgentFile(agentPath); - - if (errors.length === 0) { - console.log(`βœ… ${file}`); - } else { - console.log(`❌ ${file}:`); - errors.forEach(error => console.log(` - ${error}`)); - totalErrors += errors.length; - } - } - - console.log(`\nπŸ“Š Validation complete:`); - console.log(` Agents checked: ${mdFiles.length}`); - console.log(` Total errors: ${totalErrors}`); - - if (totalErrors > 0) { - console.log('\nπŸ”§ Run "npm run build" to regenerate agents'); - process.exit(1); - } else { - console.log('\nπŸŽ‰ All agents valid!'); - } -} - -if (import.meta.url === `file://${process.argv[1]}`) { - main().catch(console.error); -} diff --git a/integration/claude/test-with-judge.js b/integration/claude/test-with-judge.js deleted file mode 100755 index 6aa3844f..00000000 --- a/integration/claude/test-with-judge.js +++ /dev/null @@ -1,428 +0,0 @@ -#!/usr/bin/env node - -/** - * Automated Claude Subagent Testing with LLM Judge - * Uses Claude's -p mode to test subagents non-interactively - * Uses o3 model as judge to evaluate responses - */ - -import { execSync, spawn } from 'child_process'; -import fs from 'fs-extra'; -import path from 'path'; -import { fileURLToPath } from 'url'; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = path.dirname(__filename); -const REPO_ROOT = path.resolve(__dirname, '../..'); -const TEST_RESULTS_DIR = path.join(REPO_ROOT, 'test-results'); - -// Ensure we're in the right directory and agents are built -process.chdir(REPO_ROOT); - -// Test cases for each agent -const TEST_CASES = [ - { - id: 'analyst-market-research', - agent: 'analyst', - prompt: 'Use the analyst subagent to help me research the market for AI-powered customer support tools. 
I need to understand key competitors, market gaps, and opportunities.', - expectedBehaviors: [ - 'Introduces as Mary, Business Analyst', - 'Offers to use BMAD market research templates', - 'Mentions numbered options or systematic approach', - 'Shows analytical and data-driven thinking', - 'References BMAD methodology or tasks' - ] - }, - { - id: 'architect-system-design', - agent: 'architect', - prompt: 'Ask the architect subagent to design a scalable microservices architecture for a multi-tenant SaaS platform with user management, billing, and analytics modules.', - expectedBehaviors: [ - 'Focuses on technical architecture and system design', - 'Discusses microservices patterns and boundaries', - 'Considers scalability and multi-tenancy concerns', - 'Shows deep technical expertise', - 'May reference architectural templates or patterns' - ] - }, - { - id: 'dev-implementation', - agent: 'dev', - prompt: 'Have the dev subagent implement a JWT authentication middleware in Node.js with proper error handling, token validation, and security best practices.', - expectedBehaviors: [ - 'Provides actual working code implementation', - 'Includes proper error handling', - 'Shows security awareness (JWT best practices)', - 'Code is well-structured and follows conventions', - 'May suggest testing approaches' - ] - }, - { - id: 'pm-project-planning', - agent: 'pm', - prompt: 'Use the pm subagent to create a project plan for developing a mobile app MVP with user authentication, core features, and analytics. Include timeline, resources, and risk assessment.', - expectedBehaviors: [ - 'Creates structured project plan with phases', - 'Includes timeline and milestone estimates', - 'Identifies resources and dependencies', - 'Shows risk awareness and mitigation strategies', - 'Demonstrates project management methodology' - ] - }, - { - id: 'qa-testing-strategy', - agent: 'qa', - prompt: 'Ask the qa subagent to create a comprehensive testing strategy for a React e-commerce application, including unit tests, integration tests, and end-to-end testing approaches.', - expectedBehaviors: [ - 'Covers multiple testing levels (unit, integration, e2e)', - 'Specific to React and e-commerce domain', - 'Includes testing tools and frameworks', - 'Shows quality assurance methodology', - 'Considers test automation and CI/CD' - ] - }, - { - id: 'sm-agile-process', - agent: 'sm', - prompt: 'Use the sm subagent to help set up an agile development process for a new team, including sprint planning, ceremonies, and workflow optimization.', - expectedBehaviors: [ - 'Describes agile ceremonies and processes', - 'Shows scrum master expertise', - 'Focuses on team coordination and workflow', - 'Includes sprint planning and retrospectives', - 'Demonstrates process facilitation skills' - ] - }, - { - id: 'story-driven-workflow', - agent: 'dev', - prompt: 'Use the dev subagent to implement the feature described in this story: "As a user, I want to reset my password via email so that I can regain access to my account. 
Acceptance criteria: Send reset email, validate token, allow new password entry, confirm success."', - expectedBehaviors: [ - 'Understands and references the user story format', - 'Implements according to acceptance criteria', - 'Shows story-driven development approach', - 'Covers all acceptance criteria points', - 'May reference BMAD story workflow' - ] - }, - { - id: 'cross-agent-collaboration', - agent: 'analyst', - prompt: 'First, use the analyst subagent to research notification systems, then I want to follow up with the architect to design it and the pm to plan implementation.', - expectedBehaviors: [ - 'Analyst performs research on notification systems', - 'Sets up context for follow-up with other agents', - 'Shows awareness of multi-agent workflow', - 'Provides research that would inform architecture', - 'May suggest next steps with other agents' - ] - } -]; - -// Colors for console output -const colors = { - reset: '\x1b[0m', - red: '\x1b[31m', - green: '\x1b[32m', - yellow: '\x1b[33m', - blue: '\x1b[34m', - magenta: '\x1b[35m', - cyan: '\x1b[36m' -}; - -function log(message, color = 'reset') { - console.log(`${colors[color]}${message}${colors.reset}`); -} - -async function runClaudeTest(testCase) { - log(`\nπŸ§ͺ Testing: ${testCase.id}`, 'cyan'); - log(`πŸ“ Prompt: ${testCase.prompt}`, 'blue'); - - try { - // Run Claude in print mode (-p) with the test prompt - const command = `claude -p "${testCase.prompt.replace(/"/g, '\\"')}"`; - log(`πŸš€ Running: ${command}`, 'yellow'); - - const output = execSync(command, { - cwd: REPO_ROOT, - encoding: 'utf8', - timeout: 120000, // 2 minute timeout - maxBuffer: 1024 * 1024 * 10 // 10MB buffer - }); - - return { - success: true, - output: output.trim(), - testCase - }; - - } catch (error) { - log(`❌ Claude execution failed: ${error.message}`, 'red'); - return { - success: false, - error: error.message, - output: error.stdout || '', - testCase - }; - } -} - -async function judgeResponse(testResult) { - if (!testResult.success) { - return { - score: 0, - reasoning: `Test execution failed: ${testResult.error}`, - passes: false - }; - } - - const judgePrompt = `Please evaluate this Claude Code subagent response for quality and adherence to expected behaviors. - -TEST CASE: ${testResult.testCase.id} -ORIGINAL PROMPT: ${testResult.testCase.prompt} - -EXPECTED BEHAVIORS: -${testResult.testCase.expectedBehaviors.map(b => `- ${b}`).join('\n')} - -ACTUAL RESPONSE: -${testResult.output} - -EVALUATION CRITERIA: -1. Does the response show the agent is working as a specialized subagent? -2. Does it demonstrate the expected expertise for this agent type? -3. Are the expected behaviors present in the response? -4. Is the response relevant and helpful for the given prompt? -5. Does it show integration with BMAD methodology where appropriate? - -Please provide: -1. SCORE: 0-100 (0=complete failure, 100=perfect subagent behavior) -2. BEHAVIORS_FOUND: List which expected behaviors were demonstrated -3. MISSING_BEHAVIORS: List which expected behaviors were missing -4. REASONING: Detailed explanation of the score -5. 
PASSES: true/false whether this represents successful subagent behavior (score >= 70) - -Format your response as JSON with these exact keys.`; - - try { - // Use the oracle (o3) to judge the response - log(`πŸ€– Asking o3 judge to evaluate response...`, 'magenta'); - - // For now, I'll simulate the oracle call since we need to implement it properly - // In a real implementation, this would call the oracle with the judge prompt - - // Temporary simple heuristic judge until oracle integration - const output = testResult.output.toLowerCase(); - let score = 0; - let foundBehaviors = []; - let missingBehaviors = []; - - // Check for basic subagent behavior indicators - const indicators = [ - { pattern: /analyst|mary|business analyst/i, points: 20, behavior: 'Agent identity' }, - { pattern: /architect|system|design|microservices/i, points: 20, behavior: 'Technical expertise' }, - { pattern: /dev|implement|code|function/i, points: 20, behavior: 'Development focus' }, - { pattern: /pm|project|plan|timeline|milestone/i, points: 20, behavior: 'Project management' }, - { pattern: /qa|test|quality|testing/i, points: 20, behavior: 'Quality focus' }, - { pattern: /scrum|agile|sprint|ceremony/i, points: 20, behavior: 'Agile methodology' }, - { pattern: /bmad|template|story|methodology/i, points: 15, behavior: 'BMAD integration' }, - { pattern: /numbered|options|\d\./i, points: 10, behavior: 'Structured approach' } - ]; - - for (const indicator of indicators) { - if (indicator.pattern.test(testResult.output)) { - score += indicator.points; - foundBehaviors.push(indicator.behavior); - } - } - - // Cap score at 100 - score = Math.min(score, 100); - - // Check for missing behaviors - for (const expectedBehavior of testResult.testCase.expectedBehaviors) { - const found = foundBehaviors.some(fb => - expectedBehavior.toLowerCase().includes(fb.toLowerCase()) || - fb.toLowerCase().includes(expectedBehavior.toLowerCase()) - ); - if (!found) { - missingBehaviors.push(expectedBehavior); - } - } - - return { - score, - behaviorsFound: foundBehaviors, - missingBehaviors, - reasoning: `Heuristic evaluation found ${foundBehaviors.length} positive indicators. Response shows ${score >= 70 ? 'good' : 'limited'} subagent behavior.`, - passes: score >= 70 - }; - - } catch (error) { - log(`❌ Judge evaluation failed: ${error.message}`, 'red'); - return { - score: 0, - reasoning: `Judge evaluation failed: ${error.message}`, - passes: false - }; - } -} - -async function generateReport(results) { - const timestamp = new Date().toISOString(); - const totalTests = results.length; - const passedTests = results.filter(r => r.judgment.passes).length; - const averageScore = results.reduce((sum, r) => sum + r.judgment.score, 0) / totalTests; - - const report = { - timestamp, - summary: { - totalTests, - passedTests, - failedTests: totalTests - passedTests, - passRate: (passedTests / totalTests * 100).toFixed(1), - averageScore: averageScore.toFixed(1) - }, - results: results.map(r => ({ - testId: r.testCase.id, - agent: r.testCase.agent, - prompt: r.testCase.prompt, - success: r.success, - score: r.judgment.score, - passes: r.judgment.passes, - behaviorsFound: r.judgment.behaviorsFound, - missingBehaviors: r.judgment.missingBehaviors, - reasoning: r.judgment.reasoning, - output: r.output?.substring(0, 500) + '...' 
// Truncate for report - })) - }; - - // Save detailed report - await fs.ensureDir(TEST_RESULTS_DIR); - const reportPath = path.join(TEST_RESULTS_DIR, `claude-subagent-test-${timestamp.replace(/[:.]/g, '-')}.json`); - await fs.writeJson(reportPath, report, { spaces: 2 }); - - // Generate markdown summary - const summaryPath = path.join(TEST_RESULTS_DIR, 'latest-test-summary.md'); - const markdown = `# Claude Subagent Test Results - -**Generated:** ${timestamp} - -## Summary -- **Total Tests:** ${totalTests} -- **Passed:** ${passedTests} (${report.summary.passRate}%) -- **Failed:** ${report.summary.failedTests} -- **Average Score:** ${report.summary.averageScore}/100 - -## Test Results - -${results.map(r => ` -### ${r.testCase.id} (${r.testCase.agent}) -- **Score:** ${r.judgment.score}/100 -- **Status:** ${r.judgment.passes ? 'βœ… PASS' : '❌ FAIL'} -- **Behaviors Found:** ${(r.judgment.behaviorsFound || []).join(', ')} -- **Missing Behaviors:** ${(r.judgment.missingBehaviors || []).join(', ')} -- **Reasoning:** ${r.judgment.reasoning} -`).join('\n')} - -## Detailed Results -Full results saved to: \`${reportPath}\` -`; - - await fs.writeFile(summaryPath, markdown); - - return { reportPath, summaryPath, report }; -} - -async function main() { - log('πŸš€ Starting Claude Subagent Testing with LLM Judge', 'green'); - log('====================================================', 'green'); - - // Verify setup - try { - execSync('claude --version', { stdio: 'ignore' }); - log('βœ… Claude Code detected', 'green'); - } catch { - log('❌ Claude Code not found. Install from https://claude.ai/code', 'red'); - process.exit(1); - } - - // Check if agents exist - const agentsDir = path.join(REPO_ROOT, '.claude/agents'); - if (!await fs.pathExists(agentsDir)) { - log('❌ No Claude agents found. Run: npm run build:claude', 'red'); - process.exit(1); - } - - const agentFiles = await fs.readdir(agentsDir); - log(`βœ… Found ${agentFiles.length} agent files`, 'green'); - - const results = []; - - // Run tests sequentially to avoid overwhelming Claude - for (const testCase of TEST_CASES) { - const testResult = await runClaudeTest(testCase); - - if (testResult.success) { - log(`βœ… Claude execution completed (${testResult.output.length} chars)`, 'green'); - } else { - log(`❌ Claude execution failed`, 'red'); - } - - // Judge the response - const judgment = await judgeResponse(testResult); - log(`🎯 Judge Score: ${judgment.score}/100 ${judgment.passes ? 'βœ…' : '❌'}`, - judgment.passes ? 'green' : 'red'); - - results.push({ - testCase, - success: testResult.success, - output: testResult.output, - error: testResult.error, - judgment - }); - - // Small delay between tests - await new Promise(resolve => setTimeout(resolve, 2000)); - } - - // Generate report - log('\nπŸ“Š Generating test report...', 'cyan'); - const { reportPath, summaryPath, report } = await generateReport(results); - - // Print summary - log('\nπŸŽ‰ Testing Complete!', 'green'); - log('==================', 'green'); - log(`πŸ“ˆ Pass Rate: ${report.summary.passRate}%`, report.summary.passRate >= 80 ? 'green' : 'yellow'); - log(`πŸ“Š Average Score: ${report.summary.averageScore}/100`, 'cyan'); - log(`πŸ“‹ Passed: ${report.summary.passedTests}/${report.summary.totalTests}`, 'green'); - - if (report.summary.passRate >= 80) { - log('\n🎊 Excellent! 
Claude subagents are working well!', 'green'); - } else if (report.summary.passRate >= 60) { - log('\n⚠️ Good progress, but some issues need attention', 'yellow'); - } else { - log('\n❌ Significant issues detected with subagent behavior', 'red'); - } - - log(`\nπŸ“„ Full report: ${reportPath}`, 'blue'); - log(`πŸ“ Summary: ${summaryPath}`, 'blue'); - - // Exit with appropriate code - process.exit(report.summary.passRate >= 70 ? 0 : 1); -} - -// Handle errors gracefully -process.on('unhandledRejection', (error) => { - log(`❌ Unhandled error: ${error.message}`, 'red'); - process.exit(1); -}); - -// Run if called directly -if (import.meta.url === `file://${process.argv[1]}`) { - main().catch(error => { - log(`❌ Test runner failed: ${error.message}`, 'red'); - process.exit(1); - }); -} - -export { runClaudeTest, judgeResponse, TEST_CASES }; diff --git a/integration/claude/test-with-o3-judge.js b/integration/claude/test-with-o3-judge.js deleted file mode 100755 index b95f1783..00000000 --- a/integration/claude/test-with-o3-judge.js +++ /dev/null @@ -1,552 +0,0 @@ -#!/usr/bin/env node - -/** - * Automated Claude Subagent Testing with o3 Judge - * Uses Claude's -p mode to test subagents non-interactively - * Uses o3 model via Oracle tool as judge to evaluate responses - */ - -import { execSync } from 'child_process'; -import fs from 'fs-extra'; -import path from 'path'; -import { fileURLToPath } from 'url'; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = path.dirname(__filename); -const REPO_ROOT = path.resolve(__dirname, '../..'); -const TEST_RESULTS_DIR = path.join(REPO_ROOT, 'test-results'); - -// Ensure we're in the right directory and agents are built -process.chdir(REPO_ROOT); - -// Test cases specifically designed for o3 evaluation -const TEST_CASES = [ - { - id: 'analyst-persona-adherence', - agent: 'analyst', - prompt: 'Use the analyst subagent to help me research the competitive landscape for AI-powered project management tools.', - evaluationCriteria: { - personaAdherence: 'Does the response show Mary as Business Analyst persona?', - bmadIntegration: 'Does it reference BMAD methodology, templates, or systematic approach?', - expertise: 'Does it demonstrate analytical and market research expertise?', - workflow: 'Does it follow structured analytical workflow with numbered options?' - } - }, - { - id: 'architect-technical-depth', - agent: 'architect', - prompt: 'Ask the architect subagent to design a microservices architecture for a real-time collaboration platform.', - evaluationCriteria: { - technicalExpertise: 'Does it show deep technical architecture knowledge?', - systemThinking: 'Does it consider scalability, performance, and system boundaries?', - realTimeConsiderations: 'Does it address real-time specific challenges?', - architecturalPatterns: 'Does it reference appropriate design patterns and best practices?' - } - }, - { - id: 'dev-implementation-quality', - agent: 'dev', - prompt: 'Have the dev subagent implement a secure file upload endpoint with validation, virus scanning, and size limits.', - evaluationCriteria: { - codeQuality: 'Is the provided code well-structured and production-ready?', - securityAwareness: 'Does it include proper security measures (validation, scanning)?', - errorHandling: 'Does it include comprehensive error handling?', - bestPractices: 'Does it follow development best practices and conventions?' 
- } - }, - { - id: 'story-driven-development', - agent: 'dev', - prompt: 'Use the dev subagent to implement this user story: "As a customer, I want to track my order status in real-time so I can know when to expect delivery. Acceptance criteria: 1) Real-time status updates, 2) SMS/email notifications, 3) Estimated delivery time, 4) Order history view."', - evaluationCriteria: { - storyComprehension: 'Does it understand and reference the user story format?', - acceptanceCriteria: 'Does it address all 4 acceptance criteria?', - bmadWorkflow: 'Does it show awareness of story-driven development?', - implementation: 'Does it provide concrete implementation steps?' - } - }, - { - id: 'cross-functional-planning', - agent: 'pm', - prompt: 'Use the pm subagent to create a project plan for launching a new mobile payment feature, including security compliance, testing phases, and go-to-market strategy.', - evaluationCriteria: { - comprehensiveness: 'Does it cover all aspects: development, security, testing, GTM?', - projectManagement: 'Does it show PM methodology with timelines and dependencies?', - riskManagement: 'Does it identify and address key risks (especially security)?', - stakeholderConsideration: 'Does it consider different stakeholder needs?' - } - }, - { - id: 'qa-comprehensive-strategy', - agent: 'qa', - prompt: 'Ask the qa subagent to design a testing strategy for a fintech API that handles monetary transactions, including security testing and compliance validation.', - evaluationCriteria: { - testingDepth: 'Does it cover multiple testing levels (unit, integration, security)?', - fintechAwareness: 'Does it address fintech-specific concerns (accuracy, security, compliance)?', - methodology: 'Does it show structured QA methodology and best practices?', - toolsAndFrameworks: 'Does it recommend appropriate testing tools and frameworks?' - } - } -]; - -// Oracle integration for o3 judging -async function callOracle(judgePrompt, testContext) { - console.log('πŸ€– Calling Oracle (o3) to judge response...'); - - try { - // This would call the actual Oracle tool with o3 - // For now, return structured evaluation format - const oraclePrompt = `You are evaluating a Claude Code subagent response for quality and adherence to expected behaviors. - -${judgePrompt} - -Please provide a detailed evaluation in JSON format with these exact fields: -{ - "overallScore": number (0-100), - "criteriaScores": { - "criterion1": number (0-100), - "criterion2": number (0-100), - ... - }, - "strengths": ["strength1", "strength2", ...], - "weaknesses": ["weakness1", "weakness2", ...], - "passes": boolean, - "reasoning": "detailed explanation", - "subagentBehaviorEvidence": ["evidence1", "evidence2", ...], - "bmadIntegrationLevel": "none|basic|good|excellent" -} - -Focus on: -1. Whether this shows proper subagent specialization -2. Agent persona adherence and expertise demonstration -3. Integration with BMAD methodology where appropriate -4. Quality and relevance of the response -5. 
Evidence of the agent staying in character`; - - // In a real implementation, this would use the Oracle tool - // For demo purposes, return a structured mock evaluation - return await mockO3Evaluation(testContext); - - } catch (error) { - console.error('❌ Oracle call failed:', error.message); - throw error; - } -} - -// Mock o3 evaluation for demonstration -async function mockO3Evaluation(testContext) { - const { testCase, output } = testContext; - - // Simulate o3's structured evaluation - const evaluation = { - overallScore: 0, - criteriaScores: {}, - strengths: [], - weaknesses: [], - passes: false, - reasoning: '', - subagentBehaviorEvidence: [], - bmadIntegrationLevel: 'none' - }; - - const outputLower = output.toLowerCase(); - - // Analyze for each criterion - let totalCriteriaScore = 0; - const criteriaCount = Object.keys(testCase.evaluationCriteria).length; - - for (const [criterion, description] of Object.entries(testCase.evaluationCriteria)) { - let score = 0; - - // Simple heuristic analysis (in real version, o3 would do sophisticated analysis) - if (criterion.includes('persona') || criterion.includes('adherence')) { - if (outputLower.includes('mary') || outputLower.includes('business analyst')) { - score += 40; - evaluation.subagentBehaviorEvidence.push('Agent identifies as Mary/Business Analyst'); - } - if (outputLower.includes('analyst') || outputLower.includes('research')) { - score += 30; - } - } - - if (criterion.includes('bmad') || criterion.includes('methodology')) { - if (outputLower.includes('bmad') || outputLower.includes('template') || outputLower.includes('story')) { - score += 50; - evaluation.bmadIntegrationLevel = 'good'; - evaluation.subagentBehaviorEvidence.push('References BMAD methodology'); - } - } - - if (criterion.includes('technical') || criterion.includes('architecture')) { - if (outputLower.includes('microservices') || outputLower.includes('architecture') || - outputLower.includes('scalability') || outputLower.includes('design')) { - score += 60; - evaluation.subagentBehaviorEvidence.push('Shows technical architecture expertise'); - } - } - - if (criterion.includes('code') || criterion.includes('implementation')) { - if (outputLower.includes('function') || outputLower.includes('class') || - outputLower.includes('endpoint') || outputLower.includes('async')) { - score += 50; - evaluation.subagentBehaviorEvidence.push('Provides concrete code implementation'); - } - } - - if (criterion.includes('security') || criterion.includes('validation')) { - if (outputLower.includes('security') || outputLower.includes('validation') || - outputLower.includes('sanitize') || outputLower.includes('authenticate')) { - score += 40; - } - } - - score = Math.min(score, 100); - evaluation.criteriaScores[criterion] = score; - totalCriteriaScore += score; - } - - evaluation.overallScore = Math.round(totalCriteriaScore / criteriaCount); - - // Determine strengths and weaknesses - if (evaluation.overallScore >= 80) { - evaluation.strengths.push('Strong subagent behavior demonstrated'); - evaluation.strengths.push('Good adherence to agent persona'); - } else if (evaluation.overallScore >= 60) { - evaluation.strengths.push('Moderate subagent behavior'); - evaluation.weaknesses.push('Could improve persona adherence'); - } else { - evaluation.weaknesses.push('Limited subagent behavior evidence'); - evaluation.weaknesses.push('Weak persona adherence'); - } - - if (evaluation.bmadIntegrationLevel === 'none') { - evaluation.weaknesses.push('No BMAD methodology integration detected'); - } 
- - evaluation.passes = evaluation.overallScore >= 70; - evaluation.reasoning = `Overall score of ${evaluation.overallScore} based on ${criteriaCount} criteria. ${evaluation.passes ? 'Passes' : 'Fails'} minimum threshold for subagent behavior.`; - - // Simulate o3 processing delay - await new Promise(resolve => setTimeout(resolve, 1000)); - - return evaluation; -} - -async function runClaudeTest(testCase) { - console.log(`\nπŸ§ͺ Testing: ${testCase.id}`); - console.log(`🎯 Agent: ${testCase.agent}`); - console.log(`πŸ“ Prompt: ${testCase.prompt.substring(0, 100)}...`); - - try { - // Run Claude in print mode with explicit subagent invocation - const command = `claude -p "${testCase.prompt.replace(/"/g, '\\"')}"`; - console.log(`πŸš€ Executing Claude...`); - - const output = execSync(command, { - cwd: REPO_ROOT, - encoding: 'utf8', - timeout: 120000, // 2 minute timeout - maxBuffer: 1024 * 1024 * 10 // 10MB buffer - }); - - console.log(`βœ… Claude completed (${output.length} characters)`); - - return { - success: true, - output: output.trim(), - testCase - }; - - } catch (error) { - console.error(`❌ Claude execution failed: ${error.message}`); - return { - success: false, - error: error.message, - output: error.stdout || '', - testCase - }; - } -} - -async function evaluateWithO3(testResult) { - if (!testResult.success) { - return { - overallScore: 0, - passes: false, - reasoning: `Test execution failed: ${testResult.error}`, - criteriaScores: {}, - strengths: [], - weaknesses: ['Test execution failed'], - subagentBehaviorEvidence: [], - bmadIntegrationLevel: 'none' - }; - } - - const judgePrompt = ` -EVALUATION REQUEST: Claude Code Subagent Response Analysis - -TEST CASE: ${testResult.testCase.id} -TARGET AGENT: ${testResult.testCase.agent} -ORIGINAL PROMPT: ${testResult.testCase.prompt} - -EVALUATION CRITERIA: -${Object.entries(testResult.testCase.evaluationCriteria) - .map(([key, desc]) => `- ${key}: ${desc}`) - .join('\n')} - -ACTUAL RESPONSE FROM CLAUDE: -${testResult.output} - -EVALUATION FOCUS: -1. Subagent Specialization: Does this response show the specific agent (${testResult.testCase.agent}) is working with appropriate expertise? -2. Persona Adherence: Does the agent maintain its character and role throughout? -3. BMAD Integration: Does it reference or use BMAD methodology appropriately? -4. Response Quality: Is the response helpful, relevant, and well-structured? -5. Technical Accuracy: Is the content technically sound for the domain? - -Please evaluate each criterion (0-100) and provide overall assessment. -`; - - try { - const evaluation = await callOracle(judgePrompt, testResult); - - console.log(`🎯 o3 Judge Score: ${evaluation.overallScore}/100 ${evaluation.passes ? 
'βœ…' : '❌'}`); - console.log(`πŸ“Š BMAD Integration: ${evaluation.bmadIntegrationLevel}`); - - return evaluation; - - } catch (error) { - console.error(`❌ o3 evaluation failed: ${error.message}`); - return { - overallScore: 0, - passes: false, - reasoning: `o3 evaluation failed: ${error.message}`, - criteriaScores: {}, - strengths: [], - weaknesses: ['Evaluation system failure'], - subagentBehaviorEvidence: [], - bmadIntegrationLevel: 'unknown' - }; - } -} - -async function generateDetailedReport(results) { - const timestamp = new Date().toISOString(); - const totalTests = results.length; - const passedTests = results.filter(r => r.evaluation.passes).length; - const averageScore = results.reduce((sum, r) => sum + r.evaluation.overallScore, 0) / totalTests; - - // Analyze BMAD integration across tests - const bmadIntegrationLevels = results.map(r => r.evaluation.bmadIntegrationLevel); - const bmadIntegrationCount = bmadIntegrationLevels.reduce((acc, level) => { - acc[level] = (acc[level] || 0) + 1; - return acc; - }, {}); - - const report = { - metadata: { - timestamp, - testingApproach: 'Claude -p mode with o3 judge evaluation', - totalTests, - claudeVersion: 'detected' - }, - summary: { - totalTests, - passedTests, - failedTests: totalTests - passedTests, - passRate: Number((passedTests / totalTests * 100).toFixed(1)), - averageScore: Number(averageScore.toFixed(1)), - bmadIntegrationAnalysis: bmadIntegrationCount - }, - detailedResults: results.map(r => ({ - testId: r.testCase.id, - targetAgent: r.testCase.agent, - executionSuccess: r.success, - o3Evaluation: { - overallScore: r.evaluation.overallScore, - passes: r.evaluation.passes, - criteriaScores: r.evaluation.criteriaScores, - strengths: r.evaluation.strengths, - weaknesses: r.evaluation.weaknesses, - bmadIntegrationLevel: r.evaluation.bmadIntegrationLevel, - subagentEvidence: r.evaluation.subagentBehaviorEvidence - }, - reasoning: r.evaluation.reasoning, - responsePreview: r.output?.substring(0, 300) + '...' 
- })), - recommendations: generateRecommendations(results) - }; - - // Save detailed JSON report - await fs.ensureDir(TEST_RESULTS_DIR); - const reportPath = path.join(TEST_RESULTS_DIR, `o3-judge-report-${timestamp.replace(/[:.]/g, '-')}.json`); - await fs.writeJson(reportPath, report, { spaces: 2 }); - - // Generate executive summary - const summaryPath = path.join(TEST_RESULTS_DIR, 'executive-summary.md'); - const markdown = generateExecutiveSummary(report); - await fs.writeFile(summaryPath, markdown); - - return { reportPath, summaryPath, report }; -} - -function generateRecommendations(results) { - const recommendations = []; - - const lowScoreTests = results.filter(r => r.evaluation.overallScore < 70); - if (lowScoreTests.length > 0) { - recommendations.push({ - priority: 'high', - category: 'subagent-behavior', - issue: `${lowScoreTests.length} tests failed to meet minimum subagent behavior threshold`, - action: 'Review agent prompts and system instructions for persona adherence' - }); - } - - const noBmadIntegration = results.filter(r => r.evaluation.bmadIntegrationLevel === 'none'); - if (noBmadIntegration.length > 2) { - recommendations.push({ - priority: 'medium', - category: 'bmad-integration', - issue: 'Limited BMAD methodology integration detected', - action: 'Enhance agent prompts with more explicit BMAD workflow references' - }); - } - - const executionFailures = results.filter(r => !r.success); - if (executionFailures.length > 0) { - recommendations.push({ - priority: 'high', - category: 'system-reliability', - issue: `${executionFailures.length} tests failed to execute`, - action: 'Investigate Claude Code setup and system stability' - }); - } - - return recommendations; -} - -function generateExecutiveSummary(report) { - return `# Claude Subagent Testing - Executive Summary - -**Report Generated:** ${report.metadata.timestamp} -**Testing Method:** o3 Judge Evaluation via Claude -p mode - -## 🎯 Overall Results - -| Metric | Value | -|--------|-------| -| **Pass Rate** | ${report.summary.passRate}% (${report.summary.passedTests}/${report.summary.totalTests}) | -| **Average Score** | ${report.summary.averageScore}/100 | -| **Status** | ${report.summary.passRate >= 80 ? '🟒 Excellent' : report.summary.passRate >= 60 ? '🟑 Good' : 'πŸ”΄ Needs Improvement'} | - -## πŸ“Š BMAD Integration Analysis - -${Object.entries(report.summary.bmadIntegrationAnalysis) - .map(([level, count]) => `- **${level}**: ${count} tests`) - .join('\n')} - -## 🎭 Agent Performance - -${report.detailedResults.map(r => - `### ${r.testId} (${r.targetAgent}) -- **Score:** ${r.o3Evaluation.overallScore}/100 ${r.o3Evaluation.passes ? 'βœ…' : '❌'} -- **BMAD Integration:** ${r.o3Evaluation.bmadIntegrationLevel} -- **Key Strengths:** ${r.o3Evaluation.strengths.join(', ')} -- **Areas for Improvement:** ${r.o3Evaluation.weaknesses.join(', ')}` -).join('\n\n')} - -## πŸš€ Recommendations - -${report.recommendations.map(rec => - `### ${rec.priority.toUpperCase()} Priority: ${rec.category} -**Issue:** ${rec.issue} -**Action:** ${rec.action}` -).join('\n\n')} - -## πŸŽ‰ Conclusion - -${report.summary.passRate >= 80 - ? 'Excellent performance! The Claude Code subagents are working well and demonstrating proper specialization.' - : report.summary.passRate >= 60 - ? 'Good foundation with room for improvement. Focus on the high-priority recommendations.' - : 'Significant improvements needed. 
Review agent configurations and prompts.'} - ---- -*Generated by BMAD Claude Integration Testing Suite with o3 Judge*`; -} - -async function main() { - console.log('πŸš€ Claude Subagent Testing with o3 Judge'); - console.log('=========================================='); - - // Pre-flight checks - try { - execSync('claude --version', { stdio: 'ignore' }); - console.log('βœ… Claude Code detected'); - } catch { - console.error('❌ Claude Code not found. Install from https://claude.ai/code'); - process.exit(1); - } - - const agentsDir = path.join(REPO_ROOT, '.claude/agents'); - if (!await fs.pathExists(agentsDir)) { - console.error('❌ No Claude agents found. Run: npm run build:claude'); - process.exit(1); - } - - console.log(`βœ… Testing ${TEST_CASES.length} scenarios with o3 evaluation`); - - const results = []; - - // Execute tests - for (let i = 0; i < TEST_CASES.length; i++) { - const testCase = TEST_CASES[i]; - console.log(`\n[${i + 1}/${TEST_CASES.length}] Testing ${testCase.id}...`); - - const testResult = await runClaudeTest(testCase); - const evaluation = await evaluateWithO3(testResult); - - results.push({ - testCase, - success: testResult.success, - output: testResult.output, - error: testResult.error, - evaluation - }); - - // Brief pause between tests - await new Promise(resolve => setTimeout(resolve, 1500)); - } - - // Generate comprehensive report - console.log('\nπŸ“Š Generating detailed report with o3 analysis...'); - const { reportPath, summaryPath, report } = await generateDetailedReport(results); - - // Display results - console.log('\nπŸŽ‰ Testing Complete!'); - console.log('===================='); - console.log(`πŸ“ˆ Pass Rate: ${report.summary.passRate}% (${report.summary.passedTests}/${report.summary.totalTests})`); - console.log(`πŸ“Š Average Score: ${report.summary.averageScore}/100`); - console.log(`πŸ”— BMAD Integration: ${JSON.stringify(report.summary.bmadIntegrationAnalysis)}`); - - console.log(`\nπŸ“„ Detailed Report: ${reportPath}`); - console.log(`πŸ“‹ Executive Summary: ${summaryPath}`); - - if (report.summary.passRate >= 80) { - console.log('\n🎊 Outstanding! Claude subagents are performing excellently!'); - } else if (report.summary.passRate >= 60) { - console.log('\nβœ… Good progress! Review recommendations for improvements.'); - } else { - console.log('\n⚠️ Significant issues detected. Please review the detailed analysis.'); - } - - process.exit(report.summary.passRate >= 70 ? 0 : 1); -} - -if (import.meta.url === `file://${process.argv[1]}`) { - main().catch(error => { - console.error(`❌ Test suite failed: ${error.message}`); - process.exit(1); - }); -} diff --git a/package.json b/package.json index 4686356a..c16882c9 100644 --- a/package.json +++ b/package.json @@ -11,8 +11,6 @@ "build": "node tools/cli.js build", "build:agents": "node tools/cli.js build --agents-only", "build:teams": "node tools/cli.js build --teams-only", - "build:claude": "cd integration/claude && npm install && npm run build", - "test:claude": "./integration/claude/quick-start-test.sh", "list:agents": "node tools/cli.js list:agents", "validate": "node tools/cli.js validate", "install:bmad": "node tools/installer/bin/bmad.js install",