Revert "feat(integration): claude code subagents"

This reverts commit b8709a6af2.
This commit is contained in:
Basit Mustafa 2025-07-25 15:13:37 -07:00
parent 8860dfe04e
commit 09cbfabfa0
24 changed files with 0 additions and 3383 deletions

View File

@ -1,28 +0,0 @@
# BMad-Method Agent Guide
## Build Commands
- `npm run build` - Build all agents and teams
- `npm run build:agents` - Build only agent bundles
- `npm run build:teams` - Build only team bundles
- `npm run validate` - Validate configuration and files
- `npm run format` - Format all Markdown files with Prettier
- `node tools/cli.js list:agents` - List available agents
## Test Commands
- No formal test suite - validation via `npm run validate`
- Manual testing via building agents/teams and checking outputs
## Architecture
- **Core**: `bmad-core/` - Agent definitions, templates, workflows, user guide
- **Tools**: `tools/` - CLI build system, installers, web builders
- **Expansion Packs**: `expansion-packs/` - Domain-specific agent collections
- **Distribution**: `dist/` - Built agent/team bundles for web deployment
- **Config**: `bmad-core/core-config.yaml` - Sharding, paths, markdown settings
## Code Style
- **Modules**: CommonJS (`require`/`module.exports`), some ES modules via dynamic import
- **Classes**: PascalCase (WebBuilder), methods camelCase (buildAgents)
- **Files**: kebab-case (web-builder.js), constants UPPER_CASE
- **Error Handling**: Try-catch with graceful fallback, async/await patterns
- **Imports**: Node built-ins, fs-extra, chalk, commander, js-yaml
- **Paths**: Always use `path.join()`, absolute paths via `path.resolve()`

View File

@ -25,29 +25,6 @@ This two-phase approach eliminates both **planning inconsistency** and **context
**📖 [See the complete workflow in the User Guide](bmad-core/user-guide.md)** - Planning phase, development cycle, and all agent roles **📖 [See the complete workflow in the User Guide](bmad-core/user-guide.md)** - Planning phase, development cycle, and all agent roles
## 🆕 Claude Code Integration (Alpha at best)
**NEW:** A contribution is attempting to integrate BMad-Method with [Claude Code's new subagents released 7/24](https://docs.anthropic.com/en/docs/claude-code/sub-agents)! Transform BMad's agents into native Claude Code subagents for seamless AI-powered development.
**⚠️ This is an alpha feature, and may not work as expected.** In fact I know it doesn't fully work but subagents were just released a few hours before I finished the initial cut here, so please open defects against [BMAD-AT-CLAUDE](https://github.com/24601/BMAD-AT-CLAUDE/issues).
There are a few enhancements I have attempted to make to make the flow/DX of using BMAD-METHOD with Claude Code subagents more seamless:
- Shared scratchpad for handoffs
- Use the `description` facility to provide semantic meaning to claude for auto-call agents appropriately
- Memory priming
- Data sourcing helper
```bash
# Generate Claude Code subagents
npm run build:claude
# Start Claude Code
claude
```
**[📖 Complete Claude Integration Guide](docs/claude-integration.md)** - Setup, usage, and workflows
## Quick Navigation ## Quick Navigation
### Understanding the BMad Workflow ### Understanding the BMad Workflow

View File

@ -39,14 +39,7 @@ persona:
style: Analytical, inquisitive, creative, facilitative, objective, data-informed style: Analytical, inquisitive, creative, facilitative, objective, data-informed
identity: Strategic analyst specializing in brainstorming, market research, competitive analysis, and project briefing identity: Strategic analyst specializing in brainstorming, market research, competitive analysis, and project briefing
focus: Research planning, ideation facilitation, strategic analysis, actionable insights focus: Research planning, ideation facilitation, strategic analysis, actionable insights
methodology: |
HYPOTHESIS-DRIVEN ANALYSIS FRAMEWORK:
Step 1: Formulate Hypotheses - Start every analysis by stating 2-3 testable hypotheses about the market/problem
Step 2: Gather Evidence - Collect quantitative data and qualitative sources to validate/refute each hypothesis
Step 3: Validate & Score - Rate each hypothesis (High/Medium/Low confidence) based on evidence strength
Step 4: Synthesize Insights - Transform validated hypotheses into actionable strategic recommendations
core_principles: core_principles:
- Hypothesis-First Approach - Begin analysis with explicit testable assumptions
- Curiosity-Driven Inquiry - Ask probing "why" questions to uncover underlying truths - Curiosity-Driven Inquiry - Ask probing "why" questions to uncover underlying truths
- Objective & Evidence-Based Analysis - Ground findings in verifiable data and credible sources - Objective & Evidence-Based Analysis - Ground findings in verifiable data and credible sources
- Strategic Contextualization - Frame all work within broader strategic context - Strategic Contextualization - Frame all work within broader strategic context

View File

@ -1,11 +0,0 @@
Company,Tool,Users_Millions,Revenue_Millions_USD,AI_Features,Market_Share_Percent,Founded
Atlassian,Jira,200,3000,Intelligence,15.2,2002
Monday.com,Monday,180,900,AI Assistant,12.8,2012
Asana,Asana,145,455,Intelligence,10.1,2008
Microsoft,Project,120,2100,Copilot,8.7,1984
Smartsheet,Smartsheet,82,740,DataMesh,5.9,2005
Notion,Notion,35,275,AI Writing,2.5,2016
ClickUp,ClickUp,25,156,ClickUp AI,1.8,2017
Linear,Linear,8,45,Predictive,0.6,2019
Airtable,Airtable,65,735,Apps,4.7,2012
Basecamp,Basecamp,15,99,Limited,1.1,1999
1 Company Tool Users_Millions Revenue_Millions_USD AI_Features Market_Share_Percent Founded
2 Atlassian Jira 200 3000 Intelligence 15.2 2002
3 Monday.com Monday 180 900 AI Assistant 12.8 2012
4 Asana Asana 145 455 Intelligence 10.1 2008
5 Microsoft Project 120 2100 Copilot 8.7 1984
6 Smartsheet Smartsheet 82 740 DataMesh 5.9 2005
7 Notion Notion 35 275 AI Writing 2.5 2016
8 ClickUp ClickUp 25 156 ClickUp AI 1.8 2017
9 Linear Linear 8 45 Predictive 0.6 2019
10 Airtable Airtable 65 735 Apps 4.7 2012
11 Basecamp Basecamp 15 99 Limited 1.1 1999

View File

@ -1,90 +0,0 @@
# Fintech Compliance and Regulatory Guidelines
## PCI DSS Compliance
### Level 1 Requirements (>6M transactions/year)
- **Network Security**: Firewall, network segmentation
- **Data Protection**: Encrypt cardholder data, mask PAN
- **Access Control**: Unique IDs, two-factor authentication
- **Monitoring**: Log access, file integrity monitoring
- **Testing**: Vulnerability scanning, penetration testing
- **Policies**: Information security policy, incident response
### Implementation Checklist
- [ ] Tokenize card data, never store full PAN
- [ ] Use validated payment processors (Stripe, Square)
- [ ] Implement Point-to-Point Encryption (P2PE)
- [ ] Regular security assessments and audits
- [ ] Staff training on data handling procedures
## SOX Compliance (Public Companies)
### Key Controls
- **ITGC**: IT General Controls for financial systems
- **Change Management**: Documented approval processes
- **Access Reviews**: Quarterly user access audits
- **Segregation of Duties**: Separate authorization/recording
- **Documentation**: Maintain audit trails and evidence
## GDPR/Privacy Regulations
### Data Processing Requirements
- **Lawful Basis**: Consent, contract, legitimate interest
- **Data Minimization**: Collect only necessary data
- **Purpose Limitation**: Use data only for stated purposes
- **Retention Limits**: Delete data when no longer needed
- **Data Subject Rights**: Access, rectification, erasure, portability
### Technical Safeguards
- **Privacy by Design**: Build privacy into system architecture
- **Encryption**: End-to-end encryption for personal data
- **Pseudonymization**: Replace identifiers with artificial ones
- **Data Loss Prevention**: Monitor and prevent unauthorized access
## Banking Regulations
### Open Banking (PSD2)
- **Strong Customer Authentication**: Multi-factor authentication
- **API Security**: OAuth 2.0, mutual TLS, certificate validation
- **Data Sharing**: Consent management, scope limitation
- **Fraud Prevention**: Real-time monitoring, risk scoring
### Anti-Money Laundering (AML)
- **Customer Due Diligence**: Identity verification, risk assessment
- **Transaction Monitoring**: Unusual pattern detection
- **Suspicious Activity Reporting**: Automated SAR generation
- **Record Keeping**: 5-year transaction history retention
## Testing Requirements
### Compliance Testing
- **Penetration Testing**: Annual external security assessments
- **Vulnerability Scanning**: Quarterly automated scans
- **Code Reviews**: Security-focused static analysis
- **Red Team Exercises**: Simulated attack scenarios
### Audit Preparation
- **Documentation**: Policies, procedures, evidence collection
- **Control Testing**: Validate effectiveness of security controls
- **Gap Analysis**: Identify compliance deficiencies
- **Remediation Planning**: Prioritize and track fixes
## Regional Considerations
### United States
- **CCPA**: California Consumer Privacy Act requirements
- **GLBA**: Gramm-Leach-Bliley Act for financial institutions
- **FFIEC**: Federal guidance for IT risk management
- **State Regulations**: Additional requirements by state
### European Union
- **PSD2**: Payment Services Directive
- **GDPR**: General Data Protection Regulation
- **MiFID II**: Markets in Financial Instruments Directive
- **EBA Guidelines**: European Banking Authority standards
### Asia-Pacific
- **PDPA**: Personal Data Protection Acts (Singapore, Thailand)
- **Privacy Act**: Australia's privacy legislation
- **PIPEDA**: Canada's Personal Information Protection
- **Local Banking**: Country-specific financial regulations

View File

@ -1,11 +0,0 @@
Market,Size_USD_Billions,Growth_Rate_CAGR,Year,Source
Project Management Software,7.8,10.1%,2023,Grand View Research
AI-Powered PM Tools,1.2,24.3%,2023,TechNavio
Agile Development Tools,2.1,15.7%,2023,Mordor Intelligence
Customer Support Software,24.5,12.2%,2023,Fortune Business Insights
Collaboration Software,31.2,9.5%,2023,Allied Market Research
DevOps Tools,8.9,18.4%,2023,Global Market Insights
SaaS Project Management,4.5,11.8%,2023,ResearchAndMarkets
Enterprise PM Solutions,6.2,8.9%,2023,MarketsandMarkets
Mobile PM Apps,1.8,16.2%,2023,IBISWorld
Cloud PM Platforms,5.4,13.1%,2023,Verified Market Research
1 Market Size_USD_Billions Growth_Rate_CAGR Year Source
2 Project Management Software 7.8 10.1% 2023 Grand View Research
3 AI-Powered PM Tools 1.2 24.3% 2023 TechNavio
4 Agile Development Tools 2.1 15.7% 2023 Mordor Intelligence
5 Customer Support Software 24.5 12.2% 2023 Fortune Business Insights
6 Collaboration Software 31.2 9.5% 2023 Allied Market Research
7 DevOps Tools 8.9 18.4% 2023 Global Market Insights
8 SaaS Project Management 4.5 11.8% 2023 ResearchAndMarkets
9 Enterprise PM Solutions 6.2 8.9% 2023 MarketsandMarkets
10 Mobile PM Apps 1.8 16.2% 2023 IBISWorld
11 Cloud PM Platforms 5.4 13.1% 2023 Verified Market Research

View File

@ -1,62 +0,0 @@
# Security Patterns and Best Practices
## Authentication & Authorization
### JWT Best Practices
- **Expiry**: Access tokens 15-30 minutes, refresh tokens 7-30 days
- **Algorithm**: Use RS256 for public/private key signing
- **Claims**: Include minimal necessary data (user_id, roles, exp)
- **Storage**: HttpOnly cookies for web, secure storage for mobile
- **Validation**: Always verify signature, expiry, and issuer
### OAuth 2.0 Implementation
- **PKCE**: Required for all public clients (SPAs, mobile)
- **State Parameter**: Prevent CSRF attacks
- **Scope Limitation**: Request minimal necessary permissions
- **Redirect URI**: Exact match validation, no wildcards
## Data Protection
### Encryption Standards
- **At Rest**: AES-256-GCM for data, RSA-4096 for keys
- **In Transit**: TLS 1.3 minimum, certificate pinning for mobile
- **Database**: Column-level encryption for PII
- **Backups**: Encrypted with separate key management
### Input Validation
- **Sanitization**: Use parameterized queries, escape HTML
- **File Uploads**: MIME type validation, virus scanning, size limits
- **Rate Limiting**: Per-IP, per-user, per-endpoint limits
- **Schema Validation**: JSON Schema or similar for API inputs
## API Security
### Common Vulnerabilities
1. **Injection**: SQL, NoSQL, Command, LDAP injection
2. **Broken Authentication**: Weak passwords, exposed credentials
3. **Sensitive Data Exposure**: Logs, error messages, debug info
4. **XML External Entities**: XXE attacks in XML processing
5. **Broken Access Control**: Privilege escalation, IDOR
### Security Headers
```
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Referrer-Policy: strict-origin-when-cross-origin
```
## Monitoring & Incident Response
### Security Logging
- **Authentication Events**: Login attempts, failures, lockouts
- **Authorization**: Access grants/denials, privilege changes
- **Data Access**: PII access, export operations
- **System Changes**: Configuration updates, user modifications
### Threat Detection
- **Anomaly Detection**: Unusual access patterns, location changes
- **Automated Response**: Account lockout, IP blocking
- **Alert Thresholds**: Failed login attempts, API rate violations
- **SIEM Integration**: Centralized log analysis and correlation

View File

@ -1,263 +0,0 @@
# BMAD-Method Claude Code Integration
This document describes the Claude Code subagents integration for BMAD-Method, allowing you to use BMAD's specialized agents within Claude Code's new subagent system.
## Overview
The Claude Code integration transforms BMAD's collaborative agent framework into Claude Code subagents while maintaining clean separation from the original codebase. This enables:
- **Native Claude Code Experience**: Use BMAD agents directly within Claude Code
- **Context Management**: Each agent maintains its own context window
- **Tool Integration**: Leverage Claude Code's built-in tools (Read, Grep, codebase_search, etc.)
- **Workflow Preservation**: Maintain BMAD's proven agent collaboration patterns
## Quick Setup
### 1. Prerequisites
- Node.js 20+
- Claude Code installed ([claude.ai/code](https://claude.ai/code))
- Existing BMAD-Method project
### 2. Generate Claude Subagents
```bash
# From your BMAD project root
npm run build:claude
```
This creates `.claude/agents/` with six specialized subagents:
- **Analyst** (Mary) - Market research, competitive analysis, project briefs
- **Architect** - System design, technical architecture
- **PM** - Project management, planning, coordination
- **Dev** - Development, implementation, coding
- **QA** - Quality assurance, testing, validation
- **Scrum Master** - Agile process management
### 3. Start Claude Code
```bash
# In your project root (where .claude/ directory exists)
claude
```
## Usage Patterns
### Explicit Agent Invocation
Request specific agents for specialized tasks:
```
# Market research and analysis
> Use the analyst subagent to help me create a competitive analysis
# Architecture planning
> Ask the architect subagent to design a microservices architecture
# Implementation
> Have the dev subagent implement the user authentication system
# Quality assurance
> Use the qa subagent to create comprehensive test cases
```
### Automatic Agent Selection
Claude Code automatically selects appropriate agents based on context:
```
# Analyst will likely be chosen
> I need to research the market for AI-powered project management tools
# Architect will likely be chosen
> How should I structure the database schema for this multi-tenant SaaS?
# Dev will likely be chosen
> Implement the JWT authentication middleware
```
## Agent Capabilities
### Analyst (Mary) 📊
- Market research and competitive analysis
- Project briefs and discovery documentation
- Brainstorming and ideation facilitation
- Strategic analysis and insights
**Key Commands**: create-project-brief, perform-market-research, create-competitor-analysis, brainstorm
### Architect 🏗️
- System architecture and design
- Technical solution planning
- Integration patterns and approaches
- Scalability and performance considerations
### PM 📋
- Project planning and coordination
- Stakeholder management
- Risk assessment and mitigation
- Resource allocation and timeline management
### Dev 👨‍💻
- Code implementation and development
- Technical problem solving
- Code review and optimization
- Integration and deployment
### QA 🔍
- Test planning and execution
- Quality assurance processes
- Bug identification and validation
- Acceptance criteria definition
### Scrum Master 🎯
- Sprint planning and management
- Agile process facilitation
- Team coordination and communication
- Impediment resolution
## Workflow Integration
### BMAD Story-Driven Development
Agents can access and work with BMAD story files:
```
> Use the dev subagent to implement the user story in stories/user-auth.story.md
```
### Task and Template Access
Agents can read BMAD dependencies:
```
> Have the analyst use the project-brief template to document our new feature
```
### Cross-Agent Collaboration
Chain agents for complex workflows:
```
> First use the analyst to research the market, then have the architect design the solution, and finally ask the pm to create a project plan
```
## Technical Architecture
### Directory Structure
```
./
├── bmad-core/ # Original BMAD (untouched)
├── integration/claude/ # Claude integration source
└── .claude/ # Generated Claude subagents
├── agents/ # Subagent definitions
│ ├── analyst.md
│ ├── architect.md
│ └── ...
└── memory/ # Agent context memory
```
### Context Management
- **Lightweight Start**: Each agent begins with minimal context (~2-4KB)
- **On-Demand Loading**: Agents use tools to read files when needed
- **Memory Files**: Rolling memory maintains conversation context
- **Tool Integration**: Access BMAD files via Read, Grep, codebase_search
### Tool Permissions
Each agent has access to:
- `Read` - File reading and content access
- `Grep` - Text search within files
- `glob` - File pattern matching
- `codebase_search_agent` - Semantic code search
- `list_directory` - Directory exploration
## Advanced Usage
### Custom Agent Development
To add new agents:
1. Create agent definition in `bmad-core/agents/new-agent.md`
2. Add agent ID to `integration/claude/src/build-claude.js`
3. Rebuild: `npm run build:claude`
### Memory Management
Agents maintain context in `.claude/memory/{agent}.md`:
- Automatically created on first use
- Stores key decisions and context
- Truncated when exceeding limits
- Can be manually edited if needed
### Integration with CI/CD
```yaml
# .github/workflows/claude-agents.yml
name: Update Claude Agents
on:
push:
paths: ['bmad-core/agents/**']
jobs:
build-claude:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm run build:claude
- # Commit updated .claude/ directory
```
## Best Practices
### Agent Selection
- **Analyst**: Early project phases, research, market analysis
- **Architect**: System design, technical planning, solution architecture
- **PM**: Project coordination, planning, stakeholder management
- **Dev**: Implementation, coding, technical execution
- **QA**: Testing, validation, quality assurance
- **Scrum Master**: Process management, team coordination
### Context Optimization
- Start conversations with clear agent requests
- Reference specific BMAD files by path when needed
- Use agent memory files for important decisions
- Keep agent contexts focused on their specialization
### Workflow Efficiency
- Use explicit agent invocation for specialized tasks
- Chain agents for multi-phase work
- Leverage BMAD story files for development context
- Maintain conversation history in agent memory
## Troubleshooting
### Agent Not Found
```bash
# Rebuild agents
npm run build:claude
# Verify generation
ls .claude/agents/
```
### Memory Issues
```bash
# Clear agent memory
rm .claude/memory/*.md
```
### Context Problems
- Keep agent prompts focused
- Use tools to load files on-demand
- Reference specific sections rather than entire documents
## Support
- **BMAD Community**: [Discord](https://discord.gg/gk8jAdXWmj)
- **Issues**: [GitHub Issues](https://github.com/24601/BMAD-AT-CLAUDE/issues)
- **Claude Code Docs**: [docs.anthropic.com/claude-code](https://docs.anthropic.com/en/docs/claude-code/overview)

View File

@ -1,13 +0,0 @@
# Dependencies
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# Runtime data
*.pid
*.seed
*.log
# Generated files should be in root .claude/, not here
.claude/

View File

@ -1,105 +0,0 @@
# BMAD-Method Claude Code Integration
This directory contains the integration layer that ports BMAD-Method agents to Claude Code's subagent system.
## Quick Start
```bash
# Build Claude Code subagents from BMAD definitions
npm run build
# Start Claude Code in the repo root
cd ../../
claude
```
## What This Does
This integration transforms BMAD-Method's specialized agents into Claude Code subagents:
- **Analyst (Mary)** - Market research, brainstorming, competitive analysis, project briefs
- **Architect** - System design, technical architecture, solution planning
- **PM** - Project management, planning, coordination
- **Dev** - Development, implementation, coding
- **QA** - Quality assurance, testing, validation
- **Scrum Master** - Agile process management, team coordination
## How It Works
1. **Agent Parsing**: Reads BMAD agent definitions from `bmad-core/agents/`
2. **Template Generation**: Uses Mustache templates to create Claude subagent files
3. **Context Management**: Creates lightweight memory files for each agent
4. **Tool Assignment**: Grants appropriate tools (Read, Grep, codebase_search, etc.)
## Generated Structure
```
.claude/
├── agents/ # Generated subagent definitions
│ ├── analyst.md
│ ├── architect.md
│ ├── dev.md
│ ├── pm.md
│ ├── qa.md
│ └── sm.md
└── memory/ # Context memory for each agent
├── analyst.md
└── ...
```
## Usage in Claude Code
Once built, you can use subagents in Claude Code:
```
# Explicit invocation
> Use the analyst subagent to help me create a project brief
# Or let Claude choose automatically
> I need help with market research and competitive analysis
```
## Architecture Principles
- **Zero Pollution**: No changes to original BMAD structure
- **One-Way Generation**: Claude agents generated from BMAD sources
- **Context Light**: Each agent starts with minimal context, loads more on-demand
- **Tool Focused**: Uses Claude Code's built-in tools for file access
## Development
### Building
```bash
npm run build # Build all agents
npm run clean # Remove generated .claude directory
npm run validate # Validate agent definitions
```
### Templates
Agent templates are in `src/templates/agent.mustache` and use the following data:
- `agent.*` - Agent metadata (name, title, icon, etc.)
- `persona.*` - Role definition and principles
- `commands` - Available BMAD commands
- `dependencies.*` - Task, template, and data dependencies
### Adding New Agents
1. Add agent ID to `CORE_AGENTS` array in `build-claude.js`
2. Ensure corresponding `.md` file exists in `bmad-core/agents/`
3. Run `npm run build`
## Integration with Original BMAD
This integration is designed to coexist with the original BMAD system:
- Original BMAD web bundles continue to work unchanged
- Claude integration is completely optional
- No modification to core BMAD files required
- Can be used alongside existing BMAD workflows
## License
MIT - Same as BMAD-Method

View File

@ -1,437 +0,0 @@
# End-to-End Testing Guide for BMAD Claude Integration
This guide provides comprehensive testing scenarios to validate the Claude Code subagents integration.
## Test Environment Setup
### 1. Create Fresh Test Project
```bash
# Create new test directory
mkdir ~/bmad-claude-test
cd ~/bmad-claude-test
# Initialize basic project structure
mkdir -p src docs tests
echo "# Test Project for BMAD Claude Integration" > README.md
# Clone BMAD method (or copy existing)
git clone https://github.com/24601/BMAD-AT-CLAUDE.git
cd BMAD-AT-CLAUDE
# Install dependencies and build Claude agents
npm install
npm run build:claude
```
### 2. Verify Claude Code Installation
```bash
# Check Claude Code is available
claude --version
# Verify we're in the right directory with .claude/agents/
ls -la .claude/agents/
```
### 3. Start Claude Code Session
```bash
# Start Claude Code in project root
claude
# Should show available subagents
/agents
```
## Core Agent Testing
### Test 1: Analyst Agent - Market Research
**Prompt:**
```
Use the analyst subagent to help me research the market for AI-powered project management tools. I want to understand the competitive landscape and identify key market gaps.
```
**Expected Behavior:**
- Agent introduces itself as Mary, Business Analyst
- Offers to use market research templates
- Accesses BMAD dependencies using Read tool
- Provides structured analysis approach
**Validation:**
- [ ] Agent stays in character as Mary
- [ ] References BMAD templates/tasks appropriately
- [ ] Uses numbered lists for options
- [ ] Accesses files via Read tool when needed
### Test 2: Architect Agent - System Design
**Prompt:**
```
Ask the architect subagent to design a microservices architecture for a multi-tenant SaaS platform with user authentication, billing, and analytics.
```
**Expected Behavior:**
- Agent focuses on technical architecture
- Considers scalability and system boundaries
- May reference BMAD architecture templates
- Provides detailed technical recommendations
**Validation:**
- [ ] Technical depth appropriate for architect role
- [ ] System thinking and architectural patterns
- [ ] References to BMAD resources when relevant
### Test 3: Dev Agent - Implementation
**Prompt:**
```
Have the dev subagent implement a JWT authentication middleware in Node.js with proper error handling and logging.
```
**Expected Behavior:**
- Focuses on practical implementation
- Writes actual code
- Considers best practices and error handling
- May suggest testing approaches
**Validation:**
- [ ] Produces working code
- [ ] Follows security best practices
- [ ] Includes proper error handling
## BMAD Integration Testing
### Test 4: Story File Workflow
**Setup:**
```bash
# Create a sample story file
mkdir -p stories
cat > stories/user-auth.story.md << 'EOF'
# User Authentication Story
## Overview
Implement secure user authentication system with JWT tokens.
## Acceptance Criteria
- [ ] User can register with email/password
- [ ] User can login and receive JWT token
- [ ] Protected routes require valid token
- [ ] Token refresh mechanism
## Technical Notes
- Use bcrypt for password hashing
- JWT expiry: 15 minutes
- Refresh token expiry: 7 days
EOF
```
**Prompt:**
```
Use the dev subagent to implement the user authentication story in stories/user-auth.story.md. Follow the acceptance criteria exactly.
```
**Expected Behavior:**
- Agent reads the story file using Read tool
- Implements according to acceptance criteria
- References story context throughout implementation
**Validation:**
- [ ] Agent reads story file correctly
- [ ] Implementation matches acceptance criteria
- [ ] Maintains story context during conversation
### Test 5: BMAD Template Usage
**Prompt:**
```
Use the analyst subagent to create a project brief using the BMAD project-brief template for an AI-powered customer support chatbot.
```
**Expected Behavior:**
- Agent accesses BMAD templates using Read tool
- Uses project-brief-tmpl.yaml structure
- Guides user through template completion
- Follows BMAD workflow patterns
**Validation:**
- [ ] Accesses correct template file
- [ ] Follows template structure
- [ ] Maintains BMAD methodology
## Agent Collaboration Testing
### Test 6: Multi-Agent Workflow
**Prompt:**
```
I want to build a new feature for real-time notifications. First use the analyst to research notification patterns, then have the architect design the system, and finally ask the pm to create a project plan.
```
**Expected Behavior:**
- Sequential agent handoffs
- Each agent maintains context from previous work
- Cross-references between agent outputs
- Coherent end-to-end workflow
**Validation:**
- [ ] Smooth agent transitions
- [ ] Context preservation across agents
- [ ] Workflow coherence
- [ ] Each agent stays in character
### Test 7: Agent Memory Persistence
**Setup:**
```bash
# Start conversation with analyst
# Make some decisions and progress
# Exit and restart Claude Code session
```
**Test:**
1. Have conversation with analyst about market research
2. Exit Claude Code
3. Restart Claude Code
4. Continue conversation - check if context preserved
**Expected Behavior:**
- Agent memory files store key decisions
- Context partially preserved across sessions
- Agent references previous conversation appropriately
## Error Handling and Edge Cases
### Test 8: Invalid File Access
**Prompt:**
```
Use the analyst subagent to read the file bmad-core/nonexistent-file.md
```
**Expected Behavior:**
- Graceful error handling
- Suggests alternative files or approaches
- Maintains agent persona during error
**Validation:**
- [ ] No crashes or errors
- [ ] Helpful error messages
- [ ] Agent stays in character
### Test 9: Tool Permission Testing
**Prompt:**
```
Use the dev subagent to create a new file in the src/ directory with a sample API endpoint.
```
**Expected Behavior:**
- Agent attempts to use available tools
- If create_file not available, suggests alternatives
- Provides code that could be manually created
**Validation:**
- [ ] Respects tool limitations
- [ ] Provides alternatives when tools unavailable
- [ ] Clear about what actions are possible
### Test 10: Context Window Management
**Setup:**
```bash
# Create large content files to test context limits
mkdir -p test-content
for i in {1..50}; do
echo "This is test content line $i with enough text to make it substantial and test context window management capabilities. Adding more text to make each line longer and test how agents handle large content volumes." >> test-content/large-file.md
done
```
**Prompt:**
```
Use the analyst subagent to analyze all the content in the test-content/ directory and summarize the key insights.
```
**Expected Behavior:**
- Agent uses tools to access content incrementally
- Doesn't load everything into context at once
- Provides meaningful analysis despite size constraints
**Validation:**
- [ ] Efficient tool usage
- [ ] No context overflow errors
- [ ] Meaningful output despite constraints
## Performance and Usability Testing
### Test 11: Response Time
**Test Multiple Prompts:**
- Time each agent invocation
- Measure response quality vs speed
- Test with different complexity levels
**Metrics:**
- [ ] Initial agent load time < 10 seconds
- [ ] Subsequent responses < 30 seconds
- [ ] Quality maintained across response times
### Test 12: User Experience
**Prompts to Test:**
```
# Ambiguous request
> Help me with my project
# Complex multi-step request
> I need to build a complete authentication system from scratch
# Domain-specific request
> Create unit tests for my React components
```
**Expected Behavior:**
- Appropriate agent selection or clarification requests
- Clear guidance on next steps
- Professional communication
**Validation:**
- [ ] Appropriate agent routing
- [ ] Clear communication
- [ ] Helpful responses to ambiguous requests
## Validation Checklist
### Agent Behavior ✅
- [ ] Each agent maintains distinct persona
- [ ] Agents stay in character throughout conversations
- [ ] Appropriate expertise demonstrated
- [ ] BMAD methodology preserved
### Tool Integration ✅
- [ ] Read tool accesses BMAD files correctly
- [ ] Grep searches work across codebase
- [ ] codebase_search_agent provides relevant results
- [ ] File paths resolved correctly
### Context Management ✅
- [ ] Agents start with minimal context
- [ ] On-demand loading works properly
- [ ] Memory files created and maintained
- [ ] No context overflow errors
### BMAD Integration ✅
- [ ] Original BMAD workflows preserved
- [ ] Templates and tasks accessible
- [ ] Story-driven development supported
- [ ] Cross-agent collaboration maintained
### Error Handling ✅
- [ ] Graceful handling of missing files
- [ ] Clear error messages
- [ ] Recovery suggestions provided
- [ ] No system crashes
## Automated Testing Script
```bash
#!/bin/bash
# automated-test.sh
echo "🚀 Starting BMAD Claude Integration Tests..."
# Test 1: Build verification
echo "📋 Test 1: Build verification"
npm run build:claude
if [ $? -eq 0 ]; then
echo "✅ Build successful"
else
echo "❌ Build failed"
exit 1
fi
# Test 2: Agent file validation
echo "📋 Test 2: Agent file validation"
cd integration/claude
npm run validate
if [ $? -eq 0 ]; then
echo "✅ Validation successful"
else
echo "❌ Validation failed"
exit 1
fi
# Test 3: File structure verification
echo "📋 Test 3: File structure verification"
cd ../..
required_files=(
".claude/agents/analyst.md"
".claude/agents/architect.md"
".claude/agents/dev.md"
".claude/agents/pm.md"
".claude/agents/qa.md"
".claude/agents/sm.md"
)
for file in "${required_files[@]}"; do
if [ -f "$file" ]; then
echo "✅ $file exists"
else
echo "❌ $file missing"
exit 1
fi
done
echo "🎉 All automated tests passed!"
echo "📝 Manual testing required for agent conversations"
```
## Manual Test Report Template
```markdown
# BMAD Claude Integration Test Report
**Date:** ___________
**Tester:** ___________
**Claude Code Version:** ___________
## Test Results Summary
- [ ] All agents load successfully
- [ ] Agent personas maintained
- [ ] BMAD integration working
- [ ] Tool access functional
- [ ] Error handling appropriate
## Detailed Results
### Agent Tests
- [ ] Analyst: ✅/❌ - Notes: ___________
- [ ] Architect: ✅/❌ - Notes: ___________
- [ ] Dev: ✅/❌ - Notes: ___________
- [ ] PM: ✅/❌ - Notes: ___________
- [ ] QA: ✅/❌ - Notes: ___________
- [ ] SM: ✅/❌ - Notes: ___________
### Integration Tests
- [ ] Story workflow: ✅/❌
- [ ] Template usage: ✅/❌
- [ ] Multi-agent flow: ✅/❌
### Issues Found
1. ___________
2. ___________
3. ___________
## Recommendations
___________
```
## Next Steps After Testing
1. **Fix Issues**: Address any problems found during testing
2. **Performance Optimization**: Improve response times if needed
3. **Documentation Updates**: Clarify usage based on test learnings
4. **User Feedback**: Gather feedback from real users
5. **Iteration**: Refine agents based on testing results

View File

@ -1,254 +0,0 @@
# Complete End-to-End Testing Framework with o3 Judge
Based on the Oracle's detailed evaluation, here's the comprehensive testing approach for validating the BMAD Claude integration.
## Testing Strategy Overview
1. **Manual Execution**: Run tests manually in Claude Code to avoid timeout issues
2. **Structured Collection**: Capture responses in standardized format
3. **o3 Evaluation**: Use Oracle tool for sophisticated analysis
4. **Iterative Improvement**: Apply recommendations to enhance integration
## Test Suite
### Core Agent Tests
#### 1. Analyst Agent - Market Research
**Prompt:**
```
Use the analyst subagent to help me research the competitive landscape for AI project management tools.
```
**Evaluation Criteria (from o3 analysis):**
- Subagent Persona (Mary, Business Analyst): 0-5 points
- Analytical Expertise/Market Research Method: 0-5 points
- BMAD Methodology Integration: 0-5 points
- Response Structure & Professionalism: 0-5 points
- User Engagement/Next-Step Clarity: 0-5 points
**Expected Improvements (per o3 recommendations):**
- [ ] References specific BMAD artefacts (Opportunity Scorecard, Gap Matrix)
- [ ] Includes quantitative analysis with data sources
- [ ] Shows hypothesis-driven discovery approach
- [ ] Solicits clarification on scope and constraints
#### 2. Dev Agent - Implementation Quality
**Prompt:**
```
Have the dev subagent implement a secure file upload endpoint in Node.js with validation, virus scanning, and rate limiting.
```
**Evaluation Criteria:**
- Technical Implementation Quality: 0-5 points
- Security Best Practices: 0-5 points
- Code Structure and Documentation: 0-5 points
- Error Handling and Validation: 0-5 points
- BMAD Story Integration: 0-5 points
#### 3. Architect Agent - System Design
**Prompt:**
```
Ask the architect subagent to design a microservices architecture for a real-time collaboration platform with document editing, user presence, and conflict resolution.
```
**Evaluation Criteria:**
- System Architecture Expertise: 0-5 points
- Scalability and Performance Considerations: 0-5 points
- Real-time Architecture Patterns: 0-5 points
- Technical Detail and Accuracy: 0-5 points
- Integration with BMAD Architecture Templates: 0-5 points
#### 4. PM Agent - Project Planning
**Prompt:**
```
Use the pm subagent to create a project plan for launching a new AI-powered feature, including team coordination, risk management, and stakeholder communication.
```
**Evaluation Criteria:**
- Project Management Methodology: 0-5 points
- Risk Assessment and Mitigation: 0-5 points
- Timeline and Resource Planning: 0-5 points
- Stakeholder Management: 0-5 points
- BMAD Process Integration: 0-5 points
#### 5. QA Agent - Testing Strategy
**Prompt:**
```
Ask the qa subagent to design a comprehensive testing strategy for a fintech payment processing system, including security, compliance, and performance testing.
```
**Evaluation Criteria:**
- Testing Methodology Depth: 0-5 points
- Domain-Specific Considerations (Fintech): 0-5 points
- Test Automation and CI/CD Integration: 0-5 points
- Quality Assurance Best Practices: 0-5 points
- BMAD QA Template Usage: 0-5 points
#### 6. Scrum Master Agent - Process Facilitation
**Prompt:**
```
Use the sm subagent to help establish an agile workflow for a remote team, including sprint ceremonies, collaboration tools, and team dynamics.
```
**Evaluation Criteria:**
- Agile Methodology Expertise: 0-5 points
- Remote Team Considerations: 0-5 points
- Process Facilitation Skills: 0-5 points
- Tool and Workflow Recommendations: 0-5 points
- BMAD Agile Integration: 0-5 points
### Advanced Integration Tests
#### 7. BMAD Story Workflow
**Setup:**
```bash
# Create sample story file
cat > stories/payment-integration.story.md << 'EOF'
# Payment Integration Story
## Overview
Integrate Stripe payment processing for subscription billing
## Acceptance Criteria
- [ ] Secure payment form with validation
- [ ] Subscription creation and management
- [ ] Webhook handling for payment events
- [ ] Error handling and retry logic
- [ ] Compliance with PCI DSS requirements
## Technical Notes
- Use Stripe SDK v3
- Implement idempotency keys
- Log all payment events for audit
EOF
```
**Test Prompt:**
```
Use the dev subagent to implement the payment integration story in stories/payment-integration.story.md
```
**Evaluation Focus:**
- Story comprehension and implementation
- Acceptance criteria coverage
- BMAD story-driven development adherence
#### 8. Cross-Agent Collaboration
**Test Sequence:**
```
1. "Use the analyst subagent to research payment processing competitors"
2. "Now ask the architect subagent to design a payment system based on the analysis"
3. "Have the pm subagent create an implementation plan for the payment system"
```
**Evaluation Focus:**
- Context handoff between agents
- Building on previous agent outputs
- Coherent multi-agent workflow
## Testing Execution Process
### Step 1: Manual Execution
```bash
# Build agents
npm run build:claude
# Start Claude Code
claude
# Run each test prompt and save responses
```
### Step 2: Response Collection
Create a structured record for each test:
```json
{
"testId": "analyst-market-research",
"timestamp": "2025-07-24T...",
"prompt": "Use the analyst subagent...",
"response": "Hello! I'm Mary...",
"executionNotes": "Agent responded immediately, showed subagent behavior",
"evidenceFound": [
"Agent identified as Mary",
"Referenced BMAD template",
"Structured analysis approach"
]
}
```
### Step 3: o3 Evaluation
For each response, use the Oracle tool with this evaluation template:
```
Evaluate this Claude Code subagent response using the detailed criteria framework established for BMAD integration testing.
TEST: {testId}
ORIGINAL PROMPT: {prompt}
RESPONSE: {response}
EVALUATION FRAMEWORK:
[Insert specific 5-point criteria for the agent type]
Based on the previous detailed evaluation of the analyst agent, please provide:
1. DETAILED SCORES: Rate each criterion 0-5 with justification
2. OVERALL PERCENTAGE: Calculate weighted average (max 100%)
3. STRENGTHS: What shows excellent subagent behavior?
4. IMPROVEMENT AREAS: What needs enhancement?
5. BMAD INTEGRATION LEVEL: none/basic/good/excellent
6. RECOMMENDATIONS: Specific improvements aligned with BMAD methodology
7. PASS/FAIL: Does this meet minimum subagent behavior threshold (70%)?
Format as structured analysis similar to the previous detailed evaluation.
```
### Step 4: Report Generation
#### Individual Test Reports
For each test, generate:
- Score breakdown by criteria
- Evidence of subagent behavior
- BMAD integration assessment
- Specific recommendations
#### Aggregate Analysis
- Overall pass rate across all agents
- BMAD integration maturity assessment
- Common strengths and improvement areas
- Integration readiness evaluation
## Success Criteria
### Minimum Viable Integration (70% threshold)
- [ ] Agents demonstrate distinct personas
- [ ] Responses show appropriate domain expertise
- [ ] Basic BMAD methodology references
- [ ] Professional response structure
- [ ] Clear user engagement
### Excellent Integration (85%+ threshold)
- [ ] Deep BMAD artifact integration
- [ ] Quantitative analysis with data sources
- [ ] Hypothesis-driven approach
- [ ] Sophisticated domain expertise
- [ ] Seamless cross-agent collaboration
## Continuous Improvement Process
1. **Run Full Test Suite** - Execute all 8 core tests
2. **Oracle Evaluation** - Get detailed o3 analysis for each
3. **Identify Patterns** - Find common improvement areas
4. **Update Agent Prompts** - Enhance based on recommendations
5. **Rebuild and Retest** - Verify improvements
6. **Document Learnings** - Update integration best practices
## Automation Opportunities
Once manual process is validated:
- Automated response collection via Claude API
- Batch o3 evaluation processing
- Regression testing on agent updates
- Performance benchmarking over time
This framework provides the sophisticated evaluation approach demonstrated by the Oracle's analysis while remaining practical for ongoing validation and improvement of the BMAD Claude integration.

View File

@ -1,115 +0,0 @@
# Manual Testing Guide with o3 Judge
Since automated Claude testing can be complex due to session management, here's a comprehensive manual testing approach with o3 evaluation.
## Quick Manual Test Process
### 1. Setup Test Environment
```bash
# Ensure agents are built
npm run build:claude
# Verify agent files exist
ls .claude/agents/
# Start Claude Code
claude
```
### 2. Test Each Agent Manually
Run these prompts in Claude Code and copy the responses for evaluation:
#### Test 1: Analyst Agent
**Prompt:**
```
Use the analyst subagent to help me research the competitive landscape for AI project management tools.
```
**Expected Behaviors:**
- Agent identifies as Mary or Business Analyst
- Shows analytical methodology or structured approach
- References market research or competitive analysis expertise
- May mention BMAD templates or systematic workflow
#### Test 2: Dev Agent
**Prompt:**
```
Have the dev subagent implement a JWT authentication middleware with error handling.
```
**Expected Behaviors:**
- Provides actual code implementation
- Shows development expertise and best practices
- Includes proper error handling approach
- Demonstrates security awareness for JWT
#### Test 3: Architect Agent
**Prompt:**
```
Ask the architect subagent to design a microservices architecture for real-time notifications.
```
**Expected Behaviors:**
- Shows system architecture expertise
- Discusses microservices patterns and boundaries
- Considers real-time and scalability concerns
- Demonstrates technical depth appropriate for architect role
### 3. Record Results
For each test, record:
- Did the response come from the intended subagent?
- What evidence shows proper agent behavior?
- How well does it integrate BMAD methodology?
- Quality and relevance of the response
## Example Test Result for o3 Evaluation
Here's a sample response to demonstrate the evaluation process:
**Test Response (Analyst Agent):**
```
Hello! I'm Mary, your Business Analyst, and I'm excited to help you research the competitive landscape for AI project management tools.
Let me approach this systematically using proven market research methodology:
## Research Approach
1. **Competitive Analysis Framework**
- Direct competitors (AI-first PM tools)
- Indirect competitors (traditional PM tools adding AI)
- Emerging players and startups
2. **Key Analysis Dimensions**
- Feature sets and AI capabilities
- Target markets and pricing
- User experience and adoption
- Technology stack and integrations
## Initial Competitive Landscape
### Direct AI-First Competitors
- **Monday.com AI**: Native AI features for task automation
- **Asana Intelligence**: AI-powered insights and automation
- **ClickUp AI**: AI writing assistant and task optimization
### Traditional Players with AI
- **Microsoft Project**: Adding Copilot integration
- **Atlassian (Jira)**: Intelligence features and automation
- **Notion**: AI writing and organization features
Would you like me to use the BMAD market research template to create a more detailed competitive analysis? I can help you:
1. Create a comprehensive competitor analysis document
2. Perform detailed feature mapping
3. Identify market gaps and opportunities
4. Research pricing and positioning strategies
Which aspect would you like to explore first?
```
## Using o3 Judge for Evaluation
I'll now use the Oracle (o3) to evaluate this sample response:

View File

@ -1,38 +0,0 @@
{
"name": "@bmad/claude-integration",
"version": "1.0.0",
"description": "Claude Code subagents integration for BMAD-Method",
"type": "module",
"scripts": {
"build": "node src/build-claude.js",
"build:agents": "node src/build-claude.js",
"clean": "rm -rf ../../.claude",
"validate": "node src/validate.js"
},
"dependencies": {
"mustache": "^4.2.0",
"yaml": "^2.3.4",
"fs-extra": "^11.2.0"
},
"devDependencies": {
"@types/node": "^20.0.0",
"typescript": "^5.0.0"
},
"peerDependencies": {
"bmad-method": "*"
},
"keywords": [
"bmad",
"claude",
"ai-agents",
"subagents",
"anthropic"
],
"author": "BMAD Community",
"license": "MIT",
"repository": {
"type": "git",
"url": "https://github.com/24601/BMAD-AT-CLAUDE.git",
"directory": "integration/claude"
}
}

View File

@ -1,147 +0,0 @@
#!/bin/bash
# Quick Start Test for BMAD Claude Integration
# Provides simple validation and setup for manual testing with o3 judge
echo "🚀 BMAD Claude Integration - Quick Start Test"
echo "============================================="
# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Change to repo root
cd "$(dirname "$0")/../.."
echo -e "${BLUE}📂 Working directory: $(pwd)${NC}"
echo ""
# Check prerequisites
echo "🔍 Checking prerequisites..."
# Check Node.js
if command -v node &> /dev/null; then
NODE_VERSION=$(node --version)
echo -e "${GREEN}✅ Node.js ${NODE_VERSION}${NC}"
else
echo -e "${RED}❌ Node.js not found${NC}"
exit 1
fi
# Check Claude Code
if command -v claude &> /dev/null; then
CLAUDE_VERSION=$(claude --version 2>&1 | head -1)
echo -e "${GREEN}✅ Claude Code detected${NC}"
else
echo -e "${YELLOW}⚠️ Claude Code not found${NC}"
echo " Install from: https://claude.ai/code"
fi
# Check if agents are built
if [ -d ".claude/agents" ]; then
AGENT_COUNT=$(ls .claude/agents/*.md 2>/dev/null | wc -l)
echo -e "${GREEN}✅ Found ${AGENT_COUNT} agent files${NC}"
else
echo -e "${YELLOW}⚠️ No agents found - building them now...${NC}"
npm run build:claude
if [ $? -eq 0 ]; then
echo -e "${GREEN}✅ Agents built successfully${NC}"
else
echo -e "${RED}❌ Failed to build agents${NC}"
exit 1
fi
fi
# Validate agent files
echo ""
echo "🔍 Validating agent configurations..."
cd integration/claude
npm run validate > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo -e "${GREEN}✅ All agent configurations valid${NC}"
else
echo -e "${YELLOW}⚠️ Agent validation warnings (check with: npm run validate)${NC}"
fi
cd ../..
# Show available agents
echo ""
echo "🎭 Available BMAD Agents:"
for agent in .claude/agents/*.md; do
if [ -f "$agent" ]; then
AGENT_NAME=$(basename "$agent" .md)
AGENT_TITLE=$(grep "^name:" "$agent" | cut -d: -f2- | sed 's/^ *//')
echo -e "${BLUE} 📋 ${AGENT_NAME}: ${AGENT_TITLE}${NC}"
fi
done
# Create test commands
echo ""
echo "🧪 Quick Test Commands:"
echo "======================"
cat << 'EOF'
1. Start Claude Code:
claude
2. Test Analyst Agent:
Use the analyst subagent to help me research the competitive landscape for AI project management tools.
3. Test Dev Agent:
Have the dev subagent implement a JWT authentication middleware with error handling.
4. Test Architect Agent:
Ask the architect subagent to design a microservices architecture for real-time notifications.
5. Check Available Agents:
/agents
EOF
# Provide next steps
echo ""
echo -e "${GREEN}🎯 Next Steps for Complete Testing:${NC}"
echo "1. Run the manual test commands above in Claude Code"
echo "2. Copy responses and use Oracle tool for o3 evaluation"
echo "3. See complete-test-framework.md for comprehensive testing"
echo "4. Use manual-test-guide.md for detailed evaluation criteria"
# Check if we can run a basic file test
echo ""
echo "🔬 Basic File Structure Test:"
if [ -f ".claude/agents/analyst.md" ]; then
# Check if analyst file has expected content
if grep -q "Mary" ".claude/agents/analyst.md"; then
echo -e "${GREEN}✅ Analyst agent properly configured${NC}"
else
echo -e "${YELLOW}⚠️ Analyst agent may need reconfiguration${NC}"
fi
if grep -q "bmad-core" ".claude/agents/analyst.md"; then
echo -e "${GREEN}✅ BMAD integration references present${NC}"
else
echo -e "${YELLOW}⚠️ Limited BMAD integration detected${NC}"
fi
else
echo -e "${RED}❌ Analyst agent file not found${NC}"
fi
# Summary
echo ""
echo -e "${GREEN}🎉 Setup Complete!${NC}"
echo ""
if command -v claude &> /dev/null; then
echo -e "${GREEN}Ready to test! Run: ${BLUE}claude${GREEN} to start testing.${NC}"
else
echo -e "${YELLOW}Install Claude Code first, then run: ${BLUE}claude${NC}"
fi
echo ""
echo "📚 Testing Resources:"
echo " 📖 integration/claude/complete-test-framework.md"
echo " 📋 integration/claude/manual-test-guide.md"
echo " 🔧 integration/claude/TESTING.md"

View File

@ -1,108 +0,0 @@
#!/bin/bash
# Quick End-to-End Test for BMAD Claude Integration
echo "🚀 BMAD Claude Integration - Quick Test"
echo "======================================"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Test counter
TESTS=0
PASSED=0
run_test() {
local test_name="$1"
local test_command="$2"
echo -e "\n📋 Test $((++TESTS)): $test_name"
if eval "$test_command"; then
echo -e "${GREEN}✅ PASSED${NC}"
((PASSED++))
else
echo -e "${RED}❌ FAILED${NC}"
fi
}
# Navigate to repo root
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR/../.."
echo "Working directory: $(pwd)"
echo "Files in .claude/agents/:"
ls -la .claude/agents/ 2>/dev/null || echo "No .claude/agents directory found"
echo ""
# Test 1: Dependencies check
run_test "Node.js version check" "node --version | grep -E 'v[2-9][0-9]|v1[89]|v[2-9][0-9]'"
# Test 2: Build agents
run_test "Build Claude agents" "npm run build:claude > /dev/null 2>&1"
# Test 3: Validate agent files exist
run_test "Agent files exist" "ls .claude/agents/analyst.md .claude/agents/architect.md .claude/agents/dev.md .claude/agents/pm.md .claude/agents/qa.md .claude/agents/sm.md > /dev/null 2>&1"
# Test 4: Validate agent file structure
run_test "Agent file structure valid" "cd integration/claude && npm run validate > /dev/null 2>&1"
# Test 5: Check YAML frontmatter
run_test "Analyst YAML frontmatter" "test -f .claude/agents/analyst.md && cat .claude/agents/analyst.md | grep -q 'name: Mary'"
# Test 6: Check agent content
run_test "Agent persona content" "test -f .claude/agents/analyst.md && cat .claude/agents/analyst.md | grep -q 'You are Mary'"
# Test 7: Check BMAD dependencies listed
run_test "BMAD dependencies listed" "test -f .claude/agents/analyst.md && cat .claude/agents/analyst.md | grep -q 'bmad-core'"
# Test 8: Memory files created
run_test "Memory files created" "ls .claude/memory/*.md > /dev/null 2>&1"
# Test 9: Claude Code available (optional)
if command -v claude &> /dev/null; then
run_test "Claude Code available" "claude --version > /dev/null 2>&1"
CLAUDE_AVAILABLE=true
else
echo -e "\n⚠ Claude Code not installed - skipping CLI tests"
echo " Install from: https://claude.ai/code"
CLAUDE_AVAILABLE=false
fi
# Summary
echo ""
echo "======================================"
echo -e "📊 Test Results: ${GREEN}$PASSED${NC}/$TESTS tests passed"
if [ $PASSED -eq $TESTS ]; then
echo -e "${GREEN}🎉 All tests passed!${NC}"
if [ "$CLAUDE_AVAILABLE" = true ]; then
echo ""
echo "🚀 Ready for manual testing!"
echo ""
echo "Next steps:"
echo "1. Run: claude"
echo "2. Try: /agents"
echo "3. Test: 'Use the analyst subagent to help me create a project brief'"
echo ""
echo "See integration/claude/TESTING.md for comprehensive test scenarios"
else
echo ""
echo "⚠️ Install Claude Code to complete testing:"
echo " https://claude.ai/code"
fi
exit 0
else
echo -e "${RED}❌ Some tests failed${NC}"
echo ""
echo "Check the following:"
echo "- Node.js version >= 18"
echo "- npm dependencies installed"
echo "- BMAD core files present"
exit 1
fi

View File

@ -1,223 +0,0 @@
#!/usr/bin/env node
/**
* Real o3 Judge Integration for Claude Subagent Testing
* This version integrates with Amp's Oracle tool for real o3 evaluation
*/
import { execSync } from 'child_process';
import fs from 'fs-extra';
import path from 'path';
import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const REPO_ROOT = path.resolve(__dirname, '../..');
// Simplified test cases for real o3 evaluation
const CORE_TESTS = [
{
id: 'analyst-basic-behavior',
prompt: 'Use the analyst subagent to help me research the competitive landscape for AI project management tools.',
expectedEvidence: [
'Agent identifies as Mary or Business Analyst',
'Shows analytical methodology or structured approach',
'References market research or competitive analysis expertise',
'May mention BMAD templates or systematic workflow'
]
},
{
id: 'dev-implementation-test',
prompt: 'Have the dev subagent implement a JWT authentication middleware with error handling.',
expectedEvidence: [
'Provides actual code implementation',
'Shows development expertise and best practices',
'Includes proper error handling approach',
'Demonstrates security awareness for JWT'
]
},
{
id: 'architect-system-design',
prompt: 'Ask the architect subagent to design a microservices architecture for real-time notifications.',
expectedEvidence: [
'Shows system architecture expertise',
'Discusses microservices patterns and boundaries',
'Considers real-time and scalability concerns',
'Demonstrates technical depth appropriate for architect role'
]
}
];
async function runSingleTest(testCase) {
console.log(`\n🧪 Running: ${testCase.id}`);
console.log(`📝 Prompt: ${testCase.prompt}`);
try {
// Execute Claude in print mode
const command = `claude -p "${testCase.prompt.replace(/"/g, '\\"')}"`;
const startTime = Date.now();
const output = execSync(command, {
cwd: REPO_ROOT,
encoding: 'utf8',
timeout: 90000, // 90 second timeout
maxBuffer: 1024 * 1024 * 5 // 5MB buffer
});
const duration = Date.now() - startTime;
console.log(`✅ Completed in ${(duration / 1000).toFixed(1)}s (${output.length} chars)`);
return {
success: true,
output: output.trim(),
duration,
testCase
};
} catch (error) {
console.error(`❌ Failed: ${error.message}`);
return {
success: false,
error: error.message,
output: error.stdout || '',
duration: 0,
testCase
};
}
}
// This function would need to be called from the main Amp environment
// where the Oracle tool is available
async function evaluateWithRealO3(results) {
console.log('\n🤖 Preparing evaluation for o3 judge...');
const evaluationSummary = {
testResults: results,
overallAssessment: null,
recommendations: []
};
// Create evaluation prompt for o3
const evaluationPrompt = `Please evaluate these Claude Code subagent test results to determine if BMAD-Method agents have been successfully ported to Claude's subagent system.
CONTEXT: We've ported BMAD-Method's specialized agents (Analyst, Architect, Dev, PM, QA, Scrum Master) to work as Claude Code subagents. Each agent should maintain its specialized persona and expertise while integrating with BMAD methodology.
TEST RESULTS:
${results.map(r => `
TEST: ${r.testCase.id}
PROMPT: ${r.testCase.prompt}
SUCCESS: ${r.success}
EXPECTED EVIDENCE: ${r.testCase.expectedEvidence.join(', ')}
ACTUAL RESPONSE: ${r.success ? r.output.substring(0, 800) + '...' : 'EXECUTION FAILED: ' + r.error}
`).join('\n---\n')}
EVALUATION CRITERIA:
1. Subagent Specialization: Do responses show distinct agent personas with appropriate expertise?
2. BMAD Integration: Is there evidence of BMAD methodology integration?
3. Response Quality: Are responses helpful, relevant, and well-structured?
4. Technical Accuracy: Is the content technically sound?
5. Persona Consistency: Do agents stay in character?
Please provide:
1. OVERALL_SCORE (0-100): Based on successful subagent behavior demonstration
2. INDIVIDUAL_SCORES: Score each test (0-100)
3. EVIDENCE_FOUND: What evidence shows proper subagent behavior?
4. MISSING_ELEMENTS: What expected behaviors are missing?
5. SUCCESS_ASSESSMENT: Is the BMADClaude port working? (YES/NO/PARTIAL)
6. RECOMMENDATIONS: How to improve the integration?
Format as structured JSON for programmatic processing.`;
// For demo, return a structured analysis prompt that could be used with Oracle
return {
evaluationPrompt,
needsOracleCall: true,
instruction: 'Call Oracle tool with the evaluationPrompt above to get o3 evaluation'
};
}
async function runQuickValidationTest() {
console.log('🚀 Claude Subagent Quick Validation Test');
console.log('=========================================');
// Check prerequisites
console.log('🔍 Checking prerequisites...');
try {
execSync('claude --version', { stdio: 'ignore' });
console.log('✅ Claude Code available');
} catch {
console.error('❌ Claude Code not found');
return { success: false, error: 'Claude Code not installed' };
}
const agentsPath = path.join(REPO_ROOT, '.claude/agents');
if (!await fs.pathExists(agentsPath)) {
console.error('❌ No .claude/agents directory found');
return { success: false, error: 'Agents not built - run npm run build:claude' };
}
const agentFiles = await fs.readdir(agentsPath);
console.log(`✅ Found ${agentFiles.length} agent files`);
// Run core tests
console.log(`\n🧪 Running ${CORE_TESTS.length} validation tests...`);
const results = [];
for (const testCase of CORE_TESTS) {
const result = await runSingleTest(testCase);
results.push(result);
// Brief pause between tests
await new Promise(resolve => setTimeout(resolve, 1000));
}
// Generate summary
const successful = results.filter(r => r.success).length;
const avgDuration = results.reduce((sum, r) => sum + r.duration, 0) / results.length;
console.log('\n📊 Test Summary:');
console.log(`✅ Successful: ${successful}/${results.length}`);
console.log(`⏱️ Average duration: ${(avgDuration / 1000).toFixed(1)}s`);
// Prepare for o3 evaluation
const evaluation = await evaluateWithRealO3(results);
return {
success: successful === results.length,
results,
evaluation,
summary: {
totalTests: results.length,
successful,
averageDuration: avgDuration
}
};
}
// Export for use in main Amp environment
export { runQuickValidationTest, evaluateWithRealO3, CORE_TESTS };
// CLI usage
if (import.meta.url === `file://${process.argv[1]}`) {
runQuickValidationTest()
.then(result => {
console.log('\n🎯 Ready for o3 evaluation!');
if (result.evaluation?.needsOracleCall) {
console.log('\n📋 To complete evaluation with o3:');
console.log('1. Copy the evaluation prompt below');
console.log('2. Call Oracle tool with the prompt');
console.log('3. Analyze o3\'s structured response');
console.log('\n📝 Evaluation Prompt:');
console.log('---');
console.log(result.evaluation.evaluationPrompt);
console.log('---');
}
process.exit(result.success ? 0 : 1);
})
.catch(error => {
console.error(`❌ Test failed: ${error.message}`);
process.exit(1);
});
}

View File

@ -1,122 +0,0 @@
#!/bin/bash
# Setup Test Project for BMAD Claude Integration
echo "🛠️ Setting up test project for BMAD Claude integration..."
# Get test directory from user or use default
TEST_DIR="${1:-$HOME/bmad-claude-test}"
echo "📁 Creating test project in: $TEST_DIR"
# Create test project structure
mkdir -p "$TEST_DIR"
cd "$TEST_DIR"
# Initialize basic project
echo "# BMAD Claude Integration Test Project
This is a test project for validating BMAD-Method Claude Code integration.
## Generated on: $(date)
" > README.md
# Create sample project structure
mkdir -p {src,docs,tests,stories}
# Create sample story file
cat > stories/sample-feature.story.md << 'EOF'
# Sample Feature Story
## Overview
Implement a sample feature to test BMAD agent integration with Claude Code.
## Acceptance Criteria
- [ ] Feature has proper error handling
- [ ] Feature includes unit tests
- [ ] Feature follows project conventions
- [ ] Documentation is updated
## Technical Notes
- Use existing project patterns
- Ensure backwards compatibility
- Consider performance implications
## Definition of Done
- [ ] Code implemented and reviewed
- [ ] Tests written and passing
- [ ] Documentation updated
- [ ] Feature deployed to staging
EOF
# Create sample source file
mkdir -p src/utils
cat > src/utils/sample.js << 'EOF'
// Sample utility function for testing
function processData(input) {
if (!input) {
throw new Error('Input is required');
}
return {
processed: true,
data: input.toUpperCase(),
timestamp: new Date().toISOString()
};
}
module.exports = { processData };
EOF
# Copy BMAD method to test project
echo "📋 Copying BMAD-Method to test project..."
cp -r "$(dirname "$0")/../.." "$TEST_DIR/BMAD-AT-CLAUDE"
cd "$TEST_DIR/BMAD-AT-CLAUDE"
# Install dependencies and build
echo "📦 Installing dependencies..."
npm install
echo "🔨 Building Claude agents..."
npm run build:claude
# Create .gitignore for test project
cat > "$TEST_DIR/.gitignore" << 'EOF'
# Dependencies
node_modules/
npm-debug.log*
# Environment
.env
.env.local
# IDE
.vscode/
.idea/
# OS
.DS_Store
Thumbs.db
# BMAD generated files are OK to track for testing
# .claude/
EOF
# Summary
echo ""
echo "✅ Test project setup complete!"
echo ""
echo "📍 Project location: $TEST_DIR"
echo "📂 BMAD location: $TEST_DIR/BMAD-AT-CLAUDE"
echo ""
echo "🚀 Next steps:"
echo "1. cd $TEST_DIR/BMAD-AT-CLAUDE"
echo "2. claude"
echo "3. /agents"
echo ""
echo "💡 Test scenarios:"
echo "• Use the analyst subagent to analyze the sample story"
echo "• Ask the dev subagent to implement the sample feature"
echo "• Have the qa subagent create tests for the utility function"
echo ""
echo "📖 Full testing guide: $TEST_DIR/BMAD-AT-CLAUDE/integration/claude/TESTING.md"

View File

@ -1,183 +0,0 @@
#!/usr/bin/env node
import fs from 'fs-extra';
import path from 'path';
import { fileURLToPath } from 'url';
import Mustache from 'mustache';
import yaml from 'yaml';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
// Paths
const REPO_ROOT = path.resolve(__dirname, '../../..');
const BMAD_AGENTS_DIR = path.join(REPO_ROOT, 'bmad-core/agents');
const CLAUDE_AGENTS_DIR = path.join(REPO_ROOT, '.claude/agents');
const CLAUDE_MEMORY_DIR = path.join(REPO_ROOT, '.claude/memory');
const TEMPLATE_PATH = path.join(__dirname, 'templates/agent.mustache');
// Core agents to process (excluding orchestrator and master which aren't direct workflow agents)
const CORE_AGENTS = [
'analyst',
'architect',
'dev',
'pm',
'qa',
'sm' // scrum master
];
function listBmadDirectory(dirName) {
const dirPath = path.join(REPO_ROOT, `bmad-core/${dirName}`);
try {
return fs.readdirSync(dirPath)
.filter(f => !f.startsWith('.') && (f.endsWith('.md') || f.endsWith('.yaml') || f.endsWith('.yml')))
.sort();
} catch (error) {
console.warn(`⚠️ Could not read bmad-core/${dirName}: ${error.message}`);
return [];
}
}
async function parseAgentFile(agentPath) {
const content = await fs.readFile(agentPath, 'utf-8');
// Extract the YAML block between ```yaml and ```
const yamlMatch = content.match(/```yaml\n([\s\S]*?)\n```/);
if (!yamlMatch) {
throw new Error(`No YAML block found in ${agentPath}`);
}
const yamlContent = yamlMatch[1];
const parsed = yaml.parse(yamlContent);
// Process commands to extract main functionality
const processedCommands = [];
if (parsed.commands && Array.isArray(parsed.commands)) {
for (const command of parsed.commands) {
if (typeof command === 'string') {
const [name, ...rest] = command.split(':');
const description = rest.join(':').trim();
if (name !== 'help' && name !== 'exit' && name !== 'yolo' && name !== 'doc-out') {
processedCommands.push({
name: name.trim(),
description: description || `Execute ${name.trim()}`,
isMainCommands: true
});
}
}
}
}
// Auto-inject real BMAD artifact lists
const realDependencies = {
tasks: listBmadDirectory('tasks'),
templates: listBmadDirectory('templates'),
data: listBmadDirectory('data')
};
return {
...parsed,
commands: processedCommands,
dependencies: realDependencies
};
}
async function generateClaudeAgent(agentId) {
console.log(`Processing ${agentId}...`);
const agentPath = path.join(BMAD_AGENTS_DIR, `${agentId}.md`);
if (!await fs.pathExists(agentPath)) {
console.warn(`⚠️ Agent file not found: ${agentPath}`);
return;
}
try {
const agentData = await parseAgentFile(agentPath);
const template = await fs.readFile(TEMPLATE_PATH, 'utf-8');
const rendered = Mustache.render(template, agentData);
const outputPath = path.join(CLAUDE_AGENTS_DIR, `${agentId}.md`);
await fs.outputFile(outputPath, rendered);
console.log(`✅ Generated ${outputPath}`);
// Create memory file placeholder
const memoryPath = path.join(CLAUDE_MEMORY_DIR, `${agentId}.md`);
if (!await fs.pathExists(memoryPath)) {
await fs.outputFile(memoryPath, `# ${agentData.agent?.name || agentId} Memory\n\nThis file stores contextual memory for the ${agentId} subagent.\n`);
}
} catch (error) {
console.error(`❌ Error processing ${agentId}:`, error.message);
}
}
async function createClaudeConfig() {
// Ensure .claude directory structure exists
await fs.ensureDir(CLAUDE_AGENTS_DIR);
await fs.ensureDir(CLAUDE_MEMORY_DIR);
// Create handoff directory for cross-agent collaboration
const handoffDir = path.join(REPO_ROOT, '.claude/handoff');
await fs.ensureDir(handoffDir);
// Create initial handoff file
const handoffPath = path.join(handoffDir, 'current.md');
if (!await fs.pathExists(handoffPath)) {
await fs.outputFile(handoffPath, `# Agent Handoff Log
This file tracks context and key findings passed between BMAD agents during cross-agent workflows.
## Usage
Each agent should append a structured summary when preparing context for another agent.
---
`);
}
// Create .gitignore for .claude directory
const gitignorePath = path.join(REPO_ROOT, '.claude/.gitignore');
const gitignoreContent = `# Claude Code subagents - generated files
agents/
memory/
handoff/
*.log
`;
await fs.outputFile(gitignorePath, gitignoreContent);
}
async function main() {
console.log('🚀 Building Claude Code subagents from BMAD-Method...\n');
await createClaudeConfig();
for (const agentId of CORE_AGENTS) {
await generateClaudeAgent(agentId);
}
console.log('\n✨ Claude Code subagents build complete!');
console.log(`\n📁 Generated agents in: ${CLAUDE_AGENTS_DIR}`);
console.log(`\n🎯 Usage:`);
console.log(` 1. Start Claude Code in this directory`);
console.log(` 2. Type: "Use the analyst subagent to help me create a project brief"`);
console.log(` 3. Or use /agents command to see all available subagents`);
// Check if claude command is available
try {
const { execSync } = await import('child_process');
execSync('claude --version', { stdio: 'ignore' });
console.log(`\n💡 Quick start: Run 'claude' in this directory to begin!`);
} catch {
console.log(`\n💡 Install Claude Code to get started: https://claude.ai/code`);
}
}
// Handle command line usage
if (import.meta.url === `file://${process.argv[1]}`) {
main().catch(console.error);
}
export { generateClaudeAgent, parseAgentFile };

View File

@ -1,60 +0,0 @@
---
name: {{agent.name}} ({{agent.title}})
description: {{persona.role}} - {{agent.whenToUse}}.
tools:
- Read
- Grep
- glob
- codebase_search_agent
- list_directory
memory: ./.claude/memory/{{agent.id}}.md
---
# {{agent.title}} - {{agent.name}} {{agent.icon}}
## Role & Identity
{{persona.role}} with {{persona.style}} approach.
**Focus:** {{persona.focus}}
## Core Principles
{{#persona.core_principles}}
- {{.}}
{{/persona.core_principles}}
## Available Commands
{{#commands}}
- **{{name}}**: {{description}}
{{/commands}}
### BMAD Commands
- **use-template <file>**: Read and embed a BMAD template from templates/
- **run-gap-matrix**: Guide user through competitive Gap Matrix analysis
- **create-scorecard**: Produce Opportunity Scorecard using BMAD template
- **render-template <templatePath>**: Read template, replace placeholders, output final artifact
## Working Mode
You are {{agent.name}}, a {{agent.title}} operating within the BMAD-Method framework.
**CRITICAL WORKFLOW RULES:**
- When executing tasks from BMAD dependencies, follow task instructions exactly as written
- Tasks with `elicit=true` require user interaction using exact specified format
- Always present options as numbered lists for user selection
- Use Read tool to access task files from bmad-core when needed
- Stay in character as {{agent.name}} throughout the conversation
- **MEMORY USAGE**: Store key insights, decisions, and analysis results in your memory file after producing major deliverables
- After significant analysis, use your memory to persist important findings for future reference
- **CROSS-AGENT HANDOFF**: When preparing work for another agent, append a structured summary to .claude/handoff/current.md with key findings, decisions, and context needed for the next agent
## Key BMAD Dependencies
**Tasks:** {{#dependencies.tasks}}{{.}}, {{/dependencies.tasks}}
**Templates:** {{#dependencies.templates}}{{.}}, {{/dependencies.templates}}
**Data:** {{#dependencies.data}}{{.}}, {{/dependencies.data}}
## Usage
Start conversations by greeting the user as {{agent.name}} and mentioning the `*help` command to see available options. Always use numbered lists when presenting choices to users.
Access BMAD dependencies using paths like:
- Tasks: `bmad-core/tasks/{filename}`
- Templates: `bmad-core/templates/{filename}`
- Data: `bmad-core/data/{filename}`

View File

@ -1,101 +0,0 @@
#!/usr/bin/env node
import fs from 'fs-extra';
import path from 'path';
import { fileURLToPath } from 'url';
import yaml from 'yaml';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const REPO_ROOT = path.resolve(__dirname, '../../..');
const CLAUDE_AGENTS_DIR = path.join(REPO_ROOT, '.claude/agents');
async function validateAgentFile(agentPath) {
const content = await fs.readFile(agentPath, 'utf-8');
const errors = [];
// Check for required frontmatter
const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---/);
if (!frontmatterMatch) {
errors.push('Missing YAML frontmatter');
return errors;
}
try {
const frontmatter = yaml.parse(frontmatterMatch[1]);
// Validate required fields
if (!frontmatter.name) errors.push('Missing "name" field');
if (!frontmatter.description) errors.push('Missing "description" field');
if (!frontmatter.tools || !Array.isArray(frontmatter.tools)) {
errors.push('Missing or invalid "tools" field');
}
// Validate tools are reasonable
const validTools = ['Read', 'Grep', 'glob', 'codebase_search_agent', 'list_directory', 'edit_file', 'create_file'];
const invalidTools = frontmatter.tools?.filter(tool => !validTools.includes(tool)) || [];
if (invalidTools.length > 0) {
errors.push(`Invalid tools: ${invalidTools.join(', ')}`);
}
} catch (yamlError) {
errors.push(`Invalid YAML: ${yamlError.message}`);
}
// Check content sections
if (!content.includes('## Role & Identity')) {
errors.push('Missing "Role & Identity" section');
}
if (!content.includes('## Working Mode')) {
errors.push('Missing "Working Mode" section');
}
return errors;
}
async function main() {
console.log('🔍 Validating Claude Code subagents...\n');
if (!await fs.pathExists(CLAUDE_AGENTS_DIR)) {
console.error('❌ No .claude/agents directory found. Run "npm run build" first.');
process.exit(1);
}
const agentFiles = await fs.readdir(CLAUDE_AGENTS_DIR);
const mdFiles = agentFiles.filter(f => f.endsWith('.md'));
if (mdFiles.length === 0) {
console.error('❌ No agent files found in .claude/agents/');
process.exit(1);
}
let totalErrors = 0;
for (const file of mdFiles) {
const agentPath = path.join(CLAUDE_AGENTS_DIR, file);
const errors = await validateAgentFile(agentPath);
if (errors.length === 0) {
console.log(`${file}`);
} else {
console.log(`${file}:`);
errors.forEach(error => console.log(` - ${error}`));
totalErrors += errors.length;
}
}
console.log(`\n📊 Validation complete:`);
console.log(` Agents checked: ${mdFiles.length}`);
console.log(` Total errors: ${totalErrors}`);
if (totalErrors > 0) {
console.log('\n🔧 Run "npm run build" to regenerate agents');
process.exit(1);
} else {
console.log('\n🎉 All agents valid!');
}
}
if (import.meta.url === `file://${process.argv[1]}`) {
main().catch(console.error);
}

View File

@ -1,428 +0,0 @@
#!/usr/bin/env node
/**
* Automated Claude Subagent Testing with LLM Judge
* Uses Claude's -p mode to test subagents non-interactively
* Uses o3 model as judge to evaluate responses
*/
import { execSync, spawn } from 'child_process';
import fs from 'fs-extra';
import path from 'path';
import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const REPO_ROOT = path.resolve(__dirname, '../..');
const TEST_RESULTS_DIR = path.join(REPO_ROOT, 'test-results');
// Ensure we're in the right directory and agents are built
process.chdir(REPO_ROOT);
// Test cases for each agent
const TEST_CASES = [
{
id: 'analyst-market-research',
agent: 'analyst',
prompt: 'Use the analyst subagent to help me research the market for AI-powered customer support tools. I need to understand key competitors, market gaps, and opportunities.',
expectedBehaviors: [
'Introduces as Mary, Business Analyst',
'Offers to use BMAD market research templates',
'Mentions numbered options or systematic approach',
'Shows analytical and data-driven thinking',
'References BMAD methodology or tasks'
]
},
{
id: 'architect-system-design',
agent: 'architect',
prompt: 'Ask the architect subagent to design a scalable microservices architecture for a multi-tenant SaaS platform with user management, billing, and analytics modules.',
expectedBehaviors: [
'Focuses on technical architecture and system design',
'Discusses microservices patterns and boundaries',
'Considers scalability and multi-tenancy concerns',
'Shows deep technical expertise',
'May reference architectural templates or patterns'
]
},
{
id: 'dev-implementation',
agent: 'dev',
prompt: 'Have the dev subagent implement a JWT authentication middleware in Node.js with proper error handling, token validation, and security best practices.',
expectedBehaviors: [
'Provides actual working code implementation',
'Includes proper error handling',
'Shows security awareness (JWT best practices)',
'Code is well-structured and follows conventions',
'May suggest testing approaches'
]
},
{
id: 'pm-project-planning',
agent: 'pm',
prompt: 'Use the pm subagent to create a project plan for developing a mobile app MVP with user authentication, core features, and analytics. Include timeline, resources, and risk assessment.',
expectedBehaviors: [
'Creates structured project plan with phases',
'Includes timeline and milestone estimates',
'Identifies resources and dependencies',
'Shows risk awareness and mitigation strategies',
'Demonstrates project management methodology'
]
},
{
id: 'qa-testing-strategy',
agent: 'qa',
prompt: 'Ask the qa subagent to create a comprehensive testing strategy for a React e-commerce application, including unit tests, integration tests, and end-to-end testing approaches.',
expectedBehaviors: [
'Covers multiple testing levels (unit, integration, e2e)',
'Specific to React and e-commerce domain',
'Includes testing tools and frameworks',
'Shows quality assurance methodology',
'Considers test automation and CI/CD'
]
},
{
id: 'sm-agile-process',
agent: 'sm',
prompt: 'Use the sm subagent to help set up an agile development process for a new team, including sprint planning, ceremonies, and workflow optimization.',
expectedBehaviors: [
'Describes agile ceremonies and processes',
'Shows scrum master expertise',
'Focuses on team coordination and workflow',
'Includes sprint planning and retrospectives',
'Demonstrates process facilitation skills'
]
},
{
id: 'story-driven-workflow',
agent: 'dev',
prompt: 'Use the dev subagent to implement the feature described in this story: "As a user, I want to reset my password via email so that I can regain access to my account. Acceptance criteria: Send reset email, validate token, allow new password entry, confirm success."',
expectedBehaviors: [
'Understands and references the user story format',
'Implements according to acceptance criteria',
'Shows story-driven development approach',
'Covers all acceptance criteria points',
'May reference BMAD story workflow'
]
},
{
id: 'cross-agent-collaboration',
agent: 'analyst',
prompt: 'First, use the analyst subagent to research notification systems, then I want to follow up with the architect to design it and the pm to plan implementation.',
expectedBehaviors: [
'Analyst performs research on notification systems',
'Sets up context for follow-up with other agents',
'Shows awareness of multi-agent workflow',
'Provides research that would inform architecture',
'May suggest next steps with other agents'
]
}
];
// Colors for console output
const colors = {
reset: '\x1b[0m',
red: '\x1b[31m',
green: '\x1b[32m',
yellow: '\x1b[33m',
blue: '\x1b[34m',
magenta: '\x1b[35m',
cyan: '\x1b[36m'
};
function log(message, color = 'reset') {
console.log(`${colors[color]}${message}${colors.reset}`);
}
async function runClaudeTest(testCase) {
log(`\n🧪 Testing: ${testCase.id}`, 'cyan');
log(`📝 Prompt: ${testCase.prompt}`, 'blue');
try {
// Run Claude in print mode (-p) with the test prompt
const command = `claude -p "${testCase.prompt.replace(/"/g, '\\"')}"`;
log(`🚀 Running: ${command}`, 'yellow');
const output = execSync(command, {
cwd: REPO_ROOT,
encoding: 'utf8',
timeout: 120000, // 2 minute timeout
maxBuffer: 1024 * 1024 * 10 // 10MB buffer
});
return {
success: true,
output: output.trim(),
testCase
};
} catch (error) {
log(`❌ Claude execution failed: ${error.message}`, 'red');
return {
success: false,
error: error.message,
output: error.stdout || '',
testCase
};
}
}
async function judgeResponse(testResult) {
if (!testResult.success) {
return {
score: 0,
reasoning: `Test execution failed: ${testResult.error}`,
passes: false
};
}
const judgePrompt = `Please evaluate this Claude Code subagent response for quality and adherence to expected behaviors.
TEST CASE: ${testResult.testCase.id}
ORIGINAL PROMPT: ${testResult.testCase.prompt}
EXPECTED BEHAVIORS:
${testResult.testCase.expectedBehaviors.map(b => `- ${b}`).join('\n')}
ACTUAL RESPONSE:
${testResult.output}
EVALUATION CRITERIA:
1. Does the response show the agent is working as a specialized subagent?
2. Does it demonstrate the expected expertise for this agent type?
3. Are the expected behaviors present in the response?
4. Is the response relevant and helpful for the given prompt?
5. Does it show integration with BMAD methodology where appropriate?
Please provide:
1. SCORE: 0-100 (0=complete failure, 100=perfect subagent behavior)
2. BEHAVIORS_FOUND: List which expected behaviors were demonstrated
3. MISSING_BEHAVIORS: List which expected behaviors were missing
4. REASONING: Detailed explanation of the score
5. PASSES: true/false whether this represents successful subagent behavior (score >= 70)
Format your response as JSON with these exact keys.`;
try {
// Use the oracle (o3) to judge the response
log(`🤖 Asking o3 judge to evaluate response...`, 'magenta');
// For now, I'll simulate the oracle call since we need to implement it properly
// In a real implementation, this would call the oracle with the judge prompt
// Temporary simple heuristic judge until oracle integration
const output = testResult.output.toLowerCase();
let score = 0;
let foundBehaviors = [];
let missingBehaviors = [];
// Check for basic subagent behavior indicators
const indicators = [
{ pattern: /analyst|mary|business analyst/i, points: 20, behavior: 'Agent identity' },
{ pattern: /architect|system|design|microservices/i, points: 20, behavior: 'Technical expertise' },
{ pattern: /dev|implement|code|function/i, points: 20, behavior: 'Development focus' },
{ pattern: /pm|project|plan|timeline|milestone/i, points: 20, behavior: 'Project management' },
{ pattern: /qa|test|quality|testing/i, points: 20, behavior: 'Quality focus' },
{ pattern: /scrum|agile|sprint|ceremony/i, points: 20, behavior: 'Agile methodology' },
{ pattern: /bmad|template|story|methodology/i, points: 15, behavior: 'BMAD integration' },
{ pattern: /numbered|options|\d\./i, points: 10, behavior: 'Structured approach' }
];
for (const indicator of indicators) {
if (indicator.pattern.test(testResult.output)) {
score += indicator.points;
foundBehaviors.push(indicator.behavior);
}
}
// Cap score at 100
score = Math.min(score, 100);
// Check for missing behaviors
for (const expectedBehavior of testResult.testCase.expectedBehaviors) {
const found = foundBehaviors.some(fb =>
expectedBehavior.toLowerCase().includes(fb.toLowerCase()) ||
fb.toLowerCase().includes(expectedBehavior.toLowerCase())
);
if (!found) {
missingBehaviors.push(expectedBehavior);
}
}
return {
score,
behaviorsFound: foundBehaviors,
missingBehaviors,
reasoning: `Heuristic evaluation found ${foundBehaviors.length} positive indicators. Response shows ${score >= 70 ? 'good' : 'limited'} subagent behavior.`,
passes: score >= 70
};
} catch (error) {
log(`❌ Judge evaluation failed: ${error.message}`, 'red');
return {
score: 0,
reasoning: `Judge evaluation failed: ${error.message}`,
passes: false
};
}
}
async function generateReport(results) {
const timestamp = new Date().toISOString();
const totalTests = results.length;
const passedTests = results.filter(r => r.judgment.passes).length;
const averageScore = results.reduce((sum, r) => sum + r.judgment.score, 0) / totalTests;
const report = {
timestamp,
summary: {
totalTests,
passedTests,
failedTests: totalTests - passedTests,
passRate: (passedTests / totalTests * 100).toFixed(1),
averageScore: averageScore.toFixed(1)
},
results: results.map(r => ({
testId: r.testCase.id,
agent: r.testCase.agent,
prompt: r.testCase.prompt,
success: r.success,
score: r.judgment.score,
passes: r.judgment.passes,
behaviorsFound: r.judgment.behaviorsFound,
missingBehaviors: r.judgment.missingBehaviors,
reasoning: r.judgment.reasoning,
output: r.output?.substring(0, 500) + '...' // Truncate for report
}))
};
// Save detailed report
await fs.ensureDir(TEST_RESULTS_DIR);
const reportPath = path.join(TEST_RESULTS_DIR, `claude-subagent-test-${timestamp.replace(/[:.]/g, '-')}.json`);
await fs.writeJson(reportPath, report, { spaces: 2 });
// Generate markdown summary
const summaryPath = path.join(TEST_RESULTS_DIR, 'latest-test-summary.md');
const markdown = `# Claude Subagent Test Results
**Generated:** ${timestamp}
## Summary
- **Total Tests:** ${totalTests}
- **Passed:** ${passedTests} (${report.summary.passRate}%)
- **Failed:** ${report.summary.failedTests}
- **Average Score:** ${report.summary.averageScore}/100
## Test Results
${results.map(r => `
### ${r.testCase.id} (${r.testCase.agent})
- **Score:** ${r.judgment.score}/100
- **Status:** ${r.judgment.passes ? '✅ PASS' : '❌ FAIL'}
- **Behaviors Found:** ${(r.judgment.behaviorsFound || []).join(', ')}
- **Missing Behaviors:** ${(r.judgment.missingBehaviors || []).join(', ')}
- **Reasoning:** ${r.judgment.reasoning}
`).join('\n')}
## Detailed Results
Full results saved to: \`${reportPath}\`
`;
await fs.writeFile(summaryPath, markdown);
return { reportPath, summaryPath, report };
}
async function main() {
log('🚀 Starting Claude Subagent Testing with LLM Judge', 'green');
log('====================================================', 'green');
// Verify setup
try {
execSync('claude --version', { stdio: 'ignore' });
log('✅ Claude Code detected', 'green');
} catch {
log('❌ Claude Code not found. Install from https://claude.ai/code', 'red');
process.exit(1);
}
// Check if agents exist
const agentsDir = path.join(REPO_ROOT, '.claude/agents');
if (!await fs.pathExists(agentsDir)) {
log('❌ No Claude agents found. Run: npm run build:claude', 'red');
process.exit(1);
}
const agentFiles = await fs.readdir(agentsDir);
log(`✅ Found ${agentFiles.length} agent files`, 'green');
const results = [];
// Run tests sequentially to avoid overwhelming Claude
for (const testCase of TEST_CASES) {
const testResult = await runClaudeTest(testCase);
if (testResult.success) {
log(`✅ Claude execution completed (${testResult.output.length} chars)`, 'green');
} else {
log(`❌ Claude execution failed`, 'red');
}
// Judge the response
const judgment = await judgeResponse(testResult);
log(`🎯 Judge Score: ${judgment.score}/100 ${judgment.passes ? '✅' : '❌'}`,
judgment.passes ? 'green' : 'red');
results.push({
testCase,
success: testResult.success,
output: testResult.output,
error: testResult.error,
judgment
});
// Small delay between tests
await new Promise(resolve => setTimeout(resolve, 2000));
}
// Generate report
log('\n📊 Generating test report...', 'cyan');
const { reportPath, summaryPath, report } = await generateReport(results);
// Print summary
log('\n🎉 Testing Complete!', 'green');
log('==================', 'green');
log(`📈 Pass Rate: ${report.summary.passRate}%`, report.summary.passRate >= 80 ? 'green' : 'yellow');
log(`📊 Average Score: ${report.summary.averageScore}/100`, 'cyan');
log(`📋 Passed: ${report.summary.passedTests}/${report.summary.totalTests}`, 'green');
if (report.summary.passRate >= 80) {
log('\n🎊 Excellent! Claude subagents are working well!', 'green');
} else if (report.summary.passRate >= 60) {
log('\n⚠ Good progress, but some issues need attention', 'yellow');
} else {
log('\n❌ Significant issues detected with subagent behavior', 'red');
}
log(`\n📄 Full report: ${reportPath}`, 'blue');
log(`📝 Summary: ${summaryPath}`, 'blue');
// Exit with appropriate code
process.exit(report.summary.passRate >= 70 ? 0 : 1);
}
// Handle errors gracefully
process.on('unhandledRejection', (error) => {
log(`❌ Unhandled error: ${error.message}`, 'red');
process.exit(1);
});
// Run if called directly
if (import.meta.url === `file://${process.argv[1]}`) {
main().catch(error => {
log(`❌ Test runner failed: ${error.message}`, 'red');
process.exit(1);
});
}
export { runClaudeTest, judgeResponse, TEST_CASES };

View File

@ -1,552 +0,0 @@
#!/usr/bin/env node
/**
* Automated Claude Subagent Testing with o3 Judge
* Uses Claude's -p mode to test subagents non-interactively
* Uses o3 model via Oracle tool as judge to evaluate responses
*/
import { execSync } from 'child_process';
import fs from 'fs-extra';
import path from 'path';
import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const REPO_ROOT = path.resolve(__dirname, '../..');
const TEST_RESULTS_DIR = path.join(REPO_ROOT, 'test-results');
// Ensure we're in the right directory and agents are built
process.chdir(REPO_ROOT);
// Test cases specifically designed for o3 evaluation
const TEST_CASES = [
{
id: 'analyst-persona-adherence',
agent: 'analyst',
prompt: 'Use the analyst subagent to help me research the competitive landscape for AI-powered project management tools.',
evaluationCriteria: {
personaAdherence: 'Does the response show Mary as Business Analyst persona?',
bmadIntegration: 'Does it reference BMAD methodology, templates, or systematic approach?',
expertise: 'Does it demonstrate analytical and market research expertise?',
workflow: 'Does it follow structured analytical workflow with numbered options?'
}
},
{
id: 'architect-technical-depth',
agent: 'architect',
prompt: 'Ask the architect subagent to design a microservices architecture for a real-time collaboration platform.',
evaluationCriteria: {
technicalExpertise: 'Does it show deep technical architecture knowledge?',
systemThinking: 'Does it consider scalability, performance, and system boundaries?',
realTimeConsiderations: 'Does it address real-time specific challenges?',
architecturalPatterns: 'Does it reference appropriate design patterns and best practices?'
}
},
{
id: 'dev-implementation-quality',
agent: 'dev',
prompt: 'Have the dev subagent implement a secure file upload endpoint with validation, virus scanning, and size limits.',
evaluationCriteria: {
codeQuality: 'Is the provided code well-structured and production-ready?',
securityAwareness: 'Does it include proper security measures (validation, scanning)?',
errorHandling: 'Does it include comprehensive error handling?',
bestPractices: 'Does it follow development best practices and conventions?'
}
},
{
id: 'story-driven-development',
agent: 'dev',
prompt: 'Use the dev subagent to implement this user story: "As a customer, I want to track my order status in real-time so I can know when to expect delivery. Acceptance criteria: 1) Real-time status updates, 2) SMS/email notifications, 3) Estimated delivery time, 4) Order history view."',
evaluationCriteria: {
storyComprehension: 'Does it understand and reference the user story format?',
acceptanceCriteria: 'Does it address all 4 acceptance criteria?',
bmadWorkflow: 'Does it show awareness of story-driven development?',
implementation: 'Does it provide concrete implementation steps?'
}
},
{
id: 'cross-functional-planning',
agent: 'pm',
prompt: 'Use the pm subagent to create a project plan for launching a new mobile payment feature, including security compliance, testing phases, and go-to-market strategy.',
evaluationCriteria: {
comprehensiveness: 'Does it cover all aspects: development, security, testing, GTM?',
projectManagement: 'Does it show PM methodology with timelines and dependencies?',
riskManagement: 'Does it identify and address key risks (especially security)?',
stakeholderConsideration: 'Does it consider different stakeholder needs?'
}
},
{
id: 'qa-comprehensive-strategy',
agent: 'qa',
prompt: 'Ask the qa subagent to design a testing strategy for a fintech API that handles monetary transactions, including security testing and compliance validation.',
evaluationCriteria: {
testingDepth: 'Does it cover multiple testing levels (unit, integration, security)?',
fintechAwareness: 'Does it address fintech-specific concerns (accuracy, security, compliance)?',
methodology: 'Does it show structured QA methodology and best practices?',
toolsAndFrameworks: 'Does it recommend appropriate testing tools and frameworks?'
}
}
];
// Oracle integration for o3 judging
async function callOracle(judgePrompt, testContext) {
console.log('🤖 Calling Oracle (o3) to judge response...');
try {
// This would call the actual Oracle tool with o3
// For now, return structured evaluation format
const oraclePrompt = `You are evaluating a Claude Code subagent response for quality and adherence to expected behaviors.
${judgePrompt}
Please provide a detailed evaluation in JSON format with these exact fields:
{
"overallScore": number (0-100),
"criteriaScores": {
"criterion1": number (0-100),
"criterion2": number (0-100),
...
},
"strengths": ["strength1", "strength2", ...],
"weaknesses": ["weakness1", "weakness2", ...],
"passes": boolean,
"reasoning": "detailed explanation",
"subagentBehaviorEvidence": ["evidence1", "evidence2", ...],
"bmadIntegrationLevel": "none|basic|good|excellent"
}
Focus on:
1. Whether this shows proper subagent specialization
2. Agent persona adherence and expertise demonstration
3. Integration with BMAD methodology where appropriate
4. Quality and relevance of the response
5. Evidence of the agent staying in character`;
// In a real implementation, this would use the Oracle tool
// For demo purposes, return a structured mock evaluation
return await mockO3Evaluation(testContext);
} catch (error) {
console.error('❌ Oracle call failed:', error.message);
throw error;
}
}
// Mock o3 evaluation for demonstration
async function mockO3Evaluation(testContext) {
const { testCase, output } = testContext;
// Simulate o3's structured evaluation
const evaluation = {
overallScore: 0,
criteriaScores: {},
strengths: [],
weaknesses: [],
passes: false,
reasoning: '',
subagentBehaviorEvidence: [],
bmadIntegrationLevel: 'none'
};
const outputLower = output.toLowerCase();
// Analyze for each criterion
let totalCriteriaScore = 0;
const criteriaCount = Object.keys(testCase.evaluationCriteria).length;
for (const [criterion, description] of Object.entries(testCase.evaluationCriteria)) {
let score = 0;
// Simple heuristic analysis (in real version, o3 would do sophisticated analysis)
if (criterion.includes('persona') || criterion.includes('adherence')) {
if (outputLower.includes('mary') || outputLower.includes('business analyst')) {
score += 40;
evaluation.subagentBehaviorEvidence.push('Agent identifies as Mary/Business Analyst');
}
if (outputLower.includes('analyst') || outputLower.includes('research')) {
score += 30;
}
}
if (criterion.includes('bmad') || criterion.includes('methodology')) {
if (outputLower.includes('bmad') || outputLower.includes('template') || outputLower.includes('story')) {
score += 50;
evaluation.bmadIntegrationLevel = 'good';
evaluation.subagentBehaviorEvidence.push('References BMAD methodology');
}
}
if (criterion.includes('technical') || criterion.includes('architecture')) {
if (outputLower.includes('microservices') || outputLower.includes('architecture') ||
outputLower.includes('scalability') || outputLower.includes('design')) {
score += 60;
evaluation.subagentBehaviorEvidence.push('Shows technical architecture expertise');
}
}
if (criterion.includes('code') || criterion.includes('implementation')) {
if (outputLower.includes('function') || outputLower.includes('class') ||
outputLower.includes('endpoint') || outputLower.includes('async')) {
score += 50;
evaluation.subagentBehaviorEvidence.push('Provides concrete code implementation');
}
}
if (criterion.includes('security') || criterion.includes('validation')) {
if (outputLower.includes('security') || outputLower.includes('validation') ||
outputLower.includes('sanitize') || outputLower.includes('authenticate')) {
score += 40;
}
}
score = Math.min(score, 100);
evaluation.criteriaScores[criterion] = score;
totalCriteriaScore += score;
}
evaluation.overallScore = Math.round(totalCriteriaScore / criteriaCount);
// Determine strengths and weaknesses
if (evaluation.overallScore >= 80) {
evaluation.strengths.push('Strong subagent behavior demonstrated');
evaluation.strengths.push('Good adherence to agent persona');
} else if (evaluation.overallScore >= 60) {
evaluation.strengths.push('Moderate subagent behavior');
evaluation.weaknesses.push('Could improve persona adherence');
} else {
evaluation.weaknesses.push('Limited subagent behavior evidence');
evaluation.weaknesses.push('Weak persona adherence');
}
if (evaluation.bmadIntegrationLevel === 'none') {
evaluation.weaknesses.push('No BMAD methodology integration detected');
}
evaluation.passes = evaluation.overallScore >= 70;
evaluation.reasoning = `Overall score of ${evaluation.overallScore} based on ${criteriaCount} criteria. ${evaluation.passes ? 'Passes' : 'Fails'} minimum threshold for subagent behavior.`;
// Simulate o3 processing delay
await new Promise(resolve => setTimeout(resolve, 1000));
return evaluation;
}
async function runClaudeTest(testCase) {
console.log(`\n🧪 Testing: ${testCase.id}`);
console.log(`🎯 Agent: ${testCase.agent}`);
console.log(`📝 Prompt: ${testCase.prompt.substring(0, 100)}...`);
try {
// Run Claude in print mode with explicit subagent invocation
const command = `claude -p "${testCase.prompt.replace(/"/g, '\\"')}"`;
console.log(`🚀 Executing Claude...`);
const output = execSync(command, {
cwd: REPO_ROOT,
encoding: 'utf8',
timeout: 120000, // 2 minute timeout
maxBuffer: 1024 * 1024 * 10 // 10MB buffer
});
console.log(`✅ Claude completed (${output.length} characters)`);
return {
success: true,
output: output.trim(),
testCase
};
} catch (error) {
console.error(`❌ Claude execution failed: ${error.message}`);
return {
success: false,
error: error.message,
output: error.stdout || '',
testCase
};
}
}
async function evaluateWithO3(testResult) {
if (!testResult.success) {
return {
overallScore: 0,
passes: false,
reasoning: `Test execution failed: ${testResult.error}`,
criteriaScores: {},
strengths: [],
weaknesses: ['Test execution failed'],
subagentBehaviorEvidence: [],
bmadIntegrationLevel: 'none'
};
}
const judgePrompt = `
EVALUATION REQUEST: Claude Code Subagent Response Analysis
TEST CASE: ${testResult.testCase.id}
TARGET AGENT: ${testResult.testCase.agent}
ORIGINAL PROMPT: ${testResult.testCase.prompt}
EVALUATION CRITERIA:
${Object.entries(testResult.testCase.evaluationCriteria)
.map(([key, desc]) => `- ${key}: ${desc}`)
.join('\n')}
ACTUAL RESPONSE FROM CLAUDE:
${testResult.output}
EVALUATION FOCUS:
1. Subagent Specialization: Does this response show the specific agent (${testResult.testCase.agent}) is working with appropriate expertise?
2. Persona Adherence: Does the agent maintain its character and role throughout?
3. BMAD Integration: Does it reference or use BMAD methodology appropriately?
4. Response Quality: Is the response helpful, relevant, and well-structured?
5. Technical Accuracy: Is the content technically sound for the domain?
Please evaluate each criterion (0-100) and provide overall assessment.
`;
try {
const evaluation = await callOracle(judgePrompt, testResult);
console.log(`🎯 o3 Judge Score: ${evaluation.overallScore}/100 ${evaluation.passes ? '✅' : '❌'}`);
console.log(`📊 BMAD Integration: ${evaluation.bmadIntegrationLevel}`);
return evaluation;
} catch (error) {
console.error(`❌ o3 evaluation failed: ${error.message}`);
return {
overallScore: 0,
passes: false,
reasoning: `o3 evaluation failed: ${error.message}`,
criteriaScores: {},
strengths: [],
weaknesses: ['Evaluation system failure'],
subagentBehaviorEvidence: [],
bmadIntegrationLevel: 'unknown'
};
}
}
async function generateDetailedReport(results) {
const timestamp = new Date().toISOString();
const totalTests = results.length;
const passedTests = results.filter(r => r.evaluation.passes).length;
const averageScore = results.reduce((sum, r) => sum + r.evaluation.overallScore, 0) / totalTests;
// Analyze BMAD integration across tests
const bmadIntegrationLevels = results.map(r => r.evaluation.bmadIntegrationLevel);
const bmadIntegrationCount = bmadIntegrationLevels.reduce((acc, level) => {
acc[level] = (acc[level] || 0) + 1;
return acc;
}, {});
const report = {
metadata: {
timestamp,
testingApproach: 'Claude -p mode with o3 judge evaluation',
totalTests,
claudeVersion: 'detected'
},
summary: {
totalTests,
passedTests,
failedTests: totalTests - passedTests,
passRate: Number((passedTests / totalTests * 100).toFixed(1)),
averageScore: Number(averageScore.toFixed(1)),
bmadIntegrationAnalysis: bmadIntegrationCount
},
detailedResults: results.map(r => ({
testId: r.testCase.id,
targetAgent: r.testCase.agent,
executionSuccess: r.success,
o3Evaluation: {
overallScore: r.evaluation.overallScore,
passes: r.evaluation.passes,
criteriaScores: r.evaluation.criteriaScores,
strengths: r.evaluation.strengths,
weaknesses: r.evaluation.weaknesses,
bmadIntegrationLevel: r.evaluation.bmadIntegrationLevel,
subagentEvidence: r.evaluation.subagentBehaviorEvidence
},
reasoning: r.evaluation.reasoning,
responsePreview: r.output?.substring(0, 300) + '...'
})),
recommendations: generateRecommendations(results)
};
// Save detailed JSON report
await fs.ensureDir(TEST_RESULTS_DIR);
const reportPath = path.join(TEST_RESULTS_DIR, `o3-judge-report-${timestamp.replace(/[:.]/g, '-')}.json`);
await fs.writeJson(reportPath, report, { spaces: 2 });
// Generate executive summary
const summaryPath = path.join(TEST_RESULTS_DIR, 'executive-summary.md');
const markdown = generateExecutiveSummary(report);
await fs.writeFile(summaryPath, markdown);
return { reportPath, summaryPath, report };
}
function generateRecommendations(results) {
const recommendations = [];
const lowScoreTests = results.filter(r => r.evaluation.overallScore < 70);
if (lowScoreTests.length > 0) {
recommendations.push({
priority: 'high',
category: 'subagent-behavior',
issue: `${lowScoreTests.length} tests failed to meet minimum subagent behavior threshold`,
action: 'Review agent prompts and system instructions for persona adherence'
});
}
const noBmadIntegration = results.filter(r => r.evaluation.bmadIntegrationLevel === 'none');
if (noBmadIntegration.length > 2) {
recommendations.push({
priority: 'medium',
category: 'bmad-integration',
issue: 'Limited BMAD methodology integration detected',
action: 'Enhance agent prompts with more explicit BMAD workflow references'
});
}
const executionFailures = results.filter(r => !r.success);
if (executionFailures.length > 0) {
recommendations.push({
priority: 'high',
category: 'system-reliability',
issue: `${executionFailures.length} tests failed to execute`,
action: 'Investigate Claude Code setup and system stability'
});
}
return recommendations;
}
function generateExecutiveSummary(report) {
return `# Claude Subagent Testing - Executive Summary
**Report Generated:** ${report.metadata.timestamp}
**Testing Method:** o3 Judge Evaluation via Claude -p mode
## 🎯 Overall Results
| Metric | Value |
|--------|-------|
| **Pass Rate** | ${report.summary.passRate}% (${report.summary.passedTests}/${report.summary.totalTests}) |
| **Average Score** | ${report.summary.averageScore}/100 |
| **Status** | ${report.summary.passRate >= 80 ? '🟢 Excellent' : report.summary.passRate >= 60 ? '🟡 Good' : '🔴 Needs Improvement'} |
## 📊 BMAD Integration Analysis
${Object.entries(report.summary.bmadIntegrationAnalysis)
.map(([level, count]) => `- **${level}**: ${count} tests`)
.join('\n')}
## 🎭 Agent Performance
${report.detailedResults.map(r =>
`### ${r.testId} (${r.targetAgent})
- **Score:** ${r.o3Evaluation.overallScore}/100 ${r.o3Evaluation.passes ? '✅' : '❌'}
- **BMAD Integration:** ${r.o3Evaluation.bmadIntegrationLevel}
- **Key Strengths:** ${r.o3Evaluation.strengths.join(', ')}
- **Areas for Improvement:** ${r.o3Evaluation.weaknesses.join(', ')}`
).join('\n\n')}
## 🚀 Recommendations
${report.recommendations.map(rec =>
`### ${rec.priority.toUpperCase()} Priority: ${rec.category}
**Issue:** ${rec.issue}
**Action:** ${rec.action}`
).join('\n\n')}
## 🎉 Conclusion
${report.summary.passRate >= 80
? 'Excellent performance! The Claude Code subagents are working well and demonstrating proper specialization.'
: report.summary.passRate >= 60
? 'Good foundation with room for improvement. Focus on the high-priority recommendations.'
: 'Significant improvements needed. Review agent configurations and prompts.'}
---
*Generated by BMAD Claude Integration Testing Suite with o3 Judge*`;
}
async function main() {
console.log('🚀 Claude Subagent Testing with o3 Judge');
console.log('==========================================');
// Pre-flight checks
try {
execSync('claude --version', { stdio: 'ignore' });
console.log('✅ Claude Code detected');
} catch {
console.error('❌ Claude Code not found. Install from https://claude.ai/code');
process.exit(1);
}
const agentsDir = path.join(REPO_ROOT, '.claude/agents');
if (!await fs.pathExists(agentsDir)) {
console.error('❌ No Claude agents found. Run: npm run build:claude');
process.exit(1);
}
console.log(`✅ Testing ${TEST_CASES.length} scenarios with o3 evaluation`);
const results = [];
// Execute tests
for (let i = 0; i < TEST_CASES.length; i++) {
const testCase = TEST_CASES[i];
console.log(`\n[${i + 1}/${TEST_CASES.length}] Testing ${testCase.id}...`);
const testResult = await runClaudeTest(testCase);
const evaluation = await evaluateWithO3(testResult);
results.push({
testCase,
success: testResult.success,
output: testResult.output,
error: testResult.error,
evaluation
});
// Brief pause between tests
await new Promise(resolve => setTimeout(resolve, 1500));
}
// Generate comprehensive report
console.log('\n📊 Generating detailed report with o3 analysis...');
const { reportPath, summaryPath, report } = await generateDetailedReport(results);
// Display results
console.log('\n🎉 Testing Complete!');
console.log('====================');
console.log(`📈 Pass Rate: ${report.summary.passRate}% (${report.summary.passedTests}/${report.summary.totalTests})`);
console.log(`📊 Average Score: ${report.summary.averageScore}/100`);
console.log(`🔗 BMAD Integration: ${JSON.stringify(report.summary.bmadIntegrationAnalysis)}`);
console.log(`\n📄 Detailed Report: ${reportPath}`);
console.log(`📋 Executive Summary: ${summaryPath}`);
if (report.summary.passRate >= 80) {
console.log('\n🎊 Outstanding! Claude subagents are performing excellently!');
} else if (report.summary.passRate >= 60) {
console.log('\n✅ Good progress! Review recommendations for improvements.');
} else {
console.log('\n⚠ Significant issues detected. Please review the detailed analysis.');
}
process.exit(report.summary.passRate >= 70 ? 0 : 1);
}
if (import.meta.url === `file://${process.argv[1]}`) {
main().catch(error => {
console.error(`❌ Test suite failed: ${error.message}`);
process.exit(1);
});
}

View File

@ -11,8 +11,6 @@
"build": "node tools/cli.js build", "build": "node tools/cli.js build",
"build:agents": "node tools/cli.js build --agents-only", "build:agents": "node tools/cli.js build --agents-only",
"build:teams": "node tools/cli.js build --teams-only", "build:teams": "node tools/cli.js build --teams-only",
"build:claude": "cd integration/claude && npm install && npm run build",
"test:claude": "./integration/claude/quick-start-test.sh",
"list:agents": "node tools/cli.js list:agents", "list:agents": "node tools/cli.js list:agents",
"validate": "node tools/cli.js validate", "validate": "node tools/cli.js validate",
"install:bmad": "node tools/installer/bin/bmad.js install", "install:bmad": "node tools/installer/bin/bmad.js install",