# Manual Testing Guide with o3 Judge

Since automated Claude testing can be complex due to session management, here's a comprehensive manual testing approach with o3 evaluation.

## Quick Manual Test Process

### 1. Setup Test Environment

```bash
# Ensure agents are built
npm run build:claude

# Verify agent files exist
ls .claude/agents/

# Start Claude Code
claude
```

### 2. Test Each Agent Manually

Run these prompts in Claude Code and copy the responses for evaluation:

#### Test 1: Analyst Agent

**Prompt:**

```
Use the analyst subagent to help me research the competitive landscape for AI project management tools.
```

**Expected Behaviors:**

- Agent identifies as Mary or Business Analyst
- Shows analytical methodology or structured approach
- References market research or competitive analysis expertise
- May mention BMAD templates or systematic workflow

#### Test 2: Dev Agent

**Prompt:**

```
Have the dev subagent implement a JWT authentication middleware with error handling.
```

**Expected Behaviors:**

- Provides actual code implementation
- Shows development expertise and best practices
- Includes proper error handling approach
- Demonstrates security awareness for JWT

#### Test 3: Architect Agent

**Prompt:**

```
Ask the architect subagent to design a microservices architecture for real-time notifications.
```

**Expected Behaviors:**

- Shows system architecture expertise
- Discusses microservices patterns and boundaries
- Considers real-time and scalability concerns
- Demonstrates technical depth appropriate for architect role

### 3. Record Results

For each test, record:

- Did the response come from the intended subagent?
- What evidence shows proper agent behavior?
- How well does it integrate BMAD methodology?
- Quality and relevance of the response

## Example Test Result for o3 Evaluation

Here's a sample response to demonstrate the evaluation process:

**Test Response (Analyst Agent):**

```
Hello! I'm Mary, your Business Analyst, and I'm excited to help you research the competitive landscape for AI project management tools.

Let me approach this systematically using proven market research methodology:

## Research Approach

1. **Competitive Analysis Framework**
   - Direct competitors (AI-first PM tools)
   - Indirect competitors (traditional PM tools adding AI)
   - Emerging players and startups

2. **Key Analysis Dimensions**
   - Feature sets and AI capabilities
   - Target markets and pricing
   - User experience and adoption
   - Technology stack and integrations

## Initial Competitive Landscape

### Direct AI-First Competitors
- **Monday.com AI**: Native AI features for task automation
- **Asana Intelligence**: AI-powered insights and automation
- **ClickUp AI**: AI writing assistant and task optimization

### Traditional Players with AI
- **Microsoft Project**: Adding Copilot integration
- **Atlassian (Jira)**: Intelligence features and automation
- **Notion**: AI writing and organization features

Would you like me to use the BMAD market research template to create a more detailed competitive analysis? I can help you:

1. Create a comprehensive competitor analysis document
2. Perform detailed feature mapping
3. Identify market gaps and opportunities
4. Research pricing and positioning strategies

Which aspect would you like to explore first?
```

## Using o3 Judge for Evaluation

I'll now use the Oracle (o3) to evaluate this sample response:
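
Before submitting, it can help to bundle the prompt, the expected behaviors, and the copied response into one evaluation request so the judge sees the same structure for every test. The following is a minimal sketch of that packaging step. The `AgentTest` class, the `build_judge_prompt` helper, and the wording of the instructions to the judge are all hypothetical and are not part of BMAD or of any o3 API. The code only assembles text and makes no model call.

```python
from dataclasses import dataclass


@dataclass
class AgentTest:
    """One manual test case (hypothetical structure for illustration)."""
    name: str                      # e.g. "Test 1: Analyst Agent"
    prompt: str                    # prompt typed into Claude Code
    expected_behaviors: list[str]  # checklist from this guide
    response: str = ""             # response copied from Claude Code


def build_judge_prompt(test: AgentTest) -> str:
    """Assemble a plain-text evaluation request for the judge model."""
    criteria = "\n".join(f"- {b}" for b in test.expected_behaviors)
    return (
        f"Evaluate the response for: {test.name}\n\n"
        f"Prompt given to the agent:\n{test.prompt}\n\n"
        f"Expected behaviors:\n{criteria}\n\n"
        f"Agent response:\n{test.response}\n\n"
        "For each expected behavior, state whether it is met and cite "
        "evidence from the response."
    )


# Test 1 from this guide; the response is a placeholder to be replaced
# with the text copied from Claude Code.
analyst_test = AgentTest(
    name="Test 1: Analyst Agent",
    prompt=(
        "Use the analyst subagent to help me research the competitive "
        "landscape for AI project management tools."
    ),
    expected_behaviors=[
        "Agent identifies as Mary or Business Analyst",
        "Shows analytical methodology or structured approach",
        "References market research or competitive analysis expertise",
        "May mention BMAD templates or systematic workflow",
    ],
    response="<paste the copied Claude Code response here>",
)

print(build_judge_prompt(analyst_test))
```

The same `AgentTest` record also covers the fields listed under "Record Results", so one object per test keeps the prompt, checklist, and response together for later comparison.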