# Manual Testing Guide with o3 Judge

Since automated Claude testing can be complex due to session management, here's a comprehensive manual testing approach with o3 evaluation.

## Quick Manual Test Process

### 1. Setup Test Environment

```bash
# Ensure agents are built
npm run build:claude

# Verify agent files exist
ls .claude/agents/

# Start Claude Code
claude
```
### 2. Test Each Agent Manually

Run these prompts in Claude Code and copy the responses for evaluation:

#### Test 1: Analyst Agent

**Prompt:**

```
Use the analyst subagent to help me research the competitive landscape for AI project management tools.
```

**Expected Behaviors:**

- Agent identifies as Mary or Business Analyst
- Shows analytical methodology or structured approach
- References market research or competitive analysis expertise
- May mention BMAD templates or systematic workflow

#### Test 2: Dev Agent

**Prompt:**

```
Have the dev subagent implement a JWT authentication middleware with error handling.
```

**Expected Behaviors:**

- Provides actual code implementation
- Shows development expertise and best practices
- Includes proper error handling approach
- Demonstrates security awareness for JWT
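For comparison, a passing dev-agent response would typically contain something along these lines. This is only a minimal sketch of the expected shape, assuming an Express app and the `jsonwebtoken` package; the agent's actual output will differ:

```typescript
import { Request, Response, NextFunction } from "express";
import jwt from "jsonwebtoken";

// Minimal JWT auth middleware: verifies a Bearer token and attaches its payload.
export function authenticateJwt(req: Request, res: Response, next: NextFunction) {
  const header = req.headers.authorization;
  if (!header?.startsWith("Bearer ")) {
    return res.status(401).json({ error: "Missing or malformed Authorization header" });
  }

  try {
    // JWT_SECRET is assumed to be set in the environment; never hard-code secrets.
    const payload = jwt.verify(header.slice("Bearer ".length), process.env.JWT_SECRET!);
    (req as Request & { user?: unknown }).user = payload;
    next();
  } catch (err) {
    // Distinguish expired tokens from otherwise invalid ones.
    if (err instanceof jwt.TokenExpiredError) {
      return res.status(401).json({ error: "Token expired" });
    }
    return res.status(401).json({ error: "Invalid token" });
  }
}
```

When judging, look for the equivalent elements: header parsing, an explicit verification step, and differentiated error responses.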
#### Test 3: Architect Agent

**Prompt:**

```
Ask the architect subagent to design a microservices architecture for real-time notifications.
```

**Expected Behaviors:**

- Shows system architecture expertise
- Discusses microservices patterns and boundaries
- Considers real-time and scalability concerns
- Demonstrates technical depth appropriate for architect role
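A strong architect response usually goes beyond prose and pins down concrete contracts between services. As a hypothetical illustration of the kind of artifact to look for (all names invented), a message contract for the notification pipeline might look like:

```typescript
// Hypothetical contract for events flowing from producer services to the
// notification service over a message broker (e.g., Kafka or RabbitMQ).
interface NotificationEvent {
  eventId: string;          // idempotency key for at-least-once delivery
  type: "email" | "push" | "in_app";
  recipientId: string;
  payload: Record<string, unknown>;
  occurredAt: string;       // ISO-8601 timestamp
}
```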
### 3. Record Results

For each test, record:

- Whether the response came from the intended subagent
- What evidence shows proper agent behavior
- How well it integrates BMAD methodology
- Quality and relevance of the response
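If you want the records in machine-readable form for later scoring, one possible shape (field names are just a suggestion, not part of any existing tooling) is:

```typescript
// Suggested shape for one manual test record.
interface AgentTestResult {
  agent: "analyst" | "dev" | "architect";
  prompt: string;
  response: string;            // full response copied from Claude Code
  fromIntendedSubagent: boolean;
  behaviorEvidence: string[];  // quotes demonstrating expected behaviors
  bmadIntegration: string;     // notes on BMAD methodology usage
  qualityNotes: string;        // relevance and overall quality observations
}
```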
## Example Test Result for o3 Evaluation

Here's a sample response to demonstrate the evaluation process:

**Test Response (Analyst Agent):**

```
Hello! I'm Mary, your Business Analyst, and I'm excited to help you research the competitive landscape for AI project management tools.

Let me approach this systematically using proven market research methodology:

## Research Approach

1. **Competitive Analysis Framework**
   - Direct competitors (AI-first PM tools)
   - Indirect competitors (traditional PM tools adding AI)
   - Emerging players and startups

2. **Key Analysis Dimensions**
   - Feature sets and AI capabilities
   - Target markets and pricing
   - User experience and adoption
   - Technology stack and integrations

## Initial Competitive Landscape

### Direct AI-First Competitors
- **Monday.com AI**: Native AI features for task automation
- **Asana Intelligence**: AI-powered insights and automation
- **ClickUp AI**: AI writing assistant and task optimization

### Traditional Players with AI
- **Microsoft Project**: Adding Copilot integration
- **Atlassian (Jira)**: Intelligence features and automation
- **Notion**: AI writing and organization features

Would you like me to use the BMAD market research template to create a more detailed competitive analysis? I can help you:

1. Create a comprehensive competitor analysis document
2. Perform detailed feature mapping
3. Identify market gaps and opportunities
4. Research pricing and positioning strategies

Which aspect would you like to explore first?
```
## Using o3 Judge for Evaluation

I'll now use the Oracle (o3) to evaluate this sample response.
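The exact Oracle integration depends on your setup. As a minimal sketch, assuming the `openai` npm package and API access to an o3-class model (both assumptions, not part of this repo's confirmed tooling), the evaluation call could look like this:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Sends a copied agent response to o3 with a fixed judging rubric and
// returns the judge's verdict as plain text.
async function judgeWithO3(agentName: string, testResponse: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "o3",
    messages: [
      {
        role: "system",
        content:
          "You are an impartial judge evaluating a subagent response. " +
          "Score 1-10 on each of: correct agent identity, structured methodology, " +
          "BMAD methodology integration, and overall quality. Justify each score briefly.",
      },
      {
        role: "user",
        content: `Agent under test: ${agentName}\n\nResponse to evaluate:\n${testResponse}`,
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```

Run each recorded response through this call (or the equivalent prompt in your own o3 workflow) and compare the judge's scores against the expected behaviors listed above.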
|