Test Design and Risk Assessment
Workflow ID: _bmad/bmm/testarch/test-design
Version: 4.0 (BMad v6)
Overview
Plans comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering. This workflow operates in two modes:
- System-Level Mode (Phase 3): Testability review of architecture before solutioning gate check
- Epic-Level Mode (Phase 4): Per-epic test planning with risk assessment (current behavior)
The workflow auto-detects which mode to use based on project phase.
Preflight: Detect Mode and Load Context
Critical: Determine mode before proceeding.
Mode Detection (Flexible for Standalone Use)
TEA test-design workflow supports TWO modes, detected automatically:
- Check User Intent Explicitly (Priority 1)
Deterministic Rules:
- User provided PRD+ADR only (no Epic+Stories) → System-Level Mode
- User provided Epic+Stories only (no PRD+ADR) → Epic-Level Mode
- User provided BOTH PRD+ADR AND Epic+Stories → Prefer System-Level Mode (architecture review comes first in Phase 3, then epic planning in Phase 4). If mode preference is unclear, ask user: "Should I create (A) System-level test design (PRD + ADR → Architecture doc + QA doc) or (B) Epic-level test design (Epic → Single test plan)?"
- If user intent is clear from context, use that mode regardless of file structure
- Fallback to File-Based Detection (Priority 2 - BMad-Integrated)
  - Check for {implementation_artifacts}/sprint-status.yaml:
    - If it exists → Epic-Level Mode (Phase 4, single document output)
    - If it does NOT exist → System-Level Mode (Phase 3, TWO document outputs)
- If Ambiguous, ASK USER (Priority 3)
- "I see you have [PRD/ADR/Epic/Stories]. Should I create:
- (A) System-level test design (PRD + ADR → Architecture doc + QA doc)?
- (B) Epic-level test design (Epic → Single test plan)?"
- "I see you have [PRD/ADR/Epic/Stories]. Should I create:
Mode Descriptions:
System-Level Mode (PRD + ADR Input)
- When to use: Early in project (Phase 3 Solutioning), architecture being designed
- Input: PRD, ADR, architecture.md (optional)
- Output: TWO documents
  - test-design-architecture.md (for Architecture/Dev teams)
  - test-design-qa.md (for QA team)
- Focus: Testability assessment, ASRs, NFR requirements, Sprint 0 setup
Epic-Level Mode (Epic + Stories Input)
- When to use: During implementation (Phase 4), per-epic planning
- Input: Epic, Stories, tech-specs (optional)
- Output: ONE document
  - test-design-epic-{N}.md (combined risk assessment + test plan)
- Focus: Risk assessment, coverage plan, execution order, quality gates
Key Insight: TEA Works Standalone OR Integrated
Standalone (No BMad artifacts):
- User provides PRD + ADR → System-Level Mode
- User provides Epic description → Epic-Level Mode
- TEA doesn't mandate full BMad workflow
BMad-Integrated (Full workflow):
- BMad creates sprint-status.yaml → Automatic Epic-Level detection
- BMad creates PRD, ADR, architecture.md → Automatic System-Level detection
- TEA leverages BMad artifacts for richer context
Message to User:
You don't need to follow full BMad methodology to use TEA test-design. Just provide PRD + ADR for system-level, or Epic for epic-level. TEA will auto-detect and produce appropriate documents.
Halt Condition: If mode cannot be determined AND user intent unclear AND required files missing, HALT and notify user:
- "Please provide either: (A) PRD + ADR for system-level test design, OR (B) Epic + Stories for epic-level test design"
Step 1: Load Context (Mode-Aware)
Mode-Specific Loading:
System-Level Mode (Phase 3)
- Read Architecture Documentation
- Load architecture.md or tech-spec (REQUIRED)
- Load PRD.md for functional and non-functional requirements
- Load epics.md for feature scope
- Identify technology stack decisions (frameworks, databases, deployment targets)
- Note integration points and external system dependencies
- Extract NFR requirements (performance SLOs, security requirements, etc.)
- Check Playwright Utils Flag
  Read {config_source} and check config.tea_use_playwright_utils. If true, note that @seontechnologies/playwright-utils provides utilities for test implementation. Reference it in test design where relevant.
- Load Knowledge Base Fragments (System-Level)
  Critical: Consult src/bmm/testarch/tea-index.csv to load:
  - adr-quality-readiness-checklist.md - 8-category, 29-criteria NFR framework (testability, security, scalability, DR, QoS, deployability, etc.)
  - test-levels-framework.md - Test levels strategy guidance
  - risk-governance.md - Testability risk identification
  - test-quality.md - Quality standards and Definition of Done
- Analyze Existing Test Setup (if brownfield)
- Search for existing test directories
- Identify current test framework (if any)
- Note testability concerns in existing codebase
Epic-Level Mode (Phase 4)
- Read Requirements Documentation
- Load PRD.md for high-level product requirements
- Read epics.md or specific epic for feature scope
- Read story markdown for detailed acceptance criteria
- Identify all testable requirements
- Load Architecture Context
- Read architecture.md for system design
- Read tech-spec for implementation details
- Read test-design-architecture.md and test-design-qa.md (if exist from Phase 3 system-level test design)
- Identify technical constraints and dependencies
- Note integration points and external systems
- Analyze Existing Test Coverage
  - Search for existing test files in {test_dir}
  - Identify coverage gaps
  - Note areas with insufficient testing
  - Check for flaky or outdated tests
- Load Knowledge Base Fragments (Epic-Level)
  Critical: Consult src/bmm/testarch/tea-index.csv to load:
  - risk-governance.md - Risk classification framework (6 categories: TECH, SEC, PERF, DATA, BUS, OPS), automated scoring, gate decision engine, owner tracking (625 lines, 4 examples)
  - probability-impact.md - Risk scoring methodology (probability × impact matrix, automated classification, dynamic re-assessment, gate integration, 604 lines, 4 examples)
  - test-levels-framework.md - Test level selection guidance (E2E vs API vs Component vs Unit with decision matrix, characteristics, when to use each, 467 lines, 4 examples)
  - test-priorities-matrix.md - P0-P3 prioritization criteria (automated priority calculation, risk-based mapping, tagging strategy, time budgets, 389 lines, 2 examples)
Halt Condition (Epic-Level only): If story data or acceptance criteria are missing, check if brownfield exploration is needed. If neither requirements NOR exploration possible, HALT with message: "Epic-level test design requires clear requirements, acceptance criteria, or brownfield app URL for exploration"
Step 1.5: System-Level Testability Review (Phase 3 Only)
Skip this step if Epic-Level Mode. This step only executes in System-Level Mode.
Actions
- Review Architecture for Testability
STRUCTURE PRINCIPLE: CONCERNS FIRST, PASSING ITEMS LAST
Evaluate architecture against these criteria and structure output as:
- Testability Concerns (ACTIONABLE - what's broken/missing)
- Testability Assessment Summary (FYI - what works well)
Testability Criteria:
Controllability:
- Can we control system state for testing? (API seeding, factories, database reset)
- Are external dependencies mockable? (interfaces, dependency injection)
- Can we trigger error conditions? (chaos engineering, fault injection)
Observability:
- Can we inspect system state? (logging, metrics, traces)
- Are test results deterministic? (no race conditions, clear success/failure)
- Can we validate NFRs? (performance metrics, security audit logs)
Reliability:
- Are tests isolated? (parallel-safe, stateless, cleanup discipline)
- Can we reproduce failures? (deterministic waits, HAR capture, seed data)
- Are components loosely coupled? (mockable, testable boundaries)
In Architecture Doc Output:
- Section A: Testability Concerns (TOP) - List what's BROKEN or MISSING
- Example: "No API for test data seeding → Cannot parallelize tests"
- Example: "Hardcoded DB connection → Cannot test in CI"
- Section B: Testability Assessment Summary (BOTTOM) - List what PASSES
- Example: "✅ API-first design supports test isolation"
- Only include if worth mentioning; otherwise omit this section entirely
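To ground the controllability and reliability criteria above, here is a minimal dependency-injection sketch (TypeScript). The service and gateway names are hypothetical, not taken from any reviewed architecture:

```typescript
// Hypothetical example: an injected interface keeps the external dependency mockable,
// so tests can trigger error conditions deterministically and run in parallel.
interface PaymentGateway {
  charge(amountCents: number, customerId: string): Promise<{ status: 'ok' | 'declined' }>;
}

class CheckoutService {
  constructor(private readonly gateway: PaymentGateway) {}

  async checkout(customerId: string, amountCents: number): Promise<boolean> {
    const result = await this.gateway.charge(amountCents, customerId);
    return result.status === 'ok';
  }
}

// In tests, a declining stub replaces the real gateway without code changes:
const decliningGateway: PaymentGateway = {
  charge: async () => ({ status: 'declined' }),
};
const service = new CheckoutService(decliningGateway);
```

An architecture that exposes seams like this passes the "mockable dependencies" check; hardcoded clients and singletons are the usual counterexamples.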
- Identify Architecturally Significant Requirements (ASRs)
CRITICAL: ASRs must indicate if ACTIONABLE or FYI
From PRD NFRs and architecture decisions, identify quality requirements that:
- Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture)
- Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure)
- Require special test environments (e.g., "Multi-region deployment" → regional test instances)
Score each ASR using risk matrix (probability × impact).
In Architecture Doc, categorize ASRs:
- ACTIONABLE ASRs (require architecture changes): Include in "Quick Guide" 🚨 or ⚠️ sections
- FYI ASRs (already satisfied by architecture): Include in "Quick Guide" 📋 section OR omit if obvious
Example:
- ASR-001 (Score 9): "Multi-region deployment requires region-specific test infrastructure" → ACTIONABLE (goes in 🚨 BLOCKERS)
- ASR-002 (Score 4): "OAuth 2.1 authentication already implemented in ADR-5" → FYI (goes in 📋 INFO ONLY or omit)
Structure Principle: Actionable ASRs at TOP, FYI ASRs at BOTTOM (or omit)
- Define Test Levels Strategy
IMPORTANT: This section goes in QA doc ONLY, NOT in Architecture doc
Based on architecture (mobile, web, API, microservices, monolith):
- Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy)
- Identify test environment needs (local, staging, ephemeral, production-like)
- Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance)
In Architecture doc: Only mention test level split if it's an ACTIONABLE concern
- Example: "API response time <100ms requires load testing infrastructure" (concern)
- DO NOT include full test level strategy table in Architecture doc
- Assess NFR Requirements (MINIMAL in Architecture Doc)
CRITICAL: NFR testing approach is a RECIPE - belongs in QA doc ONLY
In Architecture Doc:
- Only mention NFRs if they create testability CONCERNS
- Focus on WHAT architecture must provide, not HOW to test
- Keep it brief - 1-2 sentences per NFR category at most
Example - Security NFR in Architecture doc (if there's a concern): ✅ CORRECT (concern-focused, brief, WHAT/WHY only):
- "System must prevent cross-customer data access (GDPR requirement). Requires test infrastructure for multi-tenant isolation in Sprint 0."
- "OAuth tokens must expire after 1 hour (ADR-5). Requires test harness for token expiration validation."
❌ INCORRECT (too detailed, belongs in QA doc):
- Full table of security test scenarios
- Test scripts with code examples
- Detailed test procedures
- Tool selection (e.g., "use Playwright E2E + OWASP ZAP")
- Specific test approaches (e.g., "Test approach: Playwright E2E for auth/authz")
In QA Doc (full NFR testing approach):
- Security: Full test scenarios, tooling (Playwright + OWASP ZAP), test procedures
- Performance: Load/stress/spike test scenarios, k6 scripts, SLO thresholds
- Reliability: Error handling tests, retry logic validation, circuit breaker tests
- Maintainability: Coverage targets, code quality gates, observability validation
Rule of Thumb:
- Architecture doc: "What NFRs exist and what concerns they create" (1-2 sentences)
- QA doc: "How to test those NFRs" (full sections with tables, code, procedures)
- Flag Testability Concerns
Identify architecture decisions that harm testability:
- ❌ Tight coupling (no interfaces, hard dependencies)
- ❌ No dependency injection (can't mock external services)
- ❌ Hardcoded configurations (can't test different envs)
- ❌ Missing observability (can't validate NFRs)
- ❌ Stateful designs (can't parallelize tests)
Critical: If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as CONCERNS or FAIL recommendation for gate check.
- Output System-Level Test Design (TWO Documents)
IMPORTANT: System-level mode produces TWO documents instead of one:
Document 1: test-design-architecture.md (for Architecture/Dev teams)
- Purpose: Architectural concerns, testability gaps, NFR requirements
- Audience: Architects, Backend Devs, Frontend Devs, DevOps, Security Engineers
- Focus: What architecture must deliver for testability
- Template: test-design-architecture-template.md
Document 2: test-design-qa.md (for QA team)
- Purpose: Test execution recipe, coverage plan, Sprint 0 setup
- Audience: QA Engineers, Test Automation Engineers, QA Leads
- Focus: How QA will execute tests
- Template: test-design-qa-template.md
Standard Structures (REQUIRED):
test-design-architecture.md sections (in this order):
STRUCTURE PRINCIPLE: Actionable items FIRST, FYI items LAST
- Executive Summary (scope, business context, architecture, risk summary)
- Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
- Risk Assessment (high/medium/low-priority risks with scoring) - ACTIONABLE
- Testability Concerns and Architectural Gaps - ACTIONABLE (what arch team must do)
- Sub-section: Blockers to Fast Feedback (ACTIONABLE - concerns FIRST)
- Sub-section: Architectural Improvements Needed (ACTIONABLE)
- Sub-section: Testability Assessment Summary (FYI - passing items LAST, only if worth mentioning)
- Risk Mitigation Plans (detailed for high-priority risks ≥6) - ACTIONABLE
- Assumptions and Dependencies - FYI
SECTIONS THAT DO NOT BELONG IN ARCHITECTURE DOC:
- ❌ Test Levels Strategy (unit/integration/E2E split) - This is a RECIPE, belongs in QA doc ONLY
- ❌ NFR Testing Approach with test examples - This is a RECIPE, belongs in QA doc ONLY
- ❌ Test Environment Requirements - This is a RECIPE, belongs in QA doc ONLY
- ❌ Recommendations for Sprint 0 (test framework setup, factories) - This is a RECIPE, belongs in QA doc ONLY
- ❌ Quality Gate Criteria (pass rates, coverage targets) - This is a RECIPE, belongs in QA doc ONLY
- ❌ Tool Selection (Playwright, k6, etc.) - This is a RECIPE, belongs in QA doc ONLY
WHAT BELONGS IN ARCHITECTURE DOC:
- ✅ Testability CONCERNS (what makes it hard to test)
- ✅ Architecture GAPS (what's missing for testability)
- ✅ What architecture team must DO (blockers, improvements)
- ✅ Risks and mitigation plans
- ✅ ASRs (Architecturally Significant Requirements) - but clarify if FYI or actionable
test-design-qa.md sections (in this order):
- Executive Summary (risk summary, coverage summary)
- Dependencies & Test Blockers (CRITICAL: RIGHT AFTER SUMMARY - what QA needs from other teams)
- Risk Assessment (scored risks with categories - reference Arch doc, don't duplicate)
- Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
- Execution Strategy (SIMPLE: Organized by TOOL TYPE: PR (Playwright) / Nightly (k6) / Weekly (chaos/manual))
- QA Effort Estimate (QA effort ONLY - no DevOps, Data Eng, Finance, Backend)
- Appendices (code examples with playwright-utils, tagging strategy, knowledge base refs)
SECTIONS TO EXCLUDE FROM QA DOC:
- ❌ Quality Gate Criteria (pass/fail thresholds - teams decide for themselves)
- ❌ Follow-on Workflows (bloat - BMAD commands are self-explanatory)
- ❌ Approval section (unnecessary formality)
- ❌ Test Environment Requirements (remove as separate section - integrate into Dependencies if needed)
- ❌ NFR Readiness Summary (bloat - covered in Risk Assessment)
- ❌ Testability Assessment (bloat - covered in Dependencies)
- ❌ Test Levels Strategy (bloat - obvious from test scenarios)
- ❌ Sprint breakdowns (too prescriptive)
- ❌ Infrastructure/DevOps/Data Eng effort tables (out of scope)
- ❌ Mitigation plans for non-QA work (belongs in Arch doc)
Content Guidelines:
Architecture doc (DO):
- ✅ Risk scoring visible (Probability × Impact = Score)
- ✅ Clear ownership (each blocker/ASR has owner + timeline)
- ✅ Testability requirements (what architecture must support)
- ✅ Mitigation plans (for each high-risk item ≥6)
- ✅ Brief conceptual examples ONLY if needed to clarify architecture concerns (5-10 lines max)
- ✅ Target length: ~150-200 lines max (focus on actionable concerns only)
- ✅ Professional tone: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)
Architecture doc (DON'T) - CRITICAL:
- ❌ NO test scripts or test implementation code AT ALL - This is a communication doc for architects, not a testing guide
- ❌ NO Playwright test examples (e.g., test('...', async ({ request }) => ...))
- ❌ NO assertion logic (e.g., expect(...).toBe(...))
- ❌ NO test scenario checklists with checkboxes (belongs in QA doc)
- ❌ NO implementation details about HOW QA will test
- ❌ Focus on CONCERNS, not IMPLEMENTATION
QA doc (DO):
- ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
- ✅ Full test implementation code samples when helpful
- ✅ IMPORTANT: If config.tea_use_playwright_utils is true, ALL code samples MUST use @seontechnologies/playwright-utils fixtures and utilities
- ✅ Import test fixtures from '@seontechnologies/playwright-utils/api-request/fixtures'
- ✅ Import expect from '@playwright/test' (playwright-utils does not re-export expect)
- ✅ Use apiRequest fixture with schema validation, retry logic, and structured responses (a minimal sketch follows this list)
- ✅ Dependencies & Test Blockers section RIGHT AFTER Executive Summary (what QA needs from other teams)
- ✅ QA effort estimates ONLY (no DevOps, Data Eng, Finance, Backend effort - out of scope)
- ✅ Cross-references to Architecture doc (not duplication)
- ✅ Professional tone: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)
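A minimal sketch of what a compliant QA-doc code sample could look like when config.tea_use_playwright_utils is true. The import paths follow the rules above; the apiRequest call shape and the endpoint are illustrative assumptions, not confirmed API:

```typescript
// Sketch only: the apiRequest options object and endpoint below are assumptions.
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test'; // playwright-utils does not re-export expect

test('P0: cross-tenant read is rejected @p0 @sec', async ({ apiRequest }) => {
  // apiRequest is described as providing schema validation, retry logic,
  // and structured responses; treat this call shape as a placeholder.
  const response = await apiRequest({
    method: 'GET',
    url: '/api/customers/other-tenant/records',
  });

  expect(response.status).toBe(403);
});
```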
QA doc (DON'T):
- ❌ NO architectural theory (just reference Architecture doc)
- ❌ NO ASR explanations (link to Architecture doc instead)
- ❌ NO duplicate risk assessments (reference Architecture doc)
- ❌ NO Quality Gate Criteria section (teams decide pass/fail thresholds for themselves)
- ❌ NO Follow-on Workflows section (bloat - BMAD commands are self-explanatory)
- ❌ NO Approval section (unnecessary formality)
- ❌ NO effort estimates for other teams (DevOps, Backend, Data Eng, Finance - out of scope, QA effort only)
- ❌ NO Sprint breakdowns (too prescriptive - e.g., "Sprint 0: 40 hours, Sprint 1: 48 hours")
- ❌ NO mitigation plans for Backend/Arch/DevOps work (those belong in Architecture doc)
- ❌ NO architectural assumptions or debates (those belong in Architecture doc)
Anti-Patterns to Avoid (Cross-Document Redundancy):
CRITICAL: NO BLOAT, NO REPETITION, NO OVERINFO
❌ DON'T duplicate OAuth requirements:
- Architecture doc: Explain OAuth 2.1 flow in detail
- QA doc: Re-explain why OAuth 2.1 is required
✅ DO cross-reference instead:
- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
❌ DON'T repeat the same note 10+ times:
- Example: "Timing is pessimistic until R-005 is fixed" repeated on every P0, P1, P2 section
- This creates bloat and makes docs hard to read
✅ DO consolidate repeated information:
- Write once at the top: "Note: All timing estimates are pessimistic pending R-005 resolution"
- Reference briefly if needed: "(pessimistic timing)"
❌ DON'T include excessive detail that doesn't add value:
- Long explanations of obvious concepts
- Redundant examples showing the same pattern
- Over-documentation of standard practices
✅ DO focus on what's unique or critical:
- Document only what's different from standard practice
- Highlight critical decisions and risks
- Keep explanations concise and actionable
Markdown Cross-Reference Syntax Examples:

```markdown
# In test-design-architecture.md

### 🚨 R-001: Multi-Tenant Isolation (Score: 9)

**Test Coverage:** 8 P0 tests (see [QA doc - Multi-Tenant Isolation](test-design-qa.md#multi-tenant-isolation-8-tests-security-critical) for detailed scenarios)

---

# In test-design-qa.md

## Testability Assessment

**Prerequisites from Architecture Doc:**

- [ ] R-001: Multi-tenant isolation validated (see [Architecture doc R-001](test-design-architecture.md#r-001-multi-tenant-isolation-score-9) for mitigation plan)
- [ ] R-002: Test customer provisioned (see [Architecture doc 🚨 BLOCKERS](test-design-architecture.md#blockers---team-must-decide-cant-proceed-without))

## Sprint 0 Setup Requirements

**Source:** See [Architecture doc "Quick Guide"](test-design-architecture.md#quick-guide) for detailed mitigation plans
```

Key Points:
- Use relative links: [Link Text](test-design-qa.md#section-anchor)
- Anchor format: lowercase, hyphens for spaces, remove emojis/special chars
- Example anchor: ### 🚨 R-001: Title → #r-001-title
❌ DON'T put long code examples in Architecture doc:
- Example: 50+ lines of test implementation
✅ DO keep examples SHORT in Architecture doc:
- Example: 5-10 lines max showing what architecture must support
- Full implementation goes in QA doc
❌ DON'T repeat same note 10+ times:
- Example: "Pessimistic timing until R-005 fixed" on every P0/P1/P2 section
✅ DO consolidate repeated notes:
- Single timing note at top
- Reference briefly throughout: "(pessimistic)"
Write Both Documents:
- Use test-design-architecture-template.md for Architecture doc
- Use test-design-qa-template.md for QA doc
- Follow standard structures defined above
- Cross-reference between docs (no duplication)
- Validate against checklist.md (System-Level Mode section)
Common Over-Engineering to Avoid:
In QA Doc:
- ❌ Quality gate thresholds ("P0 must be 100%, P1 ≥95%") - Let teams decide for themselves
- ❌ Effort estimates for other teams - QA doc should only estimate QA effort
- ❌ Sprint breakdowns ("Sprint 0: 40 hours, Sprint 1: 48 hours") - Too prescriptive
- ❌ Approval sections - Unnecessary formality
- ❌ Assumptions about architecture (SLO targets, replication lag) - These are architectural concerns, belong in Arch doc
- ❌ Mitigation plans for Backend/Arch/DevOps - Those belong in Arch doc
- ❌ Follow-on workflows section - Bloat, BMAD commands are self-explanatory
- ❌ NFR Readiness Summary - Bloat, covered in Risk Assessment
Test Coverage Numbers Reality Check:
- With Playwright parallelization, running ALL Playwright tests is as fast as running just P0
- Don't split Playwright tests by priority into different CI gates - it adds no value
- Tool type matters, not priority labels
- Defer based on infrastructure cost, not importance
After System-Level Mode: Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.
Step 1.6: Exploratory Mode Selection (Epic-Level Only)
Actions
- Detect Planning Mode
Determine mode based on context:
Requirements-Based Mode (DEFAULT):
- Have clear story/PRD with acceptance criteria
- Uses: Existing workflow (Steps 2-4)
- Appropriate for: Documented features, greenfield projects
Exploratory Mode (OPTIONAL - Brownfield):
- Missing/incomplete requirements AND brownfield application exists
- Uses: UI exploration to discover functionality
- Appropriate for: Undocumented brownfield apps, legacy systems
- Requirements-Based Mode (DEFAULT - Skip to Step 2)
If requirements are clear:
- Continue with existing workflow (Step 2: Assess and Classify Risks)
- Use loaded requirements from Step 1
- Proceed with risk assessment based on documented requirements
- Exploratory Mode (OPTIONAL - Brownfield Apps)
If exploring brownfield application:
A. Check MCP Availability
If config.tea_use_mcp_enhancements is true AND Playwright MCP tools available:
- Use MCP-assisted exploration (Step 3.B)
If MCP unavailable OR config.tea_use_mcp_enhancements is false:
- Use manual exploration fallback (Step 3.C)
B. MCP-Assisted Exploration (If MCP Tools Available)
Use Playwright MCP browser tools to explore UI:
Setup:
1. Use planner_setup_page to initialize browser
2. Navigate to {exploration_url}
3. Capture initial state with browser_snapshot

Exploration Process:
4. Use browser_navigate to explore different pages
5. Use browser_click to interact with buttons, links, forms
6. Use browser_hover to reveal hidden menus/tooltips
7. Capture browser_snapshot at each significant state
8. Take browser_screenshot for documentation
9. Monitor browser_console_messages for JavaScript errors
10. Track browser_network_requests to identify API calls
11. Map user flows and interactive elements
12. Document discovered functionality

Discovery Documentation:
- Create list of discovered features (pages, workflows, forms)
- Identify user journeys (navigation paths)
- Map API endpoints (from network requests)
- Note error states (from console messages)
- Capture screenshots for visual reference
Convert to Test Scenarios:
- Transform discoveries into testable requirements
- Prioritize based on user flow criticality
- Identify risks from discovered functionality
- Continue with Step 2 (Assess and Classify Risks) using discovered requirements
C. Manual Exploration Fallback (If MCP Unavailable)
If Playwright MCP is not available:
Notify User:
Exploratory mode enabled but Playwright MCP unavailable.

**Manual exploration required:**
1. Open application at: {exploration_url}
2. Explore all pages, workflows, and features
3. Document findings in markdown:
   - List of pages/features discovered
   - User journeys identified
   - API endpoints observed (DevTools Network tab)
   - JavaScript errors noted (DevTools Console)
   - Critical workflows mapped
4. Provide exploration findings to continue workflow

**Alternative:** Disable exploratory_mode and provide requirements documentation

Wait for user to provide exploration findings, then:
- Parse user-provided discovery documentation
- Convert to testable requirements
- Continue with Step 2 (risk assessment)
- Proceed to Risk Assessment
After mode selection (Requirements-Based OR Exploratory):
- Continue to Step 2: Assess and Classify Risks
- Use requirements from documentation (Requirements-Based) OR discoveries (Exploratory)
Step 2: Assess and Classify Risks
Actions
- Identify Genuine Risks
Filter requirements to isolate actual risks (not just features):
- Unresolved technical gaps
- Security vulnerabilities
- Performance bottlenecks
- Data loss or corruption potential
- Business impact failures
- Operational deployment issues
- Classify Risks by Category
Use these standard risk categories:
TECH (Technical/Architecture):
- Architecture flaws
- Integration failures
- Scalability issues
- Technical debt
SEC (Security):
- Missing access controls
- Authentication bypass
- Data exposure
- Injection vulnerabilities
PERF (Performance):
- SLA violations
- Response time degradation
- Resource exhaustion
- Scalability limits
DATA (Data Integrity):
- Data loss
- Data corruption
- Inconsistent state
- Migration failures
BUS (Business Impact):
- User experience degradation
- Business logic errors
- Revenue impact
- Compliance violations
OPS (Operations):
- Deployment failures
- Configuration errors
- Monitoring gaps
- Rollback issues
- Score Risk Probability
Rate likelihood (1-3):
- 1 (Unlikely): <10% chance, edge case
- 2 (Possible): 10-50% chance, known scenario
- 3 (Likely): >50% chance, common occurrence
- Score Risk Impact
Rate severity (1-3):
- 1 (Minor): Cosmetic, workaround exists, limited users
- 2 (Degraded): Feature impaired, workaround difficult, affects many users
- 3 (Critical): System failure, data loss, no workaround, blocks usage
- Calculate Risk Score
  Risk Score = Probability × Impact
  - 1-2: Low risk (monitor)
  - 3-4: Medium risk (plan mitigation)
  - 6-9: High risk (immediate mitigation required)
- Highlight High-Priority Risks
Flag all risks with score ≥6 for immediate attention.
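The scoring rules above reduce to a small helper, which is also useful when building the risk matrix in Step 4. A minimal TypeScript sketch; the category codes and thresholds come from this workflow, the rest is illustrative:

```typescript
// Risk scoring per this workflow: score = probability (1-3) × impact (1-3).
type Probability = 1 | 2 | 3;
type Impact = 1 | 2 | 3;
type Category = 'TECH' | 'SEC' | 'PERF' | 'DATA' | 'BUS' | 'OPS';

interface Risk {
  id: string;          // e.g., "R-001"
  category: Category;
  description: string;
  probability: Probability;
  impact: Impact;
}

function classify(risk: Risk): { score: number; level: 'low' | 'medium' | 'high' } {
  const score = risk.probability * risk.impact;
  // 1-2: low (monitor), 3-4: medium (plan mitigation), >=6: high (immediate mitigation)
  const level = score >= 6 ? 'high' : score >= 3 ? 'medium' : 'low';
  return { score, level };
}

// Example: possible (2) × critical (3) = 6 → high, flagged for immediate mitigation
classify({ id: 'R-001', category: 'SEC', description: 'Auth bypass', probability: 2, impact: 3 });
```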
- Request Clarification
If evidence is missing or assumptions required:
- Document assumptions clearly
- Request user clarification
- Do NOT speculate on business impact
- Plan Mitigations
CRITICAL: Mitigation placement depends on WHO does the work
For each high-priority risk:
- Define mitigation strategy
- Assign owner (dev, QA, ops)
- Set timeline
- Update residual risk expectation
Mitigation Plan Placement:
Architecture Doc:
- Mitigations owned by Backend, DevOps, Architecture, Security, Data Eng
- Example: "Add authorization layer for customer-scoped access" (Backend work)
- Example: "Configure AWS Fault Injection Simulator" (DevOps work)
- Example: "Define CloudWatch log schema for backfill events" (Architecture work)
QA Doc:
- Mitigations owned by QA (test development work)
- Example: "Create factories for test data with randomization" (QA work)
- Example: "Implement polling with retry for async validation" (QA test code)
- Brief reference to Architecture doc mitigations (don't duplicate)
Rule of Thumb:
- If mitigation requires production code changes → Architecture doc
- If mitigation is test infrastructure/code → QA doc
- If mitigation involves multiple teams → Architecture doc with QA validation approach
Assumptions Placement:
Architecture Doc:
- Architectural assumptions (SLO targets, replication lag, system design assumptions)
- Example: "P95 <500ms inferred from <2s timeout (requires Product approval)"
- Example: "Multi-region replication lag <1s assumed (ADR doesn't specify SLA)"
- Example: "Recent Cache hit ratio >80% assumed (not in PRD/ADR)"
QA Doc:
- Test execution assumptions (test infrastructure readiness, test data availability)
- Example: "Assumes test factories already created"
- Example: "Assumes CI/CD pipeline configured"
- Brief reference to Architecture doc for architectural assumptions
Rule of Thumb:
- If assumption is about system architecture/design → Architecture doc
- If assumption is about test infrastructure/execution → QA doc
Step 3: Design Test Coverage
Actions
- Break Down Acceptance Criteria
Convert each acceptance criterion into atomic test scenarios:
- One scenario per testable behavior
- Scenarios are independent
- Scenarios are repeatable
- Scenarios tie back to risk mitigations
- Select Appropriate Test Levels
  Knowledge Base Reference: test-levels-framework.md
  Map requirements to optimal test levels (avoid duplication):
E2E (End-to-End):
- Critical user journeys
- Multi-system integration
- Production-like environment
- Highest confidence, slowest execution
API (Integration):
- Service contracts
- Business logic validation
- Fast feedback
- Good for complex scenarios
Component:
- UI component behavior
- Interaction testing
- Visual regression
- Fast, isolated
Unit:
- Business logic
- Edge cases
- Error handling
- Fastest, most granular
Avoid duplicate coverage: Don't test same behavior at multiple levels unless necessary.
- Assign Priority Levels
  CRITICAL: P0/P1/P2/P3 indicates priority and risk level, NOT execution timing
  Knowledge Base Reference: test-priorities-matrix.md
  P0 (Critical):
- Blocks core user journey
- High-risk areas (score ≥6)
- Revenue-impacting
- Security-critical
- No workaround exists
- Affects majority of users
P1 (High):
- Important user features
- Medium-risk areas (score 3-4)
- Common workflows
- Workaround exists but difficult
P2 (Medium):
- Secondary features
- Low-risk areas (score 1-2)
- Edge cases
- Regression prevention
P3 (Low):
- Nice-to-have
- Exploratory
- Performance benchmarks
- Documentation validation
NOTE: Priority classification is separate from execution timing. A P1 test might run in PRs if it's fast, or nightly if it requires expensive infrastructure (e.g., k6 performance test). See "Execution Strategy" section for timing guidance.
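Because priority is a label rather than a schedule, a common way to keep the two separate (the "tagging strategy" referenced in test-priorities-matrix.md) is to tag tests with priority and risk category and let the CI lane decide timing. A minimal Playwright sketch with illustrative tags:

```typescript
// Illustrative tagging: priority (@p1) and risk category (@bus) live in the title;
// whether this runs in PRs or nightly is decided by infrastructure cost, not the tag.
import { test, expect } from '@playwright/test';

test('search returns results within the page @p1 @bus', async ({ page }) => {
  await page.goto('/search?q=widgets');
  await expect(page.getByRole('heading', { name: 'Results' })).toBeVisible();
});
```

Filtering is then a CLI concern, for example `npx playwright test --grep @p1`, without implying that every P1 test shares an execution window.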
- Outline Data and Tooling Prerequisites
For each test scenario, identify:
- Test data requirements (factories, fixtures; see the factory sketch after this list)
- External services (mocks, stubs)
- Environment setup
- Tools and dependencies
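Where factories appear as a prerequisite, a minimal pure-function factory sketch in the spirit of the fixture-architecture.md fragment (TypeScript; the entity and field names are hypothetical):

```typescript
// Hypothetical pure-function factory: randomized defaults, explicit overrides,
// no hidden state, so tests stay parallel-safe and repeatable.
interface Customer {
  id: string;
  email: string;
  plan: 'free' | 'pro';
}

function createCustomer(overrides: Partial<Customer> = {}): Customer {
  const unique = Math.random().toString(36).slice(2, 10);
  return {
    id: `cust-${unique}`,
    email: `qa+${unique}@example.com`,
    plan: 'free',
    ...overrides,
  };
}

// Usage in a test: const customer = createCustomer({ plan: 'pro' });
```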
- Define Execution Strategy (Keep It Simple)
IMPORTANT: Avoid over-engineering execution order
Default Philosophy:
- Run everything in PRs if total duration <15 minutes
- Playwright is fast with parallelization (100s of tests in ~10-15 min)
- Only defer to nightly/weekly if there's significant overhead:
- Performance tests (k6, load testing) - expensive infrastructure
- Chaos engineering - requires special setup (AWS FIS)
- Long-running tests - endurance (4+ hours), disaster recovery
- Manual tests - require human intervention
Simple Execution Strategy (Organized by TOOL TYPE):
```markdown
## Execution Strategy

**Philosophy**: Run everything in PRs unless significant infrastructure overhead. Playwright with parallelization is extremely fast (100s of tests in ~10-15 min).

**Organized by TOOL TYPE:**

### Every PR: Playwright Tests (~10-15 min)

All functional tests (from any priority level):
- All E2E, API, integration, unit tests using Playwright
- Parallelized across {N} shards
- Total: ~{N} tests (includes P0, P1, P2, P3)

### Nightly: k6 Performance Tests (~30-60 min)

All performance tests (from any priority level):
- Load, stress, spike, endurance
- Reason: Expensive infrastructure, long-running (10-40 min per test)

### Weekly: Chaos & Long-Running (~hours)

Special infrastructure tests (from any priority level):
- Multi-region failover, disaster recovery, endurance
- Reason: Very expensive, very long (4+ hours)
```

KEY INSIGHT: Organize by TOOL TYPE, not priority
- Playwright (fast, cheap) → PR
- k6 (expensive, long) → Nightly
- Chaos/Manual (very expensive, very long) → Weekly
Avoid:
- ❌ Don't organize by priority (smoke → P0 → P1 → P2 → P3)
- ❌ Don't say "P1 runs on PR to main" (some P1 are Playwright/PR, some are k6/Nightly)
- ❌ Don't create artificial tiers - organize by tool type and infrastructure overhead
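To make "run everything in PRs" concrete, most of the speed comes from configuration rather than test selection. A minimal Playwright config sketch; the worker and shard counts are illustrative assumptions, not recommendations from this workflow:

```typescript
// playwright.config.ts - parallelization sketch (values are illustrative)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,                      // run tests within each file in parallel
  workers: process.env.CI ? 4 : undefined,  // cap workers in CI, use defaults locally
  retries: process.env.CI ? 1 : 0,          // a single CI retry absorbs infrastructure flakes
});
```

Each CI job then runs a slice of the suite, for example `npx playwright test --shard=1/4` through `--shard=4/4`, which is what keeps the full functional suite inside the ~10-15 minute PR budget.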
Step 4: Generate Deliverables
Actions
- Create Risk Assessment Matrix
Use template structure:
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation      |
| ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
| R-001   | SEC      | Auth bypass | 2           | 3      | 6     | Add authz check |

- Create Coverage Matrix

| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
| ----------- | ---------- | -------- | --------- | ---------- | ----- |
| Login flow  | E2E        | P0       | R-001     | 3          | QA    |

- Document Execution Strategy (Simple, Not Redundant)
IMPORTANT: Keep execution strategy simple and avoid redundancy
```markdown
## Execution Strategy

**Default: Run all functional tests in PRs (~10-15 min)**
- All Playwright tests (parallelized across 4 shards)
- Includes E2E, API, integration, unit tests
- Total: ~{N} tests

**Nightly: Performance & Infrastructure tests**
- k6 load/stress/spike tests (~30-60 min)
- Reason: Expensive infrastructure, long-running

**Weekly: Chaos & Disaster Recovery**
- Endurance tests (4+ hours)
- Multi-region failover (requires AWS FIS)
- Backup restore validation
- Reason: Special infrastructure, very long-running
```

DO NOT:
- ❌ Create redundant smoke/P0/P1/P2/P3 tier structure
- ❌ List all tests again in execution order (already in coverage plan)
- ❌ Split tests by priority unless there's infrastructure overhead
- Include Resource Estimates
IMPORTANT: Use intervals/ranges, not exact numbers
Provide rough estimates with intervals to avoid false precision:
```markdown
### Test Effort Estimates

- P0 scenarios: 15 tests (~1.5-2.5 hours each) = **~25-40 hours**
- P1 scenarios: 25 tests (~0.75-1.5 hours each) = **~20-35 hours**
- P2 scenarios: 40 tests (~0.25-0.75 hours each) = **~10-30 hours**
- **Total:** **~55-105 hours** (~1.5-3 weeks with 1 QA engineer)
```

Why intervals:
- Avoids false precision (estimates are never exact)
- Provides flexibility for complexity variations
- Accounts for unknowns and dependencies
- More realistic and less prescriptive
Guidelines:
- P0 tests: 1.5-2.5 hours each (complex setup, security, performance)
- P1 tests: 0.75-1.5 hours each (standard integration, API tests)
- P2 tests: 0.25-0.75 hours each (edge cases, simple validation)
- P3 tests: 0.1-0.5 hours each (exploratory, documentation)
Express totals as:
- Hour ranges: "~55-105 hours"
- Week ranges: "~1.5-3 weeks"
- Avoid: Exact numbers like "75 hours" or "11 days"
- Add Gate Criteria

```markdown
### Quality Gate Criteria

- All P0 tests pass (100%)
- P1 tests pass rate ≥95%
- No high-risk (score ≥6) items unmitigated
- Test coverage ≥80% for critical paths
```

- Write to Output File
  Save to {output_folder}/test-design-epic-{epic_num}.md using the template structure.
Important Notes
Risk Category Definitions
TECH (Technical/Architecture):
- Architecture flaws or technical debt
- Integration complexity
- Scalability concerns
SEC (Security):
- Missing security controls
- Authentication/authorization gaps
- Data exposure risks
PERF (Performance):
- SLA risk or performance degradation
- Resource constraints
- Scalability bottlenecks
DATA (Data Integrity):
- Data loss or corruption potential
- State consistency issues
- Migration risks
BUS (Business Impact):
- User experience harm
- Business logic errors
- Revenue or compliance impact
OPS (Operations):
- Deployment or runtime failures
- Configuration issues
- Monitoring/observability gaps
Risk Scoring Methodology
Probability × Impact = Risk Score
Examples:
- High likelihood (3) × Critical impact (3) = Score 9 (highest priority)
- Possible (2) × Critical (3) = Score 6 (high priority threshold)
- Unlikely (1) × Minor (1) = Score 1 (low priority)
Threshold: Scores ≥6 require immediate mitigation.
Test Level Selection Strategy
Avoid duplication:
- Don't test same behavior at E2E and API level
- Use E2E for critical paths only
- Use API tests for complex business logic
- Use unit tests for edge cases
Tradeoffs:
- E2E: High confidence, slow execution, brittle
- API: Good balance, fast, stable
- Unit: Fastest feedback, narrow scope
Priority Assignment Guidelines
P0 criteria (all must be true):
- Blocks core functionality
- High-risk (score ≥6)
- No workaround exists
- Affects majority of users
P1 criteria:
- Important feature
- Medium risk (score 3-5)
- Workaround exists but difficult
P2/P3: Everything else, prioritized by value
Knowledge Base Integration
Core Fragments (Auto-loaded in Step 1):
- risk-governance.md - Risk classification (6 categories), automated scoring, gate decision engine, coverage traceability, owner tracking (625 lines, 4 examples)
- probability-impact.md - Probability × impact matrix, automated classification thresholds, dynamic re-assessment, gate integration (604 lines, 4 examples)
- test-levels-framework.md - E2E vs API vs Component vs Unit decision framework with characteristics matrix (467 lines, 4 examples)
- test-priorities-matrix.md - P0-P3 automated priority calculation, risk-based mapping, tagging strategy, time budgets (389 lines, 2 examples)
Reference for Test Planning:
- selective-testing.md - Execution strategy: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
- fixture-architecture.md - Data setup patterns: pure function → fixture → mergeTests, auto-cleanup (406 lines, 5 examples)
Manual Reference (Optional):
- Use tea-index.csv to find additional specialized fragments as needed
Evidence-Based Assessment
Critical principle: Base risk assessment on evidence, not speculation.
Evidence sources:
- PRD and user research
- Architecture documentation
- Historical bug data
- User feedback
- Security audit results
Avoid:
- Guessing business impact
- Assuming user behavior
- Inventing requirements
When uncertain: Document assumptions and request clarification from user.
Output Summary
After completing this workflow, provide a summary:
## Test Design Complete
**Epic**: {epic_num}
**Scope**: {design_level}
**Risk Assessment**:
- Total risks identified: {count}
- High-priority risks (≥6): {high_count}
- Categories: {categories}
**Coverage Plan**:
- P0 scenarios: {p0_count} ({p0_hours} hours)
- P1 scenarios: {p1_count} ({p1_hours} hours)
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
- **Total effort**: {total_hours} hours (~{total_days} days)
**Test Levels**:
- E2E: {e2e_count}
- API: {api_count}
- Component: {component_count}
- Unit: {unit_count}
**Quality Gate Criteria**:
- P0 pass rate: 100%
- P1 pass rate: ≥95%
- High-risk mitigations: 100%
- Coverage: ≥80%
**Output File**: {output_file}
**Next Steps**:
1. Review risk assessment with team
2. Prioritize mitigation for high-risk items (score ≥6)
3. Run `*atdd` to generate failing tests for P0 scenarios (separate workflow; not auto-run by `*test-design`)
4. Allocate resources per effort estimates
5. Set up test data factories and fixtures
Validation
After completing all steps, verify:
- Risk assessment complete with all categories
- All risks scored (probability × impact)
- High-priority risks (≥6) flagged
- Coverage matrix maps requirements to test levels
- Priority levels assigned (P0-P3)
- Execution order defined
- Resource estimates provided
- Quality gate criteria defined
- Output file created and formatted correctly
Refer to checklist.md for comprehensive validation criteria.