# Test Design and Risk Assessment

**Workflow ID**: `_bmad/bmm/testarch/test-design`
**Version**: 4.0 (BMad v6)

---

## Overview

Plans a comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering.

This workflow operates in **two modes**:

- **System-Level Mode (Phase 3)**: Testability review of architecture before the solutioning gate check
- **Epic-Level Mode (Phase 4)**: Per-epic test planning with risk assessment (current behavior)

The workflow auto-detects which mode to use based on project phase.

---

## Preflight: Detect Mode and Load Context

**Critical:** Determine mode before proceeding.

### Mode Detection (Flexible for Standalone Use)

The TEA test-design workflow supports TWO modes, detected automatically:

1. **Check User Intent Explicitly (Priority 1)**

**Deterministic Rules:**

- User provided **PRD+ADR only** (no Epic+Stories) → **System-Level Mode**
- User provided **Epic+Stories only** (no PRD+ADR) → **Epic-Level Mode**
- User provided **BOTH PRD+ADR AND Epic+Stories** → **Prefer System-Level Mode** (architecture review comes first in Phase 3, then epic planning in Phase 4). If the mode preference is unclear, ask the user: "Should I create (A) System-level test design (PRD + ADR → Architecture doc + QA doc) or (B) Epic-level test design (Epic → Single test plan)?"
- If user intent is clear from context, use that mode regardless of file structure

2. **Fallback to File-Based Detection (Priority 2 - BMad-Integrated)**

- Check for `{implementation_artifacts}/sprint-status.yaml`
- If it exists → **Epic-Level Mode** (Phase 4, single document output)
- If it does NOT exist → **System-Level Mode** (Phase 3, TWO document outputs)

3. **If Ambiguous, ASK USER (Priority 3)**

- "I see you have [PRD/ADR/Epic/Stories]. Should I create:
  - (A) System-level test design (PRD + ADR → Architecture doc + QA doc)?
  - (B) Epic-level test design (Epic → Single test plan)?"

**Mode Descriptions:**

**System-Level Mode (PRD + ADR Input)**

- **When to use:** Early in the project (Phase 3 Solutioning), architecture being designed
- **Input:** PRD, ADR, architecture.md (optional)
- **Output:** TWO documents
  - `test-design-architecture.md` (for Architecture/Dev teams)
  - `test-design-qa.md` (for QA team)
- **Focus:** Testability assessment, ASRs, NFR requirements, Sprint 0 setup

**Epic-Level Mode (Epic + Stories Input)**

- **When to use:** During implementation (Phase 4), per-epic planning
- **Input:** Epic, Stories, tech-specs (optional)
- **Output:** ONE document
  - `test-design-epic-{N}.md` (combined risk assessment + test plan)
- **Focus:** Risk assessment, coverage plan, execution order, quality gates

**Key Insight: TEA Works Standalone OR Integrated**

**Standalone (No BMad artifacts):**

- User provides PRD + ADR → System-Level Mode
- User provides an Epic description → Epic-Level Mode
- TEA doesn't mandate the full BMad workflow

**BMad-Integrated (Full workflow):**

- BMad creates `sprint-status.yaml` → Automatic Epic-Level detection
- BMad creates PRD, ADR, architecture.md → Automatic System-Level detection
- TEA leverages BMad artifacts for richer context

**Message to User:**

> You don't need to follow the full BMad methodology to use TEA test-design.
> Just provide PRD + ADR for system-level, or an Epic for epic-level.
> TEA will auto-detect and produce the appropriate documents.
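The Priority 2 fallback reduces to a simple file-existence check. A minimal sketch in TypeScript, assuming Node's `fs` module and treating the `implementation_artifacts` directory as a placeholder supplied by the workflow config (the function and variable names here are illustrative, not part of the workflow):

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

type TestDesignMode = 'system-level' | 'epic-level';

// Priority 2 fallback only: explicit user intent (Priority 1) always wins over this check.
function detectModeFromFiles(implementationArtifactsDir: string): TestDesignMode {
  // sprint-status.yaml is created by the BMad Phase 4 (implementation) workflow
  const sprintStatus = join(implementationArtifactsDir, 'sprint-status.yaml');
  return existsSync(sprintStatus) ? 'epic-level' : 'system-level';
}
```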
**Halt Condition:** If mode cannot be determined AND user intent is unclear AND required files are missing, HALT and notify user:

- "Please provide either: (A) PRD + ADR for system-level test design, OR (B) Epic + Stories for epic-level test design"

---

## Step 1: Load Context (Mode-Aware)

**Mode-Specific Loading:**

### System-Level Mode (Phase 3)

1. **Read Architecture Documentation**

- Load architecture.md or tech-spec (REQUIRED)
- Load PRD.md for functional and non-functional requirements
- Load epics.md for feature scope
- Identify technology stack decisions (frameworks, databases, deployment targets)
- Note integration points and external system dependencies
- Extract NFR requirements (performance SLOs, security requirements, etc.)

2. **Check Playwright Utils Flag**

Read `{config_source}` and check `config.tea_use_playwright_utils`. If true, note that `@seontechnologies/playwright-utils` provides utilities for test implementation. Reference in test design where relevant.

3. **Load Knowledge Base Fragments (System-Level)**

**Critical:** Consult `src/bmm/testarch/tea-index.csv` to load:

- `adr-quality-readiness-checklist.md` - 8-category, 29-criteria NFR framework (testability, security, scalability, DR, QoS, deployability, etc.)
- `test-levels-framework.md` - Test levels strategy guidance
- `risk-governance.md` - Testability risk identification
- `test-quality.md` - Quality standards and Definition of Done

4. **Analyze Existing Test Setup (if brownfield)**

- Search for existing test directories
- Identify current test framework (if any)
- Note testability concerns in existing codebase

### Epic-Level Mode (Phase 4)

1. **Read Requirements Documentation**

- Load PRD.md for high-level product requirements
- Read epics.md or the specific epic for feature scope
- Read story markdown for detailed acceptance criteria
- Identify all testable requirements

2. **Load Architecture Context**

- Read architecture.md for system design
- Read tech-spec for implementation details
- Read test-design-architecture.md and test-design-qa.md (if they exist from Phase 3 system-level test design)
- Identify technical constraints and dependencies
- Note integration points and external systems

3. **Analyze Existing Test Coverage**

- Search for existing test files in `{test_dir}`
- Identify coverage gaps
- Note areas with insufficient testing
- Check for flaky or outdated tests

4. **Load Knowledge Base Fragments (Epic-Level)**

**Critical:** Consult `src/bmm/testarch/tea-index.csv` to load:

- `risk-governance.md` - Risk classification framework (6 categories: TECH, SEC, PERF, DATA, BUS, OPS), automated scoring, gate decision engine, owner tracking (625 lines, 4 examples)
- `probability-impact.md` - Risk scoring methodology (probability × impact matrix, automated classification, dynamic re-assessment, gate integration, 604 lines, 4 examples)
- `test-levels-framework.md` - Test level selection guidance (E2E vs API vs Component vs Unit with decision matrix, characteristics, when to use each, 467 lines, 4 examples)
- `test-priorities-matrix.md` - P0-P3 prioritization criteria (automated priority calculation, risk-based mapping, tagging strategy, time budgets, 389 lines, 2 examples)

**Halt Condition (Epic-Level only):** If story data or acceptance criteria are missing, check if brownfield exploration is needed.
If neither requirements nor exploration are possible, HALT with the message: "Epic-level test design requires clear requirements, acceptance criteria, or brownfield app URL for exploration"

---

## Step 1.5: System-Level Testability Review (Phase 3 Only)

**Skip this step if Epic-Level Mode.** This step only executes in System-Level Mode.

### Actions

1. **Review Architecture for Testability**

**STRUCTURE PRINCIPLE: CONCERNS FIRST, PASSING ITEMS LAST**

Evaluate the architecture against these criteria and structure the output as:

1. **Testability Concerns** (ACTIONABLE - what's broken/missing)
2. **Testability Assessment Summary** (FYI - what works well)

**Testability Criteria:**

**Controllability:**

- Can we control system state for testing? (API seeding, factories, database reset)
- Are external dependencies mockable? (interfaces, dependency injection)
- Can we trigger error conditions? (chaos engineering, fault injection)

**Observability:**

- Can we inspect system state? (logging, metrics, traces)
- Are test results deterministic? (no race conditions, clear success/failure)
- Can we validate NFRs? (performance metrics, security audit logs)

**Reliability:**

- Are tests isolated? (parallel-safe, stateless, cleanup discipline)
- Can we reproduce failures? (deterministic waits, HAR capture, seed data)
- Are components loosely coupled? (mockable, testable boundaries)

**In Architecture Doc Output:**

- **Section A: Testability Concerns** (TOP) - List what's BROKEN or MISSING
  - Example: "No API for test data seeding → Cannot parallelize tests"
  - Example: "Hardcoded DB connection → Cannot test in CI"
- **Section B: Testability Assessment Summary** (BOTTOM) - List what PASSES
  - Example: "✅ API-first design supports test isolation"
  - Only include if worth mentioning; otherwise omit this section entirely

2. **Identify Architecturally Significant Requirements (ASRs)**

**CRITICAL: ASRs must indicate whether they are ACTIONABLE or FYI**

From PRD NFRs and architecture decisions, identify quality requirements that:

- Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture)
- Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure)
- Require special test environments (e.g., "Multi-region deployment" → regional test instances)

Score each ASR using the risk matrix (probability × impact).

**In Architecture Doc, categorize ASRs:**

- **ACTIONABLE ASRs** (require architecture changes): Include in "Quick Guide" 🚨 or ⚠️ sections
- **FYI ASRs** (already satisfied by architecture): Include in "Quick Guide" 📋 section OR omit if obvious

**Example:**

- ASR-001 (Score 9): "Multi-region deployment requires region-specific test infrastructure" → **ACTIONABLE** (goes in 🚨 BLOCKERS)
- ASR-002 (Score 4): "OAuth 2.1 authentication already implemented in ADR-5" → **FYI** (goes in 📋 INFO ONLY or omit)

**Structure Principle:** Actionable ASRs at TOP, FYI ASRs at BOTTOM (or omit)
3. **Define Test Levels Strategy**

**IMPORTANT: This section goes in QA doc ONLY, NOT in Architecture doc**

Based on architecture (mobile, web, API, microservices, monolith):

- Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy)
- Identify test environment needs (local, staging, ephemeral, production-like)
- Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance)

**In Architecture doc:** Only mention test level split if it's an ACTIONABLE concern

- Example: "API response time <100ms requires load testing infrastructure" (concern)
- DO NOT include the full test level strategy table in the Architecture doc

4. **Assess NFR Requirements (MINIMAL in Architecture Doc)**

**CRITICAL: NFR testing approach is a RECIPE - belongs in QA doc ONLY**

**In Architecture Doc:**

- Only mention NFRs if they create testability CONCERNS
- Focus on WHAT architecture must provide, not HOW to test
- Keep it brief - 1-2 sentences per NFR category at most

**Example - Security NFR in Architecture doc (if there's a concern):**

✅ CORRECT (concern-focused, brief, WHAT/WHY only):

- "System must prevent cross-customer data access (GDPR requirement). Requires test infrastructure for multi-tenant isolation in Sprint 0."
- "OAuth tokens must expire after 1 hour (ADR-5). Requires test harness for token expiration validation."

❌ INCORRECT (too detailed, belongs in QA doc):

- Full table of security test scenarios
- Test scripts with code examples
- Detailed test procedures
- Tool selection (e.g., "use Playwright E2E + OWASP ZAP")
- Specific test approaches (e.g., "Test approach: Playwright E2E for auth/authz")

**In QA Doc (full NFR testing approach):**

- **Security**: Full test scenarios, tooling (Playwright + OWASP ZAP), test procedures
- **Performance**: Load/stress/spike test scenarios, k6 scripts, SLO thresholds
- **Reliability**: Error handling tests, retry logic validation, circuit breaker tests
- **Maintainability**: Coverage targets, code quality gates, observability validation

**Rule of Thumb:**

- Architecture doc: "What NFRs exist and what concerns they create" (1-2 sentences)
- QA doc: "How to test those NFRs" (full sections with tables, code, procedures)

5. **Flag Testability Concerns**

Identify architecture decisions that harm testability:

- ❌ Tight coupling (no interfaces, hard dependencies)
- ❌ No dependency injection (can't mock external services)
- ❌ Hardcoded configurations (can't test different envs)
- ❌ Missing observability (can't validate NFRs)
- ❌ Stateful designs (can't parallelize tests)

**Critical:** If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as a CONCERNS or FAIL recommendation for the gate check.
6. **Output System-Level Test Design (TWO Documents)**

**IMPORTANT:** System-level mode produces TWO documents instead of one:

**Document 1: test-design-architecture.md** (for Architecture/Dev teams)

- Purpose: Architectural concerns, testability gaps, NFR requirements
- Audience: Architects, Backend Devs, Frontend Devs, DevOps, Security Engineers
- Focus: What architecture must deliver for testability
- Template: `test-design-architecture-template.md`

**Document 2: test-design-qa.md** (for QA team)

- Purpose: Test execution recipe, coverage plan, Sprint 0 setup
- Audience: QA Engineers, Test Automation Engineers, QA Leads
- Focus: How QA will execute tests
- Template: `test-design-qa-template.md`

**Standard Structures (REQUIRED):**

**test-design-architecture.md sections (in this order):**

**STRUCTURE PRINCIPLE: Actionable items FIRST, FYI items LAST**

1. Executive Summary (scope, business context, architecture, risk summary)
2. Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
3. Risk Assessment (high/medium/low-priority risks with scoring) - **ACTIONABLE**
4. Testability Concerns and Architectural Gaps - **ACTIONABLE** (what arch team must do)
   - Sub-section: Blockers to Fast Feedback (ACTIONABLE - concerns FIRST)
   - Sub-section: Architectural Improvements Needed (ACTIONABLE)
   - Sub-section: Testability Assessment Summary (FYI - passing items LAST, only if worth mentioning)
5. Risk Mitigation Plans (detailed for high-priority risks ≥6) - **ACTIONABLE**
6. Assumptions and Dependencies - **FYI**

**SECTIONS THAT DO NOT BELONG IN ARCHITECTURE DOC:**

- ❌ Test Levels Strategy (unit/integration/E2E split) - This is a RECIPE, belongs in QA doc ONLY
- ❌ NFR Testing Approach with test examples - This is a RECIPE, belongs in QA doc ONLY
- ❌ Test Environment Requirements - This is a RECIPE, belongs in QA doc ONLY
- ❌ Recommendations for Sprint 0 (test framework setup, factories) - This is a RECIPE, belongs in QA doc ONLY
- ❌ Quality Gate Criteria (pass rates, coverage targets) - This is a RECIPE, belongs in QA doc ONLY
- ❌ Tool Selection (Playwright, k6, etc.) - This is a RECIPE, belongs in QA doc ONLY

**WHAT BELONGS IN ARCHITECTURE DOC:**

- ✅ Testability CONCERNS (what makes it hard to test)
- ✅ Architecture GAPS (what's missing for testability)
- ✅ What architecture team must DO (blockers, improvements)
- ✅ Risks and mitigation plans
- ✅ ASRs (Architecturally Significant Requirements) - but clarify if FYI or actionable

**test-design-qa.md sections (in this order):**

1. Executive Summary (risk summary, coverage summary)
2. **Dependencies & Test Blockers** (CRITICAL: RIGHT AFTER SUMMARY - what QA needs from other teams)
3. Risk Assessment (scored risks with categories - reference Arch doc, don't duplicate)
4. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
5. **Execution Strategy** (SIMPLE: Organized by TOOL TYPE: PR (Playwright) / Nightly (k6) / Weekly (chaos/manual))
6. QA Effort Estimate (QA effort ONLY - no DevOps, Data Eng, Finance, Backend)
7. Appendices (code examples with playwright-utils, tagging strategy, knowledge base refs)

**SECTIONS TO EXCLUDE FROM QA DOC:**

- ❌ Quality Gate Criteria (pass/fail thresholds - teams decide for themselves)
- ❌ Follow-on Workflows (bloat - BMAD commands are self-explanatory)
- ❌ Approval section (unnecessary formality)
- ❌ Test Environment Requirements (remove as a separate section - integrate into Dependencies if needed)
- ❌ NFR Readiness Summary (bloat - covered in Risk Assessment)
- ❌ Testability Assessment (bloat - covered in Dependencies)
- ❌ Test Levels Strategy (bloat - obvious from test scenarios)
- ❌ Sprint breakdowns (too prescriptive)
- ❌ Infrastructure/DevOps/Data Eng effort tables (out of scope)
- ❌ Mitigation plans for non-QA work (belongs in Arch doc)

**Content Guidelines:**

**Architecture doc (DO):**

- ✅ Risk scoring visible (Probability × Impact = Score)
- ✅ Clear ownership (each blocker/ASR has an owner + timeline)
- ✅ Testability requirements (what architecture must support)
- ✅ Mitigation plans (for each high-risk item ≥6)
- ✅ Brief conceptual examples ONLY if needed to clarify architecture concerns (5-10 lines max)
- ✅ **Target length**: ~150-200 lines max (focus on actionable concerns only)
- ✅ **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)

**Architecture doc (DON'T) - CRITICAL:**

- ❌ NO test scripts or test implementation code AT ALL - This is a communication doc for architects, not a testing guide
- ❌ NO Playwright test examples (e.g., `test('...', async ({ request }) => ...)`)
- ❌ NO assertion logic (e.g., `expect(...).toBe(...)`)
- ❌ NO test scenario checklists with checkboxes (belongs in QA doc)
- ❌ NO implementation details about HOW QA will test
- ❌ Focus on CONCERNS, not IMPLEMENTATION

**QA doc (DO):**

- ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
- ✅ Full test implementation code samples when helpful
- ✅ **IMPORTANT: If `config.tea_use_playwright_utils` is true, ALL code samples MUST use @seontechnologies/playwright-utils fixtures and utilities** (see the sketch after these guidelines)
- ✅ Import test fixtures from '@seontechnologies/playwright-utils/api-request/fixtures'
- ✅ Import expect from '@playwright/test' (playwright-utils does not re-export expect)
- ✅ Use the apiRequest fixture with schema validation, retry logic, and structured responses
- ✅ Dependencies & Test Blockers section RIGHT AFTER the Executive Summary (what QA needs from other teams)
- ✅ **QA effort estimates ONLY** (no DevOps, Data Eng, Finance, Backend effort - out of scope)
- ✅ Cross-references to the Architecture doc (not duplication)
- ✅ **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)

**QA doc (DON'T):**

- ❌ NO architectural theory (just reference Architecture doc)
- ❌ NO ASR explanations (link to Architecture doc instead)
- ❌ NO duplicate risk assessments (reference Architecture doc)
- ❌ NO Quality Gate Criteria section (teams decide pass/fail thresholds for themselves)
- ❌ NO Follow-on Workflows section (bloat - BMAD commands are self-explanatory)
- ❌ NO Approval section (unnecessary formality)
- ❌ NO effort estimates for other teams (DevOps, Backend, Data Eng, Finance - out of scope, QA effort only)
- ❌ NO Sprint breakdowns (too prescriptive - e.g., "Sprint 0: 40 hours, Sprint 1: 48 hours")
- ❌ NO mitigation plans for Backend/Arch/DevOps work (those belong in Architecture doc)
- ❌ NO architectural assumptions or debates (those belong in Architecture doc)
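For illustration, a minimal sketch of what a QA-doc code sample looks like when `config.tea_use_playwright_utils` is true. The import paths follow the guidance above; the exact `apiRequest` call shape, the endpoint, and the response fields are assumptions for this sketch, not a definitive playwright-utils API:

```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test'; // playwright-utils does not re-export expect

// Hypothetical P0 security scenario (multi-tenant isolation).
// The endpoint and response shape are illustrative assumptions.
test('P0 @SEC rejects cross-customer data access', async ({ apiRequest }) => {
  const response = await apiRequest({
    method: 'GET',
    url: '/api/customers/other-customer/orders',
  });

  // Expect the structured response to report an authorization failure
  expect(response.status).toBe(403);
});
```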
**Anti-Patterns to Avoid (Cross-Document Redundancy):**

**CRITICAL: NO BLOAT, NO REPETITION, NO OVERINFO**

❌ **DON'T duplicate OAuth requirements:**

- Architecture doc: Explain OAuth 2.1 flow in detail
- QA doc: Re-explain why OAuth 2.1 is required

✅ **DO cross-reference instead:**

- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"

❌ **DON'T repeat the same note 10+ times:**

- Example: "Timing is pessimistic until R-005 is fixed" repeated on every P0, P1, P2 section
- This creates bloat and makes docs hard to read

✅ **DO consolidate repeated information:**

- Write once at the top: "**Note**: All timing estimates are pessimistic pending R-005 resolution"
- Reference briefly if needed: "(pessimistic timing)"

❌ **DON'T include excessive detail that doesn't add value:**

- Long explanations of obvious concepts
- Redundant examples showing the same pattern
- Over-documentation of standard practices

✅ **DO focus on what's unique or critical:**

- Document only what's different from standard practice
- Highlight critical decisions and risks
- Keep explanations concise and actionable

**Markdown Cross-Reference Syntax Examples:**

```markdown
# In test-design-architecture.md

### 🚨 R-001: Multi-Tenant Isolation (Score: 9)

**Test Coverage:** 8 P0 tests (see [QA doc - Multi-Tenant Isolation](test-design-qa.md#multi-tenant-isolation-8-tests-security-critical) for detailed scenarios)

---

# In test-design-qa.md

## Testability Assessment

**Prerequisites from Architecture Doc:**

- [ ] R-001: Multi-tenant isolation validated (see [Architecture doc R-001](test-design-architecture.md#r-001-multi-tenant-isolation-score-9) for mitigation plan)
- [ ] R-002: Test customer provisioned (see [Architecture doc 🚨 BLOCKERS](test-design-architecture.md#blockers---team-must-decide-cant-proceed-without))

## Sprint 0 Setup Requirements

**Source:** See [Architecture doc "Quick Guide"](test-design-architecture.md#quick-guide) for detailed mitigation plans
```

**Key Points:**

- Use relative links: `[Link Text](test-design-qa.md#section-anchor)`
- Anchor format: lowercase, hyphens for spaces, remove emojis/special chars
- Example anchor: `### 🚨 R-001: Title` → `#r-001-title`

❌ **DON'T put long code examples in Architecture doc:**

- Example: 50+ lines of test implementation

✅ **DO keep examples SHORT in Architecture doc:**

- Example: 5-10 lines max showing what architecture must support
- Full implementation goes in QA doc

**Write Both Documents:**

- Use `test-design-architecture-template.md` for the Architecture doc
- Use `test-design-qa-template.md` for the QA doc
- Follow the standard structures defined above
- Cross-reference between docs (no duplication)
- Validate against checklist.md (System-Level Mode section)

**Common Over-Engineering to Avoid:**

**In QA Doc:**

1. ❌ Quality gate thresholds ("P0 must be 100%, P1 ≥95%") - Let teams decide for themselves
2. ❌ Effort estimates for other teams - QA doc should only estimate QA effort
3. ❌ Sprint breakdowns ("Sprint 0: 40 hours, Sprint 1: 48 hours") - Too prescriptive
4. ❌ Approval sections - Unnecessary formality
5. ❌ Assumptions about architecture (SLO targets, replication lag) - These are architectural concerns, belong in Arch doc
6. ❌ Mitigation plans for Backend/Arch/DevOps - Those belong in Arch doc
7. ❌ Follow-on workflows section - Bloat, BMAD commands are self-explanatory
8. ❌ NFR Readiness Summary - Bloat, covered in Risk Assessment

**Test Coverage Numbers Reality Check:**

- With Playwright parallelization, running ALL Playwright tests is as fast as running just P0
- Don't split Playwright tests by priority into different CI gates - it adds no value
- Tool type matters, not priority labels
- Defer based on infrastructure cost, not importance

**After System-Level Mode:** Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.

---

## Step 1.6: Exploratory Mode Selection (Epic-Level Only)

### Actions

1. **Detect Planning Mode**

Determine mode based on context:

**Requirements-Based Mode (DEFAULT)**:

- Have clear story/PRD with acceptance criteria
- Uses: Existing workflow (Steps 2-4)
- Appropriate for: Documented features, greenfield projects

**Exploratory Mode (OPTIONAL - Brownfield)**:

- Missing/incomplete requirements AND brownfield application exists
- Uses: UI exploration to discover functionality
- Appropriate for: Undocumented brownfield apps, legacy systems

2. **Requirements-Based Mode (DEFAULT - Skip to Step 2)**

If requirements are clear:

- Continue with existing workflow (Step 2: Assess and Classify Risks)
- Use loaded requirements from Step 1
- Proceed with risk assessment based on documented requirements

3. **Exploratory Mode (OPTIONAL - Brownfield Apps)**

If exploring a brownfield application:

**A. Check MCP Availability**

If config.tea_use_mcp_enhancements is true AND Playwright MCP tools are available:

- Use MCP-assisted exploration (Step 3.B)

If MCP is unavailable OR config.tea_use_mcp_enhancements is false:

- Use manual exploration fallback (Step 3.C)

**B. MCP-Assisted Exploration (If MCP Tools Available)**

Use Playwright MCP browser tools to explore the UI:

**Setup:**

```
1. Use planner_setup_page to initialize browser
2. Navigate to {exploration_url}
3. Capture initial state with browser_snapshot
```

**Exploration Process:**

```
4. Use browser_navigate to explore different pages
5. Use browser_click to interact with buttons, links, forms
6. Use browser_hover to reveal hidden menus/tooltips
7. Capture browser_snapshot at each significant state
8. Take browser_screenshot for documentation
9. Monitor browser_console_messages for JavaScript errors
10. Track browser_network_requests to identify API calls
11. Map user flows and interactive elements
12. Document discovered functionality
```

**Discovery Documentation:**

- Create list of discovered features (pages, workflows, forms)
- Identify user journeys (navigation paths)
- Map API endpoints (from network requests)
- Note error states (from console messages)
- Capture screenshots for visual reference

**Convert to Test Scenarios:**

- Transform discoveries into testable requirements
- Prioritize based on user flow criticality
- Identify risks from discovered functionality
- Continue with Step 2 (Assess and Classify Risks) using discovered requirements

**C. Manual Exploration Fallback (If MCP Unavailable)**

If Playwright MCP is not available:

**Notify User:**

```markdown
Exploratory mode enabled but Playwright MCP unavailable.

**Manual exploration required:**

1. Open application at: {exploration_url}
2. Explore all pages, workflows, and features
3. Document findings in markdown:
   - List of pages/features discovered
   - User journeys identified
   - API endpoints observed (DevTools Network tab)
   - JavaScript errors noted (DevTools Console)
   - Critical workflows mapped
4. Provide exploration findings to continue workflow

**Alternative:** Disable exploratory_mode and provide requirements documentation
```

Wait for the user to provide exploration findings, then:

- Parse user-provided discovery documentation
- Convert to testable requirements
- Continue with Step 2 (risk assessment)

4. **Proceed to Risk Assessment**

After mode selection (Requirements-Based OR Exploratory):

- Continue to Step 2: Assess and Classify Risks
- Use requirements from documentation (Requirements-Based) OR discoveries (Exploratory)

---

## Step 2: Assess and Classify Risks

### Actions

1. **Identify Genuine Risks**

Filter requirements to isolate actual risks (not just features):

- Unresolved technical gaps
- Security vulnerabilities
- Performance bottlenecks
- Data loss or corruption potential
- Business impact failures
- Operational deployment issues

2. **Classify Risks by Category**

Use these standard risk categories:

**TECH** (Technical/Architecture):
- Architecture flaws
- Integration failures
- Scalability issues
- Technical debt

**SEC** (Security):
- Missing access controls
- Authentication bypass
- Data exposure
- Injection vulnerabilities

**PERF** (Performance):
- SLA violations
- Response time degradation
- Resource exhaustion
- Scalability limits

**DATA** (Data Integrity):
- Data loss
- Data corruption
- Inconsistent state
- Migration failures

**BUS** (Business Impact):
- User experience degradation
- Business logic errors
- Revenue impact
- Compliance violations

**OPS** (Operations):
- Deployment failures
- Configuration errors
- Monitoring gaps
- Rollback issues

3. **Score Risk Probability**

Rate likelihood (1-3):

- **1 (Unlikely)**: <10% chance, edge case
- **2 (Possible)**: 10-50% chance, known scenario
- **3 (Likely)**: >50% chance, common occurrence

4. **Score Risk Impact**

Rate severity (1-3):

- **1 (Minor)**: Cosmetic, workaround exists, limited users
- **2 (Degraded)**: Feature impaired, workaround difficult, affects many users
- **3 (Critical)**: System failure, data loss, no workaround, blocks usage

5. **Calculate Risk Score** (a code sketch follows this list)

```
Risk Score = Probability × Impact

Scores:
1-2: Low risk (monitor)
3-4: Medium risk (plan mitigation)
6-9: High risk (immediate mitigation required)
```

6. **Highlight High-Priority Risks**

Flag all risks with score ≥6 for immediate attention.

7. **Request Clarification**

If evidence is missing or assumptions are required:

- Document assumptions clearly
- Request user clarification
- Do NOT speculate on business impact

8. **Plan Mitigations**

**CRITICAL: Mitigation placement depends on WHO does the work**

For each high-priority risk:

- Define mitigation strategy
- Assign owner (dev, QA, ops)
- Set timeline
- Update residual risk expectation

**Mitigation Plan Placement:**

**Architecture Doc:**
- Mitigations owned by Backend, DevOps, Architecture, Security, Data Eng
- Example: "Add authorization layer for customer-scoped access" (Backend work)
- Example: "Configure AWS Fault Injection Simulator" (DevOps work)
- Example: "Define CloudWatch log schema for backfill events" (Architecture work)

**QA Doc:**
- Mitigations owned by QA (test development work)
- Example: "Create factories for test data with randomization" (QA work)
- Example: "Implement polling with retry for async validation" (QA test code)
- Brief reference to Architecture doc mitigations (don't duplicate)

**Rule of Thumb:**
- If mitigation requires production code changes → Architecture doc
- If mitigation is test infrastructure/code → QA doc
- If mitigation involves multiple teams → Architecture doc with QA validation approach

**Assumptions Placement:**

**Architecture Doc:**
- Architectural assumptions (SLO targets, replication lag, system design assumptions)
- Example: "P95 <500ms inferred from <2s timeout (requires Product approval)"
- Example: "Multi-region replication lag <1s assumed (ADR doesn't specify SLA)"
- Example: "Cache hit ratio >80% assumed (not in PRD/ADR)"

**QA Doc:**
- Test execution assumptions (test infrastructure readiness, test data availability)
- Example: "Assumes test factories already created"
- Example: "Assumes CI/CD pipeline configured"
- Brief reference to Architecture doc for architectural assumptions

**Rule of Thumb:**
- If assumption is about system architecture/design → Architecture doc
- If assumption is about test infrastructure/execution → QA doc
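The scoring rules above reduce to a small pure function. A minimal sketch in TypeScript (type and function names are illustrative, not part of the workflow):

```typescript
type Probability = 1 | 2 | 3; // 1 = Unlikely, 2 = Possible, 3 = Likely
type Impact = 1 | 2 | 3;      // 1 = Minor, 2 = Degraded, 3 = Critical

interface RiskRating {
  score: number;
  level: 'Low' | 'Medium' | 'High';
  action: string;
}

function scoreRisk(probability: Probability, impact: Impact): RiskRating {
  const score = probability * impact; // possible values: 1, 2, 3, 4, 6, 9
  if (score >= 6) return { score, level: 'High', action: 'immediate mitigation required' };
  if (score >= 3) return { score, level: 'Medium', action: 'plan mitigation' };
  return { score, level: 'Low', action: 'monitor' };
}

// Example: auth bypass — Possible (2) × Critical (3) = 6 → High, flag for immediate attention
console.log(scoreRisk(2, 3));
```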
---

## Step 3: Design Test Coverage

### Actions

1. **Break Down Acceptance Criteria**

Convert each acceptance criterion into atomic test scenarios:

- One scenario per testable behavior
- Scenarios are independent
- Scenarios are repeatable
- Scenarios tie back to risk mitigations

2. **Select Appropriate Test Levels**

**Knowledge Base Reference**: `test-levels-framework.md`

Map requirements to optimal test levels (avoid duplication):

**E2E (End-to-End)**:
- Critical user journeys
- Multi-system integration
- Production-like environment
- Highest confidence, slowest execution

**API (Integration)**:
- Service contracts
- Business logic validation
- Fast feedback
- Good for complex scenarios

**Component**:
- UI component behavior
- Interaction testing
- Visual regression
- Fast, isolated

**Unit**:
- Business logic
- Edge cases
- Error handling
- Fastest, most granular

**Avoid duplicate coverage**: Don't test same behavior at multiple levels unless necessary.
3. **Assign Priority Levels**

**CRITICAL: P0/P1/P2/P3 indicates priority and risk level, NOT execution timing**

**Knowledge Base Reference**: `test-priorities-matrix.md`

**P0 (Critical)**:
- Blocks core user journey
- High-risk areas (score ≥6)
- Revenue-impacting
- Security-critical
- No workaround exists
- Affects majority of users

**P1 (High)**:
- Important user features
- Medium-risk areas (score 3-4)
- Common workflows
- Workaround exists but difficult

**P2 (Medium)**:
- Secondary features
- Low-risk areas (score 1-2)
- Edge cases
- Regression prevention

**P3 (Low)**:
- Nice-to-have
- Exploratory
- Performance benchmarks
- Documentation validation

**NOTE:** Priority classification is separate from execution timing. A P1 test might run in PRs if it's fast, or nightly if it requires expensive infrastructure (e.g., a k6 performance test). See the "Execution Strategy" section for timing guidance.

4. **Outline Data and Tooling Prerequisites**

For each test scenario, identify:

- Test data requirements (factories, fixtures)
- External services (mocks, stubs)
- Environment setup
- Tools and dependencies

5. **Define Execution Strategy** (Keep It Simple)

**IMPORTANT: Avoid over-engineering execution order**

**Default Philosophy:**

- Run **everything** in PRs if total duration is <15 minutes
- Playwright is fast with parallelization (100s of tests in ~10-15 min); see the config sketch after this step
- Only defer to nightly/weekly if there's significant overhead:
  - Performance tests (k6, load testing) - expensive infrastructure
  - Chaos engineering - requires special setup (AWS FIS)
  - Long-running tests - endurance (4+ hours), disaster recovery
  - Manual tests - require human intervention

**Simple Execution Strategy (Organized by TOOL TYPE):**

```markdown
## Execution Strategy

**Philosophy**: Run everything in PRs unless significant infrastructure overhead. Playwright with parallelization is extremely fast (100s of tests in ~10-15 min).

**Organized by TOOL TYPE:**

### Every PR: Playwright Tests (~10-15 min)

All functional tests (from any priority level):

- All E2E, API, integration, unit tests using Playwright
- Parallelized across {N} shards
- Total: ~{N} tests (includes P0, P1, P2, P3)

### Nightly: k6 Performance Tests (~30-60 min)

All performance tests (from any priority level):

- Load, stress, spike, endurance
- Reason: Expensive infrastructure, long-running (10-40 min per test)

### Weekly: Chaos & Long-Running (~hours)

Special infrastructure tests (from any priority level):

- Multi-region failover, disaster recovery, endurance
- Reason: Very expensive, very long (4+ hours)
```

**KEY INSIGHT: Organize by TOOL TYPE, not priority**

- Playwright (fast, cheap) → PR
- k6 (expensive, long) → Nightly
- Chaos/Manual (very expensive, very long) → Weekly

**Avoid:**

- ❌ Don't organize by priority (smoke → P0 → P1 → P2 → P3)
- ❌ Don't say "P1 runs on PR to main" (some P1 are Playwright/PR, some are k6/Nightly)
- ❌ Don't create artificial tiers - organize by tool type and infrastructure overhead
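As a rough illustration of the PR-lane parallelization described above, a minimal Playwright config sketch; the worker count, shard count, and CI environment variable are assumptions to adapt per project:

```typescript
// playwright.config.ts — minimal sketch; tune workers and shard count per project and CI runner size.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,                      // run tests within files in parallel
  workers: process.env.CI ? 4 : undefined,  // assumed CI worker count; defaults locally
  retries: process.env.CI ? 1 : 0,          // one retry in CI to absorb rare flake
});

// In CI, one job per shard keeps the PR lane at ~10-15 min, e.g.:
//   npx playwright test --shard=1/4
//   npx playwright test --shard=2/4   (and so on)
// k6 and chaos suites run in separate nightly/weekly jobs, not in the PR lane.
```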
---

## Step 4: Generate Deliverables

### Actions

1. **Create Risk Assessment Matrix**

Use template structure:

```markdown
| Risk ID | Category | Description | Probability | Impact | Score | Mitigation      |
| ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
| R-001   | SEC      | Auth bypass | 2           | 3      | 6     | Add authz check |
```

2. **Create Coverage Matrix**

```markdown
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
| ----------- | ---------- | -------- | --------- | ---------- | ----- |
| Login flow  | E2E        | P0       | R-001     | 3          | QA    |
```

3. **Document Execution Strategy** (Simple, Not Redundant)

**IMPORTANT: Keep execution strategy simple and avoid redundancy**

```markdown
## Execution Strategy

**Default: Run all functional tests in PRs (~10-15 min)**

- All Playwright tests (parallelized across 4 shards)
- Includes E2E, API, integration, unit tests
- Total: ~{N} tests

**Nightly: Performance & Infrastructure tests**

- k6 load/stress/spike tests (~30-60 min)
- Reason: Expensive infrastructure, long-running

**Weekly: Chaos & Disaster Recovery**

- Endurance tests (4+ hours)
- Multi-region failover (requires AWS FIS)
- Backup restore validation
- Reason: Special infrastructure, very long-running
```

**DO NOT:**

- ❌ Create redundant smoke/P0/P1/P2/P3 tier structure
- ❌ List all tests again in execution order (already in coverage plan)
- ❌ Split tests by priority unless there's infrastructure overhead

4. **Include Resource Estimates**

**IMPORTANT: Use intervals/ranges, not exact numbers**

Provide rough estimates with intervals to avoid false precision:

```markdown
### Test Effort Estimates

- P0 scenarios: 15 tests (~1.5-2.5 hours each) = **~25-40 hours**
- P1 scenarios: 25 tests (~0.75-1.5 hours each) = **~20-35 hours**
- P2 scenarios: 40 tests (~0.25-0.75 hours each) = **~10-30 hours**
- **Total:** **~55-105 hours** (~1.5-3 weeks with 1 QA engineer)
```

**Why intervals:**

- Avoids false precision (estimates are never exact)
- Provides flexibility for complexity variations
- Accounts for unknowns and dependencies
- More realistic and less prescriptive

**Guidelines:**

- P0 tests: 1.5-2.5 hours each (complex setup, security, performance)
- P1 tests: 0.75-1.5 hours each (standard integration, API tests)
- P2 tests: 0.25-0.75 hours each (edge cases, simple validation)
- P3 tests: 0.1-0.5 hours each (exploratory, documentation)

**Express totals as:**

- Hour ranges: "~55-105 hours"
- Week ranges: "~1.5-3 weeks"
- Avoid: Exact numbers like "75 hours" or "11 days"

5. **Add Gate Criteria**

```markdown
### Quality Gate Criteria

- All P0 tests pass (100%)
- P1 tests pass rate ≥95%
- No high-risk (score ≥6) items unmitigated
- Test coverage ≥80% for critical paths
```

6. **Write to Output File**

Save to `{output_folder}/test-design-epic-{epic_num}.md` using template structure.

---

## Important Notes

### Risk Category Definitions

**TECH** (Technical/Architecture):
- Architecture flaws or technical debt
- Integration complexity
- Scalability concerns

**SEC** (Security):
- Missing security controls
- Authentication/authorization gaps
- Data exposure risks

**PERF** (Performance):
- SLA risk or performance degradation
- Resource constraints
- Scalability bottlenecks

**DATA** (Data Integrity):
- Data loss or corruption potential
- State consistency issues
- Migration risks

**BUS** (Business Impact):
- User experience harm
- Business logic errors
- Revenue or compliance impact

**OPS** (Operations):
- Deployment or runtime failures
- Configuration issues
- Monitoring/observability gaps

### Risk Scoring Methodology

**Probability × Impact = Risk Score**

Examples:

- High likelihood (3) × Critical impact (3) = **Score 9** (highest priority)
- Possible (2) × Critical (3) = **Score 6** (high priority threshold)
- Unlikely (1) × Minor (1) = **Score 1** (low priority)

**Threshold**: Scores ≥6 require immediate mitigation.
### Test Level Selection Strategy

**Avoid duplication:**

- Don't test the same behavior at both E2E and API level
- Use E2E for critical paths only
- Use API tests for complex business logic
- Use unit tests for edge cases

**Tradeoffs:**

- E2E: High confidence, slow execution, brittle
- API: Good balance, fast, stable
- Unit: Fastest feedback, narrow scope

### Priority Assignment Guidelines

**P0 criteria** (all must be true):

- Blocks core functionality
- High-risk (score ≥6)
- No workaround exists
- Affects majority of users

**P1 criteria**:

- Important feature
- Medium risk (score 3-4)
- Workaround exists but difficult

**P2/P3**: Everything else, prioritized by value

### Knowledge Base Integration

**Core Fragments (Auto-loaded in Step 1):**

- `risk-governance.md` - Risk classification (6 categories), automated scoring, gate decision engine, coverage traceability, owner tracking (625 lines, 4 examples)
- `probability-impact.md` - Probability × impact matrix, automated classification thresholds, dynamic re-assessment, gate integration (604 lines, 4 examples)
- `test-levels-framework.md` - E2E vs API vs Component vs Unit decision framework with characteristics matrix (467 lines, 4 examples)
- `test-priorities-matrix.md` - P0-P3 automated priority calculation, risk-based mapping, tagging strategy, time budgets (389 lines, 2 examples)

**Reference for Test Planning:**

- `selective-testing.md` - Execution strategy: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
- `fixture-architecture.md` - Data setup patterns: pure function → fixture → mergeTests, auto-cleanup (406 lines, 5 examples)

**Manual Reference (Optional):**

- Use `tea-index.csv` to find additional specialized fragments as needed

### Evidence-Based Assessment

**Critical principle:** Base risk assessment on evidence, not speculation.

**Evidence sources:**

- PRD and user research
- Architecture documentation
- Historical bug data
- User feedback
- Security audit results

**Avoid:**

- Guessing business impact
- Assuming user behavior
- Inventing requirements

**When uncertain:** Document assumptions and request clarification from the user.

---

## Output Summary

After completing this workflow, provide a summary:

```markdown
## Test Design Complete

**Epic**: {epic_num}
**Scope**: {design_level}

**Risk Assessment**:

- Total risks identified: {count}
- High-priority risks (≥6): {high_count}
- Categories: {categories}

**Coverage Plan**:

- P0 scenarios: {p0_count} ({p0_hours} hours)
- P1 scenarios: {p1_count} ({p1_hours} hours)
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
- **Total effort**: {total_hours} hours (~{total_days} days)

**Test Levels**:

- E2E: {e2e_count}
- API: {api_count}
- Component: {component_count}
- Unit: {unit_count}

**Quality Gate Criteria**:

- P0 pass rate: 100%
- P1 pass rate: ≥95%
- High-risk mitigations: 100%
- Coverage: ≥80%

**Output File**: {output_file}

**Next Steps**:

1. Review risk assessment with team
2. Prioritize mitigation for high-risk items (score ≥6)
3. Run `*atdd` to generate failing tests for P0 scenarios (separate workflow; not auto-run by `*test-design`)
4. Allocate resources per effort estimates
5. Set up test data factories and fixtures
```

---

## Validation

After completing all steps, verify:

- [ ] Risk assessment complete with all categories
- [ ] All risks scored (probability × impact)
- [ ] High-priority risks (≥6) flagged
- [ ] Coverage matrix maps requirements to test levels
- [ ] Priority levels assigned (P0-P3)
- [ ] Execution strategy defined
- [ ] Resource estimates provided
- [ ] Quality gate criteria defined
- [ ] Output file created and formatted correctly

Refer to `checklist.md` for comprehensive validation criteria.