
Test Design and Risk Assessment

Workflow ID: _bmad/bmm/testarch/test-design
Version: 4.0 (BMad v6)


Overview

Plans a comprehensive test coverage strategy with risk assessment, priority classification, and execution ordering. This workflow operates in two modes:

  • System-Level Mode (Phase 3): Testability review of architecture before solutioning gate check
  • Epic-Level Mode (Phase 4): Per-epic test planning with risk assessment (current behavior)

The workflow auto-detects which mode to use based on project phase.


Preflight: Detect Mode and Load Context

Critical: Determine mode before proceeding.

Mode Detection (Flexible for Standalone Use)

TEA test-design workflow supports TWO modes, detected automatically:

  1. Check User Intent Explicitly (Priority 1)

    Deterministic Rules:

    • User provided PRD+ADR only (no Epic+Stories) → System-Level Mode
    • User provided Epic+Stories only (no PRD+ADR) → Epic-Level Mode
    • User provided BOTH PRD+ADR AND Epic+Stories → Prefer System-Level Mode (architecture review comes first in Phase 3, then epic planning in Phase 4). If mode preference is unclear, ask user: "Should I create (A) System-level test design (PRD + ADR → Architecture doc + QA doc) or (B) Epic-level test design (Epic → Single test plan)?"
    • If user intent is clear from context, use that mode regardless of file structure
  2. Fallback to File-Based Detection (Priority 2 - BMad-Integrated)

    • Check for {implementation_artifacts}/sprint-status.yaml
    • If exists → Epic-Level Mode (Phase 4, single document output)
    • If NOT exists → System-Level Mode (Phase 3, TWO document outputs)
  3. If Ambiguous, ASK USER (Priority 3)

    • "I see you have [PRD/ADR/Epic/Stories]. Should I create:
      • (A) System-level test design (PRD + ADR → Architecture doc + QA doc)?
      • (B) Epic-level test design (Epic → Single test plan)?"
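
For illustration, the detection order above can be read as a simple decision procedure. A minimal sketch (the input flags and function names are hypothetical, not part of any BMad/TEA API):

```typescript
// Illustrative only: inputs and names are hypothetical, not a real BMad/TEA API.
type Mode = 'system-level' | 'epic-level' | 'ask-user';

interface DetectionInput {
  userIntent?: Mode;               // Priority 1: explicit user intent always wins
  hasPrdAndAdr: boolean;
  hasEpicAndStories: boolean;
  sprintStatusYamlExists: boolean; // Priority 2: {implementation_artifacts}/sprint-status.yaml
}

function detectMode(input: DetectionInput): Mode {
  if (input.userIntent) return input.userIntent;
  if (input.hasPrdAndAdr && !input.hasEpicAndStories) return 'system-level';
  if (input.hasEpicAndStories && !input.hasPrdAndAdr) return 'epic-level';
  // Both sets provided: prefer System-Level (Phase 3 review precedes Phase 4 planning)
  if (input.hasPrdAndAdr && input.hasEpicAndStories) return 'system-level';
  // Priority 2: BMad-integrated file detection
  if (input.sprintStatusYamlExists) return 'epic-level';
  if (input.hasPrdAndAdr || input.hasEpicAndStories) return 'system-level';
  // Priority 3: ambiguous — ask the user (or halt if required files are missing)
  return 'ask-user';
}
```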

Mode Descriptions:

System-Level Mode (PRD + ADR Input)

  • When to use: Early in project (Phase 3 Solutioning), architecture being designed
  • Input: PRD, ADR, architecture.md (optional)
  • Output: TWO documents
    • test-design-architecture.md (for Architecture/Dev teams)
    • test-design-qa.md (for QA team)
  • Focus: Testability assessment, ASRs, NFR requirements, Sprint 0 setup

Epic-Level Mode (Epic + Stories Input)

  • When to use: During implementation (Phase 4), per-epic planning
  • Input: Epic, Stories, tech-specs (optional)
  • Output: ONE document
    • test-design-epic-{N}.md (combined risk assessment + test plan)
  • Focus: Risk assessment, coverage plan, execution order, quality gates

Key Insight: TEA Works Standalone OR Integrated

Standalone (No BMad artifacts):

  • User provides PRD + ADR → System-Level Mode
  • User provides Epic description → Epic-Level Mode
  • TEA doesn't mandate full BMad workflow

BMad-Integrated (Full workflow):

  • BMad creates sprint-status.yaml → Automatic Epic-Level detection
  • BMad creates PRD, ADR, architecture.md → Automatic System-Level detection
  • TEA leverages BMad artifacts for richer context

Message to User:

You don't need to follow full BMad methodology to use TEA test-design. Just provide PRD + ADR for system-level, or Epic for epic-level. TEA will auto-detect and produce appropriate documents.

Halt Condition: If mode cannot be determined AND user intent unclear AND required files missing, HALT and notify user:

  • "Please provide either: (A) PRD + ADR for system-level test design, OR (B) Epic + Stories for epic-level test design"

Step 1: Load Context (Mode-Aware)

Mode-Specific Loading:

System-Level Mode (Phase 3)

  1. Read Architecture Documentation

    • Load architecture.md or tech-spec (REQUIRED)
    • Load PRD.md for functional and non-functional requirements
    • Load epics.md for feature scope
    • Identify technology stack decisions (frameworks, databases, deployment targets)
    • Note integration points and external system dependencies
    • Extract NFR requirements (performance SLOs, security requirements, etc.)
  2. Check Playwright Utils Flag

    Read {config_source} and check config.tea_use_playwright_utils.

    If true, note that @seontechnologies/playwright-utils provides utilities for test implementation. Reference in test design where relevant.

  3. Load Knowledge Base Fragments (System-Level)

    Critical: Consult src/bmm/testarch/tea-index.csv to load:

    • adr-quality-readiness-checklist.md - 8-category, 29-criterion NFR framework (testability, security, scalability, DR, QoS, deployability, etc.)
    • test-levels-framework.md - Test levels strategy guidance
    • risk-governance.md - Testability risk identification
    • test-quality.md - Quality standards and Definition of Done
  4. Analyze Existing Test Setup (if brownfield)

    • Search for existing test directories
    • Identify current test framework (if any)
    • Note testability concerns in existing codebase

Epic-Level Mode (Phase 4)

  1. Read Requirements Documentation

    • Load PRD.md for high-level product requirements
    • Read epics.md or specific epic for feature scope
    • Read story markdown for detailed acceptance criteria
    • Identify all testable requirements
  2. Load Architecture Context

    • Read architecture.md for system design
    • Read tech-spec for implementation details
    • Read test-design-architecture.md and test-design-qa.md (if exist from Phase 3 system-level test design)
    • Identify technical constraints and dependencies
    • Note integration points and external systems
  3. Analyze Existing Test Coverage

    • Search for existing test files in {test_dir}
    • Identify coverage gaps
    • Note areas with insufficient testing
    • Check for flaky or outdated tests
  4. Load Knowledge Base Fragments (Epic-Level)

    Critical: Consult src/bmm/testarch/tea-index.csv to load:

    • risk-governance.md - Risk classification framework (6 categories: TECH, SEC, PERF, DATA, BUS, OPS), automated scoring, gate decision engine, owner tracking (625 lines, 4 examples)
    • probability-impact.md - Risk scoring methodology (probability × impact matrix, automated classification, dynamic re-assessment, gate integration, 604 lines, 4 examples)
    • test-levels-framework.md - Test level selection guidance (E2E vs API vs Component vs Unit with decision matrix, characteristics, when to use each, 467 lines, 4 examples)
    • test-priorities-matrix.md - P0-P3 prioritization criteria (automated priority calculation, risk-based mapping, tagging strategy, time budgets, 389 lines, 2 examples)

Halt Condition (Epic-Level only): If story data or acceptance criteria are missing, check if brownfield exploration is needed. If neither requirements NOR exploration possible, HALT with message: "Epic-level test design requires clear requirements, acceptance criteria, or brownfield app URL for exploration"


Step 1.5: System-Level Testability Review (Phase 3 Only)

Skip this step if Epic-Level Mode. This step only executes in System-Level Mode.

Actions

  1. Review Architecture for Testability

    STRUCTURE PRINCIPLE: CONCERNS FIRST, PASSING ITEMS LAST

    Evaluate architecture against these criteria and structure output as:

    1. Testability Concerns (ACTIONABLE - what's broken/missing)
    2. Testability Assessment Summary (FYI - what works well)

    Testability Criteria:

    Controllability:

    • Can we control system state for testing? (API seeding, factories, database reset)
    • Are external dependencies mockable? (interfaces, dependency injection)
    • Can we trigger error conditions? (chaos engineering, fault injection)

    Observability:

    • Can we inspect system state? (logging, metrics, traces)
    • Are test results deterministic? (no race conditions, clear success/failure)
    • Can we validate NFRs? (performance metrics, security audit logs)

    Reliability:

    • Are tests isolated? (parallel-safe, stateless, cleanup discipline)
    • Can we reproduce failures? (deterministic waits, HAR capture, seed data)
    • Are components loosely coupled? (mockable, testable boundaries)

    In Architecture Doc Output:

    • Section A: Testability Concerns (TOP) - List what's BROKEN or MISSING
      • Example: "No API for test data seeding → Cannot parallelize tests"
      • Example: "Hardcoded DB connection → Cannot test in CI"
    • Section B: Testability Assessment Summary (BOTTOM) - List what PASSES
      • Example: " API-first design supports test isolation"
      • Only include if worth mentioning; otherwise omit this section entirely
  2. Identify Architecturally Significant Requirements (ASRs)

    CRITICAL: ASRs must indicate if ACTIONABLE or FYI

    From PRD NFRs and architecture decisions, identify quality requirements that:

    • Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture)
    • Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure)
    • Require special test environments (e.g., "Multi-region deployment" → regional test instances)

    Score each ASR using risk matrix (probability × impact).

    In Architecture Doc, categorize ASRs:

    • ACTIONABLE ASRs (require architecture changes): Include in "Quick Guide" 🚨 or ⚠️ sections
    • FYI ASRs (already satisfied by architecture): Include in "Quick Guide" 📋 section OR omit if obvious

    Example:

    • ASR-001 (Score 9): "Multi-region deployment requires region-specific test infrastructure" → ACTIONABLE (goes in 🚨 BLOCKERS)
    • ASR-002 (Score 4): "OAuth 2.1 authentication already implemented in ADR-5" → FYI (goes in 📋 INFO ONLY or omit)

    Structure Principle: Actionable ASRs at TOP, FYI ASRs at BOTTOM (or omit)

  3. Define Test Levels Strategy

    IMPORTANT: This section goes in QA doc ONLY, NOT in Architecture doc

    Based on architecture (mobile, web, API, microservices, monolith):

    • Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy)
    • Identify test environment needs (local, staging, ephemeral, production-like)
    • Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance)

    In Architecture doc: Only mention test level split if it's an ACTIONABLE concern

    • Example: "API response time <100ms requires load testing infrastructure" (concern)
    • DO NOT include full test level strategy table in Architecture doc
  4. Assess NFR Requirements (MINIMAL in Architecture Doc)

    CRITICAL: NFR testing approach is a RECIPE - belongs in QA doc ONLY

    In Architecture Doc:

    • Only mention NFRs if they create testability CONCERNS
    • Focus on WHAT architecture must provide, not HOW to test
    • Keep it brief - 1-2 sentences per NFR category at most

    Example - Security NFR in Architecture doc (if there's a concern): CORRECT (concern-focused, brief, WHAT/WHY only):

    • "System must prevent cross-customer data access (GDPR requirement). Requires test infrastructure for multi-tenant isolation in Sprint 0."
    • "OAuth tokens must expire after 1 hour (ADR-5). Requires test harness for token expiration validation."

    INCORRECT (too detailed, belongs in QA doc):

    • Full table of security test scenarios
    • Test scripts with code examples
    • Detailed test procedures
    • Tool selection (e.g., "use Playwright E2E + OWASP ZAP")
    • Specific test approaches (e.g., "Test approach: Playwright E2E for auth/authz")

    In QA Doc (full NFR testing approach):

    • Security: Full test scenarios, tooling (Playwright + OWASP ZAP), test procedures
    • Performance: Load/stress/spike test scenarios, k6 scripts, SLO thresholds
    • Reliability: Error handling tests, retry logic validation, circuit breaker tests
    • Maintainability: Coverage targets, code quality gates, observability validation

    Rule of Thumb:

    • Architecture doc: "What NFRs exist and what concerns they create" (1-2 sentences)
    • QA doc: "How to test those NFRs" (full sections with tables, code, procedures)
  5. Flag Testability Concerns

    Identify architecture decisions that harm testability:

    • Tight coupling (no interfaces, hard dependencies)
    • No dependency injection (can't mock external services)
    • Hardcoded configurations (can't test different envs)
    • Missing observability (can't validate NFRs)
    • Stateful designs (can't parallelize tests)

    Critical: If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as CONCERNS or FAIL recommendation for gate check.

  6. Output System-Level Test Design (TWO Documents)

    IMPORTANT: System-level mode produces TWO documents instead of one:

    Document 1: test-design-architecture.md (for Architecture/Dev teams)

    • Purpose: Architectural concerns, testability gaps, NFR requirements
    • Audience: Architects, Backend Devs, Frontend Devs, DevOps, Security Engineers
    • Focus: What architecture must deliver for testability
    • Template: test-design-architecture-template.md

    Document 2: test-design-qa.md (for QA team)

    • Purpose: Test execution recipe, coverage plan, Sprint 0 setup
    • Audience: QA Engineers, Test Automation Engineers, QA Leads
    • Focus: How QA will execute tests
    • Template: test-design-qa-template.md

    Standard Structures (REQUIRED):

    test-design-architecture.md sections (in this order):

    STRUCTURE PRINCIPLE: Actionable items FIRST, FYI items LAST

    1. Executive Summary (scope, business context, architecture, risk summary)
    2. Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
    3. Risk Assessment (high/medium/low-priority risks with scoring) - ACTIONABLE
    4. Testability Concerns and Architectural Gaps - ACTIONABLE (what arch team must do)
      • Sub-section: Blockers to Fast Feedback (ACTIONABLE - concerns FIRST)
      • Sub-section: Architectural Improvements Needed (ACTIONABLE)
      • Sub-section: Testability Assessment Summary (FYI - passing items LAST, only if worth mentioning)
    5. Risk Mitigation Plans (detailed for high-priority risks ≥6) - ACTIONABLE
    6. Assumptions and Dependencies - FYI

    SECTIONS THAT DO NOT BELONG IN ARCHITECTURE DOC:

    • Test Levels Strategy (unit/integration/E2E split) - This is a RECIPE, belongs in QA doc ONLY
    • NFR Testing Approach with test examples - This is a RECIPE, belongs in QA doc ONLY
    • Test Environment Requirements - This is a RECIPE, belongs in QA doc ONLY
    • Recommendations for Sprint 0 (test framework setup, factories) - This is a RECIPE, belongs in QA doc ONLY
    • Quality Gate Criteria (pass rates, coverage targets) - This is a RECIPE, belongs in QA doc ONLY
    • Tool Selection (Playwright, k6, etc.) - This is a RECIPE, belongs in QA doc ONLY

    WHAT BELONGS IN ARCHITECTURE DOC:

    • Testability CONCERNS (what makes it hard to test)
    • Architecture GAPS (what's missing for testability)
    • What architecture team must DO (blockers, improvements)
    • Risks and mitigation plans
    • ASRs (Architecturally Significant Requirements) - but clarify if FYI or actionable

    test-design-qa.md sections (in this order):

    1. Executive Summary (risk summary, coverage summary)
    2. Dependencies & Test Blockers (CRITICAL: RIGHT AFTER SUMMARY - what QA needs from other teams)
    3. Risk Assessment (scored risks with categories - reference Arch doc, don't duplicate)
    4. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
    5. Execution Strategy (SIMPLE - organized by TOOL TYPE: PR (Playwright) / Nightly (k6) / Weekly (chaos/manual))
    6. QA Effort Estimate (QA effort ONLY - no DevOps, Data Eng, Finance, Backend)
    7. Appendices (code examples with playwright-utils, tagging strategy, knowledge base refs)

    SECTIONS TO EXCLUDE FROM QA DOC:

    • Quality Gate Criteria (pass/fail thresholds - teams decide for themselves)
    • Follow-on Workflows (bloat - BMAD commands are self-explanatory)
    • Approval section (unnecessary formality)
    • Test Environment Requirements (remove as separate section - integrate into Dependencies if needed)
    • NFR Readiness Summary (bloat - covered in Risk Assessment)
    • Testability Assessment (bloat - covered in Dependencies)
    • Test Levels Strategy (bloat - obvious from test scenarios)
    • Sprint breakdowns (too prescriptive)
    • Infrastructure/DevOps/Data Eng effort tables (out of scope)
    • Mitigation plans for non-QA work (belongs in Arch doc)

    Content Guidelines:

    Architecture doc (DO):

    • Risk scoring visible (Probability × Impact = Score)
    • Clear ownership (each blocker/ASR has owner + timeline)
    • Testability requirements (what architecture must support)
    • Mitigation plans (for each high-risk item ≥6)
    • Brief conceptual examples ONLY if needed to clarify architecture concerns (5-10 lines max)
    • Target length: ~150-200 lines max (focus on actionable concerns only)
    • Professional tone: Avoid AI slop (excessive emojis, "absolutely", "excellent", overly enthusiastic language)

    Architecture doc (DON'T) - CRITICAL:

    • NO test scripts or test implementation code AT ALL - This is a communication doc for architects, not a testing guide
    • NO Playwright test examples (e.g., test('...', async ({ request }) => ...))
    • NO assertion logic (e.g., expect(...).toBe(...))
    • NO test scenario checklists with checkboxes (belongs in QA doc)
    • NO implementation details about HOW QA will test
    • Focus on CONCERNS, not IMPLEMENTATION

    QA doc (DO):

    • Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
    • Full test implementation code samples when helpful
    • IMPORTANT: If config.tea_use_playwright_utils is true, ALL code samples MUST use @seontechnologies/playwright-utils fixtures and utilities
    • Import test fixtures from '@seontechnologies/playwright-utils/api-request/fixtures'
    • Import expect from '@playwright/test' (playwright-utils does not re-export expect)
    • Use apiRequest fixture with schema validation, retry logic, and structured responses
    • Dependencies & Test Blockers section RIGHT AFTER Executive Summary (what QA needs from other teams)
    • QA effort estimates ONLY (no DevOps, Data Eng, Finance, Backend effort - out of scope)
    • Cross-references to Architecture doc (not duplication)
    • Professional tone: Avoid AI slop (excessive emojis, "absolutely", "excellent", overly enthusiastic language)
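
    A minimal sketch of such a code sample (the export name, apiRequest options, and response shape are assumptions to verify against the actual playwright-utils API):

    ```typescript
    // Hedged sketch: endpoint, options object, and response shape are illustrative.
    import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
    import { expect } from '@playwright/test'; // playwright-utils does not re-export expect

    test('P0 @api rejects cross-tenant record access', async ({ apiRequest }) => {
      // apiRequest is assumed to return a structured response with status and body;
      // schema validation and retry behavior come from the fixture's own options.
      const response = await apiRequest({
        method: 'GET',
        url: '/api/v1/records',
        headers: { Authorization: `Bearer ${process.env.TENANT_A_TOKEN}` },
      });

      expect(response.status).toBe(200);
      for (const record of response.body.items ?? []) {
        expect(record.tenantId).toBe('tenant-a'); // no data from other tenants leaks through
      }
    });
    ```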

    QA doc (DON'T):

    • NO architectural theory (just reference Architecture doc)
    • NO ASR explanations (link to Architecture doc instead)
    • NO duplicate risk assessments (reference Architecture doc)
    • NO Quality Gate Criteria section (teams decide pass/fail thresholds for themselves)
    • NO Follow-on Workflows section (bloat - BMAD commands are self-explanatory)
    • NO Approval section (unnecessary formality)
    • NO effort estimates for other teams (DevOps, Backend, Data Eng, Finance - out of scope, QA effort only)
    • NO Sprint breakdowns (too prescriptive - e.g., "Sprint 0: 40 hours, Sprint 1: 48 hours")
    • NO mitigation plans for Backend/Arch/DevOps work (those belong in Architecture doc)
    • NO architectural assumptions or debates (those belong in Architecture doc)

    Anti-Patterns to Avoid (Cross-Document Redundancy):

    CRITICAL: NO BLOAT, NO REPETITION, NO OVER-EXPLANATION

    DON'T duplicate OAuth requirements:

    • Architecture doc: Explain OAuth 2.1 flow in detail
    • QA doc: Re-explain why OAuth 2.1 is required

    DO cross-reference instead:

    • Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
    • QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"

    DON'T repeat the same note 10+ times:

    • Example: "Timing is pessimistic until R-005 is fixed" repeated on every P0, P1, P2 section
    • This creates bloat and makes docs hard to read

    DO consolidate repeated information:

    • Write once at the top: "Note: All timing estimates are pessimistic pending R-005 resolution"
    • Reference briefly if needed: "(pessimistic timing)"

    DON'T include excessive detail that doesn't add value:

    • Long explanations of obvious concepts
    • Redundant examples showing the same pattern
    • Over-documentation of standard practices

    DO focus on what's unique or critical:

    • Document only what's different from standard practice
    • Highlight critical decisions and risks
    • Keep explanations concise and actionable

    Markdown Cross-Reference Syntax Examples:

    # In test-design-architecture.md
    
    ### 🚨 R-001: Multi-Tenant Isolation (Score: 9)
    
    **Test Coverage:** 8 P0 tests (see [QA doc - Multi-Tenant Isolation](test-design-qa.md#multi-tenant-isolation-8-tests-security-critical) for detailed scenarios)
    
    ---
    
    # In test-design-qa.md
    
    ## Testability Assessment
    
    **Prerequisites from Architecture Doc:**
    - [ ] R-001: Multi-tenant isolation validated (see [Architecture doc R-001](test-design-architecture.md#r-001-multi-tenant-isolation-score-9) for mitigation plan)
    - [ ] R-002: Test customer provisioned (see [Architecture doc 🚨 BLOCKERS](test-design-architecture.md#blockers---team-must-decide-cant-proceed-without))
    
    ## Sprint 0 Setup Requirements
    
    **Source:** See [Architecture doc "Quick Guide"](test-design-architecture.md#quick-guide) for detailed mitigation plans
    

    Key Points:

    • Use relative links: [Link Text](test-design-qa.md#section-anchor)
    • Anchor format: lowercase, hyphens for spaces, remove emojis/special chars
    • Example anchor: ### 🚨 R-001: Title → #r-001-title
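
    For illustration, a small helper that follows these anchor rules (hypothetical utility, not part of the workflow tooling; GitHub's own slugger may differ slightly):

    ```typescript
    // Turns a heading like "### 🚨 R-001: Multi-Tenant Isolation (Score: 9)"
    // into "r-001-multi-tenant-isolation-score-9".
    function headingToAnchor(heading: string): string {
      return heading
        .replace(/^#+\s*/, '')             // drop leading markdown hashes
        .toLowerCase()
        .replace(/[^\p{L}\p{N}\s-]/gu, '') // remove emojis and special characters
        .trim()
        .replace(/\s+/g, '-');             // hyphens for spaces
    }
    ```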

    DON'T put long code examples in Architecture doc:

    • Example: 50+ lines of test implementation

    DO keep examples SHORT in Architecture doc:

    • Example: 5-10 lines max showing what architecture must support
    • Full implementation goes in QA doc

    DON'T repeat same note 10+ times:

    • Example: "Pessimistic timing until R-005 fixed" on every P0/P1/P2 section

    DO consolidate repeated notes:

    • Single timing note at top
    • Reference briefly throughout: "(pessimistic)"

    Write Both Documents:

    • Use test-design-architecture-template.md for Architecture doc
    • Use test-design-qa-template.md for QA doc
    • Follow standard structures defined above
    • Cross-reference between docs (no duplication)
    • Validate against checklist.md (System-Level Mode section)

Common Over-Engineering to Avoid:

In QA Doc:

  1. Quality gate thresholds ("P0 must be 100%, P1 ≥95%") - Let teams decide for themselves
  2. Effort estimates for other teams - QA doc should only estimate QA effort
  3. Sprint breakdowns ("Sprint 0: 40 hours, Sprint 1: 48 hours") - Too prescriptive
  4. Approval sections - Unnecessary formality
  5. Assumptions about architecture (SLO targets, replication lag) - These are architectural concerns, belong in Arch doc
  6. Mitigation plans for Backend/Arch/DevOps - Those belong in Arch doc
  7. Follow-on workflows section - Bloat, BMAD commands are self-explanatory
  8. NFR Readiness Summary - Bloat, covered in Risk Assessment

Test Coverage Numbers Reality Check:

  • With Playwright parallelization, running ALL Playwright tests is as fast as running just P0
  • Don't split Playwright tests by priority into different CI gates - it adds no value
  • Tool type matters, not priority labels
  • Defer based on infrastructure cost, not importance

After System-Level Mode: Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.


Step 1.6: Exploratory Mode Selection (Epic-Level Only)

Actions

  1. Detect Planning Mode

    Determine mode based on context:

    Requirements-Based Mode (DEFAULT):

    • Have clear story/PRD with acceptance criteria
    • Uses: Existing workflow (Steps 2-4)
    • Appropriate for: Documented features, greenfield projects

    Exploratory Mode (OPTIONAL - Brownfield):

    • Missing/incomplete requirements AND brownfield application exists
    • Uses: UI exploration to discover functionality
    • Appropriate for: Undocumented brownfield apps, legacy systems
  2. Requirements-Based Mode (DEFAULT - Skip to Step 2)

    If requirements are clear:

    • Continue with existing workflow (Step 2: Assess and Classify Risks)
    • Use loaded requirements from Step 1
    • Proceed with risk assessment based on documented requirements
  3. Exploratory Mode (OPTIONAL - Brownfield Apps)

    If exploring brownfield application:

    A. Check MCP Availability

    If config.tea_use_mcp_enhancements is true AND Playwright MCP tools available:

    • Use MCP-assisted exploration (Step 3.B)

    If MCP unavailable OR config.tea_use_mcp_enhancements is false:

    • Use manual exploration fallback (Step 3.C)

    B. MCP-Assisted Exploration (If MCP Tools Available)

    Use Playwright MCP browser tools to explore UI:

    Setup:

    1. Use planner_setup_page to initialize browser
    2. Navigate to {exploration_url}
    3. Capture initial state with browser_snapshot
    

    Exploration Process:

    4. Use browser_navigate to explore different pages
    5. Use browser_click to interact with buttons, links, forms
    6. Use browser_hover to reveal hidden menus/tooltips
    7. Capture browser_snapshot at each significant state
    8. Take browser_screenshot for documentation
    9. Monitor browser_console_messages for JavaScript errors
    10. Track browser_network_requests to identify API calls
    11. Map user flows and interactive elements
    12. Document discovered functionality
    

    Discovery Documentation:

    • Create list of discovered features (pages, workflows, forms)
    • Identify user journeys (navigation paths)
    • Map API endpoints (from network requests)
    • Note error states (from console messages)
    • Capture screenshots for visual reference

    Convert to Test Scenarios:

    • Transform discoveries into testable requirements
    • Prioritize based on user flow criticality
    • Identify risks from discovered functionality
    • Continue with Step 2 (Assess and Classify Risks) using discovered requirements

    C. Manual Exploration Fallback (If MCP Unavailable)

    If Playwright MCP is not available:

    Notify User:

    Exploratory mode enabled but Playwright MCP unavailable.
    
    **Manual exploration required:**
    
    1. Open application at: {exploration_url}
    2. Explore all pages, workflows, and features
    3. Document findings in markdown:
       - List of pages/features discovered
       - User journeys identified
       - API endpoints observed (DevTools Network tab)
       - JavaScript errors noted (DevTools Console)
       - Critical workflows mapped
    
    4. Provide exploration findings to continue workflow
    
    **Alternative:** Disable exploratory_mode and provide requirements documentation
    

    Wait for user to provide exploration findings, then:

    • Parse user-provided discovery documentation
    • Convert to testable requirements
    • Continue with Step 2 (risk assessment)
  4. Proceed to Risk Assessment

    After mode selection (Requirements-Based OR Exploratory):

    • Continue to Step 2: Assess and Classify Risks
    • Use requirements from documentation (Requirements-Based) OR discoveries (Exploratory)

Step 2: Assess and Classify Risks

Actions

  1. Identify Genuine Risks

    Filter requirements to isolate actual risks (not just features):

    • Unresolved technical gaps
    • Security vulnerabilities
    • Performance bottlenecks
    • Data loss or corruption potential
    • Business impact failures
    • Operational deployment issues
  2. Classify Risks by Category

    Use these standard risk categories:

    TECH (Technical/Architecture):

    • Architecture flaws
    • Integration failures
    • Scalability issues
    • Technical debt

    SEC (Security):

    • Missing access controls
    • Authentication bypass
    • Data exposure
    • Injection vulnerabilities

    PERF (Performance):

    • SLA violations
    • Response time degradation
    • Resource exhaustion
    • Scalability limits

    DATA (Data Integrity):

    • Data loss
    • Data corruption
    • Inconsistent state
    • Migration failures

    BUS (Business Impact):

    • User experience degradation
    • Business logic errors
    • Revenue impact
    • Compliance violations

    OPS (Operations):

    • Deployment failures
    • Configuration errors
    • Monitoring gaps
    • Rollback issues
  3. Score Risk Probability

    Rate likelihood (1-3):

    • 1 (Unlikely): <10% chance, edge case
    • 2 (Possible): 10-50% chance, known scenario
    • 3 (Likely): >50% chance, common occurrence
  4. Score Risk Impact

    Rate severity (1-3):

    • 1 (Minor): Cosmetic, workaround exists, limited users
    • 2 (Degraded): Feature impaired, workaround difficult, affects many users
    • 3 (Critical): System failure, data loss, no workaround, blocks usage
  5. Calculate Risk Score

    Risk Score = Probability × Impact
    
    Scores:
    1-2: Low risk (monitor)
    3-4: Medium risk (plan mitigation)
    6-9: High risk (immediate mitigation required)
    
  6. Highlight High-Priority Risks

    Flag all risks with score ≥6 for immediate attention.

  7. Request Clarification

    If evidence is missing or assumptions required:

    • Document assumptions clearly
    • Request user clarification
    • Do NOT speculate on business impact
  8. Plan Mitigations

    CRITICAL: Mitigation placement depends on WHO does the work

    For each high-priority risk:

    • Define mitigation strategy
    • Assign owner (dev, QA, ops)
    • Set timeline
    • Update residual risk expectation

    Mitigation Plan Placement:

    Architecture Doc:

    • Mitigations owned by Backend, DevOps, Architecture, Security, Data Eng
    • Example: "Add authorization layer for customer-scoped access" (Backend work)
    • Example: "Configure AWS Fault Injection Simulator" (DevOps work)
    • Example: "Define CloudWatch log schema for backfill events" (Architecture work)

    QA Doc:

    • Mitigations owned by QA (test development work)
    • Example: "Create factories for test data with randomization" (QA work)
    • Example: "Implement polling with retry for async validation" (QA test code)
    • Brief reference to Architecture doc mitigations (don't duplicate)

    Rule of Thumb:

    • If mitigation requires production code changes → Architecture doc
    • If mitigation is test infrastructure/code → QA doc
    • If mitigation involves multiple teams → Architecture doc with QA validation approach

    Assumptions Placement:

    Architecture Doc:

    • Architectural assumptions (SLO targets, replication lag, system design assumptions)
    • Example: "P95 <500ms inferred from <2s timeout (requires Product approval)"
    • Example: "Multi-region replication lag <1s assumed (ADR doesn't specify SLA)"
    • Example: "Recent Cache hit ratio >80% assumed (not in PRD/ADR)"

    QA Doc:

    • Test execution assumptions (test infrastructure readiness, test data availability)
    • Example: "Assumes test factories already created"
    • Example: "Assumes CI/CD pipeline configured"
    • Brief reference to Architecture doc for architectural assumptions

    Rule of Thumb:

    • If assumption is about system architecture/design → Architecture doc
    • If assumption is about test infrastructure/execution → QA doc

Step 3: Design Test Coverage

Actions

  1. Break Down Acceptance Criteria

    Convert each acceptance criterion into atomic test scenarios:

    • One scenario per testable behavior
    • Scenarios are independent
    • Scenarios are repeatable
    • Scenarios tie back to risk mitigations
  2. Select Appropriate Test Levels

    Knowledge Base Reference: test-levels-framework.md

    Map requirements to optimal test levels (avoid duplication):

    E2E (End-to-End):

    • Critical user journeys
    • Multi-system integration
    • Production-like environment
    • Highest confidence, slowest execution

    API (Integration):

    • Service contracts
    • Business logic validation
    • Fast feedback
    • Good for complex scenarios

    Component:

    • UI component behavior
    • Interaction testing
    • Visual regression
    • Fast, isolated

    Unit:

    • Business logic
    • Edge cases
    • Error handling
    • Fastest, most granular

    Avoid duplicate coverage: Don't test same behavior at multiple levels unless necessary.

  3. Assign Priority Levels

    CRITICAL: P0/P1/P2/P3 indicates priority and risk level, NOT execution timing

    Knowledge Base Reference: test-priorities-matrix.md

    P0 (Critical):

    • Blocks core user journey
    • High-risk areas (score ≥6)
    • Revenue-impacting
    • Security-critical
    • No workaround exists
    • Affects majority of users

    P1 (High):

    • Important user features
    • Medium-risk areas (score 3-4)
    • Common workflows
    • Workaround exists but difficult

    P2 (Medium):

    • Secondary features
    • Low-risk areas (score 1-2)
    • Edge cases
    • Regression prevention

    P3 (Low):

    • Nice-to-have
    • Exploratory
    • Performance benchmarks
    • Documentation validation

    NOTE: Priority classification is separate from execution timing. A P1 test might run in PRs if it's fast, or nightly if it requires expensive infrastructure (e.g., k6 performance test). See "Execution Strategy" section for timing guidance.

  4. Outline Data and Tooling Prerequisites

    For each test scenario, identify:

    • Test data requirements (factories, fixtures)
    • External services (mocks, stubs)
    • Environment setup
    • Tools and dependencies
  5. Define Execution Strategy (Keep It Simple)

    IMPORTANT: Avoid over-engineering execution order

    Default Philosophy:

    • Run everything in PRs if total duration <15 minutes
    • Playwright is fast with parallelization (100s of tests in ~10-15 min)
    • Only defer to nightly/weekly if there's significant overhead:
      • Performance tests (k6, load testing) - expensive infrastructure
      • Chaos engineering - requires special setup (AWS FIS)
      • Long-running tests - endurance (4+ hours), disaster recovery
      • Manual tests - require human intervention

    Simple Execution Strategy (Organized by TOOL TYPE):

    ## Execution Strategy
    
    **Philosophy**: Run everything in PRs unless significant infrastructure overhead.
    Playwright with parallelization is extremely fast (100s of tests in ~10-15 min).
    
    **Organized by TOOL TYPE:**
    
    ### Every PR: Playwright Tests (~10-15 min)
    All functional tests (from any priority level):
    - All E2E, API, integration, unit tests using Playwright
    - Parallelized across {N} shards
    - Total: ~{N} tests (includes P0, P1, P2, P3)
    
    ### Nightly: k6 Performance Tests (~30-60 min)
    All performance tests (from any priority level):
    - Load, stress, spike, endurance
    - Reason: Expensive infrastructure, long-running (10-40 min per test)
    
    ### Weekly: Chaos & Long-Running (~hours)
    Special infrastructure tests (from any priority level):
    - Multi-region failover, disaster recovery, endurance
    - Reason: Very expensive, very long (4+ hours)
    

    KEY INSIGHT: Organize by TOOL TYPE, not priority

    • Playwright (fast, cheap) → PR
    • k6 (expensive, long) → Nightly
    • Chaos/Manual (very expensive, very long) → Weekly

    Avoid:

    • Don't organize by priority (smoke → P0 → P1 → P2 → P3)
    • Don't say "P1 runs on PR to main" (some P1 are Playwright/PR, some are k6/Nightly)
    • Don't create artificial tiers - organize by tool type and infrastructure overhead
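
    For reference, a Playwright configuration sketch for the PR tier (worker count, shard count, and retry policy are assumptions to tune per pipeline):

    ```typescript
    // playwright.config.ts — hedged sketch; adjust workers/shards to your CI.
    import { defineConfig } from '@playwright/test';

    export default defineConfig({
      fullyParallel: true,                     // parallel-safe tests are a prerequisite
      workers: process.env.CI ? 4 : undefined, // illustrative worker count
      retries: process.env.CI ? 1 : 0,
    });

    // CI runs all functional tests on every PR, split across shards, e.g.:
    //   npx playwright test --shard=1/4   (repeat for 2/4, 3/4, 4/4 in parallel jobs)
    // k6 and chaos suites live in separate nightly/weekly jobs, not in this config.
    ```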

Step 4: Generate Deliverables

Actions

  1. Create Risk Assessment Matrix

    Use template structure:

    | Risk ID | Category | Description | Probability | Impact | Score | Mitigation      |
    | ------- | -------- | ----------- | ----------- | ------ | ----- | --------------- |
    | R-001   | SEC      | Auth bypass | 2           | 3      | 6     | Add authz check |
    
  2. Create Coverage Matrix

    | Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
    | ----------- | ---------- | -------- | --------- | ---------- | ----- |
    | Login flow  | E2E        | P0       | R-001     | 3          | QA    |
    
  3. Document Execution Strategy (Simple, Not Redundant)

    IMPORTANT: Keep execution strategy simple and avoid redundancy

    ## Execution Strategy
    
    **Default: Run all functional tests in PRs (~10-15 min)**
    - All Playwright tests (parallelized across 4 shards)
    - Includes E2E, API, integration, unit tests
    - Total: ~{N} tests
    
    **Nightly: Performance & Infrastructure tests**
    - k6 load/stress/spike tests (~30-60 min)
    - Reason: Expensive infrastructure, long-running
    
    **Weekly: Chaos & Disaster Recovery**
    - Endurance tests (4+ hours)
    - Multi-region failover (requires AWS FIS)
    - Backup restore validation
    - Reason: Special infrastructure, very long-running
    

    DO NOT:

    • Create redundant smoke/P0/P1/P2/P3 tier structure
    • List all tests again in execution order (already in coverage plan)
    • Split tests by priority unless there's infrastructure overhead
  4. Include Resource Estimates

    IMPORTANT: Use intervals/ranges, not exact numbers

    Provide rough estimates with intervals to avoid false precision:

    ### Test Effort Estimates
    
    - P0 scenarios: 15 tests (~1.5-2.5 hours each) = **~25-40 hours**
    - P1 scenarios: 25 tests (~0.75-1.5 hours each) = **~20-35 hours**
    - P2 scenarios: 40 tests (~0.25-0.75 hours each) = **~10-30 hours**
    - **Total:** **~55-105 hours** (~1.5-3 weeks with 1 QA engineer)
    

    Why intervals:

    • Avoids false precision (estimates are never exact)
    • Provides flexibility for complexity variations
    • Accounts for unknowns and dependencies
    • More realistic and less prescriptive

    Guidelines:

    • P0 tests: 1.5-2.5 hours each (complex setup, security, performance)
    • P1 tests: 0.75-1.5 hours each (standard integration, API tests)
    • P2 tests: 0.25-0.75 hours each (edge cases, simple validation)
    • P3 tests: 0.1-0.5 hours each (exploratory, documentation)

    Express totals as:

    • Hour ranges: "~55-105 hours"
    • Week ranges: "~1.5-3 weeks"
    • Avoid: Exact numbers like "75 hours" or "11 days"
  5. Add Gate Criteria

    ### Quality Gate Criteria
    
    - All P0 tests pass (100%)
    - P1 tests pass rate ≥95%
    - No high-risk (score ≥6) items unmitigated
    - Test coverage ≥80% for critical paths
    
  6. Write to Output File

    Save to {output_folder}/test-design-epic-{epic_num}.md using template structure.


Important Notes

Risk Category Definitions

TECH (Technical/Architecture):

  • Architecture flaws or technical debt
  • Integration complexity
  • Scalability concerns

SEC (Security):

  • Missing security controls
  • Authentication/authorization gaps
  • Data exposure risks

PERF (Performance):

  • SLA risk or performance degradation
  • Resource constraints
  • Scalability bottlenecks

DATA (Data Integrity):

  • Data loss or corruption potential
  • State consistency issues
  • Migration risks

BUS (Business Impact):

  • User experience harm
  • Business logic errors
  • Revenue or compliance impact

OPS (Operations):

  • Deployment or runtime failures
  • Configuration issues
  • Monitoring/observability gaps

Risk Scoring Methodology

Probability × Impact = Risk Score

Examples:

  • High likelihood (3) × Critical impact (3) = Score 9 (highest priority)
  • Possible (2) × Critical (3) = Score 6 (high priority threshold)
  • Unlikely (1) × Minor (1) = Score 1 (low priority)

Threshold: Scores ≥6 require immediate mitigation.
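
A minimal sketch of this scoring rule and its thresholds (names are illustrative):

```typescript
type Rating = 1 | 2 | 3; // probability and impact both use the 1-3 scale above

// Returns the score and the band it falls into (1-2 low, 3-4 medium, 6-9 high).
function classifyRisk(probability: Rating, impact: Rating) {
  const score = probability * impact;
  const level = score >= 6 ? 'high' : score >= 3 ? 'medium' : 'low';
  return { score, level, mitigateNow: score >= 6 };
}

// classifyRisk(2, 3) -> { score: 6, level: 'high', mitigateNow: true }
```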

Test Level Selection Strategy

Avoid duplication:

  • Don't test same behavior at E2E and API level
  • Use E2E for critical paths only
  • Use API tests for complex business logic
  • Use unit tests for edge cases

Tradeoffs:

  • E2E: High confidence, slow execution, brittle
  • API: Good balance, fast, stable
  • Unit: Fastest feedback, narrow scope

Priority Assignment Guidelines

P0 criteria (all must be true):

  • Blocks core functionality
  • High-risk (score ≥6)
  • No workaround exists
  • Affects majority of users

P1 criteria:

  • Important feature
  • Medium risk (score 3-4)
  • Workaround exists but difficult

P2/P3: Everything else, prioritized by value
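
A rough sketch of how these criteria combine (illustrative only; real prioritization still needs judgment for cases like security-critical items with low scores):

```typescript
type Priority = 'P0' | 'P1' | 'P2' | 'P3';

// Illustrative mapping of the guidelines above; the inputs are assumptions.
function assignPriority(opts: {
  riskScore: number;          // probability x impact
  blocksCoreJourney: boolean;
  hasWorkaround: boolean;
}): Priority {
  const { riskScore, blocksCoreJourney, hasWorkaround } = opts;
  if (riskScore >= 6 && blocksCoreJourney && !hasWorkaround) return 'P0';
  if (riskScore >= 3) return 'P1'; // medium risk, important feature
  if (riskScore >= 1) return 'P2'; // low risk, edge cases, regression prevention
  return 'P3';                     // nice-to-have / exploratory (often unscored)
}
```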

Knowledge Base Integration

Core Fragments (Auto-loaded in Step 1):

  • risk-governance.md - Risk classification (6 categories), automated scoring, gate decision engine, coverage traceability, owner tracking (625 lines, 4 examples)
  • probability-impact.md - Probability × impact matrix, automated classification thresholds, dynamic re-assessment, gate integration (604 lines, 4 examples)
  • test-levels-framework.md - E2E vs API vs Component vs Unit decision framework with characteristics matrix (467 lines, 4 examples)
  • test-priorities-matrix.md - P0-P3 automated priority calculation, risk-based mapping, tagging strategy, time budgets (389 lines, 2 examples)

Reference for Test Planning:

  • selective-testing.md - Execution strategy: tag-based, spec filters, diff-based selection, promotion rules (727 lines, 4 examples)
  • fixture-architecture.md - Data setup patterns: pure function → fixture → mergeTests, auto-cleanup (406 lines, 5 examples)
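
For orientation, a condensed sketch of the fixture-architecture pattern (pure function → fixture → mergeTests with auto-cleanup); the domain objects and cleanup call are assumptions:

```typescript
import { test as base, mergeTests } from '@playwright/test';
import { randomUUID } from 'node:crypto';

// 1. Pure function: deterministic data builder, no Playwright dependency.
function createOrder(overrides: Partial<{ id: string; status: string }> = {}) {
  return { id: randomUUID(), status: 'new', ...overrides };
}

// 2. Fixture: wraps the pure function and guarantees cleanup after each test.
const orderTest = base.extend<{ order: ReturnType<typeof createOrder> }>({
  order: async ({}, use) => {
    const order = createOrder();
    await use(order);
    // cleanup (e.g., delete via API) would go here — backend call is assumed
  },
});

// 3. mergeTests: compose independent fixture groups into one test object.
export const test = mergeTests(orderTest /* , userTest, authTest, ... */);
```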

Manual Reference (Optional):

  • Use tea-index.csv to find additional specialized fragments as needed

Evidence-Based Assessment

Critical principle: Base risk assessment on evidence, not speculation.

Evidence sources:

  • PRD and user research
  • Architecture documentation
  • Historical bug data
  • User feedback
  • Security audit results

Avoid:

  • Guessing business impact
  • Assuming user behavior
  • Inventing requirements

When uncertain: Document assumptions and request clarification from user.


Output Summary

After completing this workflow, provide a summary:

## Test Design Complete

**Epic**: {epic_num}
**Scope**: {design_level}

**Risk Assessment**:

- Total risks identified: {count}
- High-priority risks (≥6): {high_count}
- Categories: {categories}

**Coverage Plan**:

- P0 scenarios: {p0_count} ({p0_hours} hours)
- P1 scenarios: {p1_count} ({p1_hours} hours)
- P2/P3 scenarios: {p2p3_count} ({p2p3_hours} hours)
- **Total effort**: {total_hours} hours (~{total_days} days)

**Test Levels**:

- E2E: {e2e_count}
- API: {api_count}
- Component: {component_count}
- Unit: {unit_count}

**Quality Gate Criteria**:

- P0 pass rate: 100%
- P1 pass rate: ≥95%
- High-risk mitigations: 100%
- Coverage: ≥80%

**Output File**: {output_file}

**Next Steps**:

1. Review risk assessment with team
2. Prioritize mitigation for high-risk items (score ≥6)
3. Run `*atdd` to generate failing tests for P0 scenarios (separate workflow; not auto-run by `*test-design`)
4. Allocate resources per effort estimates
5. Set up test data factories and fixtures

Validation

After completing all steps, verify:

  • Risk assessment complete with all categories
  • All risks scored (probability × impact)
  • High-priority risks (≥6) flagged
  • Coverage matrix maps requirements to test levels
  • Priority levels assigned (P0-P3)
  • Execution order defined
  • Resource estimates provided
  • Quality gate criteria defined
  • Output file created and formatted correctly

Refer to checklist.md for comprehensive validation criteria.