Compare commits

...

3 Commits

Author SHA1 Message Date
Alex Verkhovsky 570feb16a0
Merge f33457ee4c into 48881f86a6 2026-01-23 20:04:08 -03:00
Murat K Ozcan 48881f86a6
doc: test design refinements (#1382) 2026-01-23 13:00:48 -06:00
Alex Verkhovsky f33457ee4c feat: add optional style_guide input to editorial review tasks
When provided, the style_guide input overrides all generic principles
(including Microsoft Style Guide baseline, reader-type priorities, and
structure-model selection) except CONTENT IS SACROSANCT.

Changes to both editorial-review-structure.xml and editorial-review-prose.xml:
- Add style_guide input after content input
- Add STYLE GUIDE OVERRIDE instruction in llm section
- Add "Consult style_guide" action in Step 3 for mid-flow refresh

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 21:24:00 -08:00
6 changed files with 708 additions and 368 deletions

View File

@ -80,23 +80,29 @@
- [ ] Owners assigned where applicable
- [ ] No duplicate coverage (same behavior at multiple levels)

### Execution Strategy

**CRITICAL: Keep execution strategy simple, avoid redundancy**

- [ ] **Simple structure**: PR / Nightly / Weekly (NOT complex smoke/P0/P1/P2 tiers)
- [ ] **PR execution**: All functional tests unless significant infrastructure overhead
- [ ] **Nightly/Weekly**: Only performance, chaos, long-running, manual tests
- [ ] **No redundancy**: Don't re-list all tests (already in coverage plan)
- [ ] **Philosophy stated**: "Run everything in PRs if <15 min, defer only if expensive/long"
- [ ] **Playwright parallelization noted**: 100s of tests in 10-15 min

### Resource Estimates

**CRITICAL: Use intervals/ranges, NOT exact numbers**

- [ ] P0 effort provided as interval range (e.g., "~25-40 hours" NOT "36 hours")
- [ ] P1 effort provided as interval range (e.g., "~20-35 hours" NOT "27 hours")
- [ ] P2 effort provided as interval range (e.g., "~10-30 hours" NOT "15.5 hours")
- [ ] P3 effort provided as interval range (e.g., "~2-5 hours" NOT "2.5 hours")
- [ ] Total effort provided as interval range (e.g., "~55-110 hours" NOT "81 hours")
- [ ] Timeline provided as week range (e.g., "~1.5-3 weeks" NOT "11 days")
- [ ] Estimates include setup time and account for complexity variations
- [ ] **No false precision**: Avoid exact calculations like "18 tests × 2 hours = 36 hours"

### Quality Gate Criteria
@ -126,11 +132,16 @@
### Priority Assignment Accuracy

**CRITICAL: Priority classification is separate from execution timing**

- [ ] **Priority sections (P0/P1/P2/P3) do NOT include execution context** (e.g., no "Run on every commit" in headers)
- [ ] **Priority sections have only "Criteria" and "Purpose"** (no "Execution:" field)
- [ ] **Execution Strategy section** is separate and handles timing based on infrastructure overhead
- [ ] P0: Truly blocks core functionality + High-risk (≥6) + No workaround
- [ ] P1: Important features + Medium-risk (3-4) + Common workflows
- [ ] P2: Secondary features + Low-risk (1-2) + Edge cases
- [ ] P3: Nice-to-have + Exploratory + Benchmarks
- [ ] **Note at top of Test Coverage Plan**: Clarifies P0/P1/P2/P3 = priority/risk, NOT execution timing

### Test Level Selection
@ -176,58 +187,90 @@
- [ ] 🚨 BLOCKERS - Team Must Decide (Sprint 0 critical path items)
- [ ] ⚠️ HIGH PRIORITY - Team Should Validate (recommendations for approval)
- [ ] 📋 INFO ONLY - Solutions Provided (no decisions needed)
- [ ] **Risk Assessment** section - **ACTIONABLE**
- [ ] Total risks identified count
- [ ] High-priority risks table (score ≥6) with all columns: Risk ID, Category, Description, Probability, Impact, Score, Mitigation, Owner, Timeline
- [ ] Medium and low-priority risks tables
- [ ] Risk category legend included
- [ ] **Testability Concerns and Architectural Gaps** section - **ACTIONABLE**
- [ ] **Sub-section: 🚨 ACTIONABLE CONCERNS** at TOP
- [ ] Blockers to Fast Feedback table (WHAT architecture must provide)
- [ ] Architectural Improvements Needed (WHAT must be changed)
- [ ] Each concern has: Owner, Timeline, Impact
- [ ] **Sub-section: Testability Assessment Summary** at BOTTOM (FYI)
- [ ] What Works Well (passing items)
- [ ] Accepted Trade-offs (no action required)
- [ ] This section only included if worth mentioning; otherwise omitted
- [ ] **Risk Mitigation Plans** for all high-priority risks (≥6)
- [ ] Each plan has: Strategy (numbered steps), Owner, Timeline, Status, Verification
- [ ] **Only Backend/DevOps/Arch/Security mitigations** (production code changes)
- [ ] QA-owned mitigations belong in QA doc instead
- [ ] **Assumptions and Dependencies** section
- [ ] **Architectural assumptions only** (SLO targets, replication lag, system design)
- [ ] Assumptions list (numbered)
- [ ] Dependencies list with required dates
- [ ] Risks to plan with impact and contingency
- [ ] QA execution assumptions belong in QA doc instead
- [ ] **NO test implementation code** (long examples belong in QA doc)
- [ ] **NO test scripts** (no Playwright test(...) blocks, no assertions, no test setup code)
- [ ] **NO NFR test examples** (NFR sections describe WHAT to test, not HOW to test)
- [ ] **NO test scenario checklists** (belong in QA doc)
- [ ] **NO bloat or repetition** (consolidate repeated notes, avoid over-explanation)
- [ ] **Cross-references to QA doc** where appropriate (instead of duplication)
- [ ] **RECIPE SECTIONS NOT IN ARCHITECTURE DOC:**
- [ ] NO "Test Levels Strategy" section (unit/integration/E2E split belongs in QA doc only)
- [ ] NO "NFR Testing Approach" section with detailed test procedures (belongs in QA doc only)
- [ ] NO "Test Environment Requirements" section (belongs in QA doc only)
- [ ] NO "Recommendations for Sprint 0" section with test framework setup (belongs in QA doc only)
- [ ] NO "Quality Gate Criteria" section (pass rates, coverage targets belong in QA doc only)
- [ ] NO "Tool Selection" section (Playwright, k6, etc. belongs in QA doc only)

### test-design-qa.md
**NEW STRUCTURE (streamlined from 375 to ~287 lines):**

- [ ] **Purpose statement** at top (test execution recipe)
- [ ] **Executive Summary** with risk summary and coverage summary
- [ ] **Dependencies & Test Blockers** section in POSITION 2 (right after Executive Summary)
- [ ] Backend/Architecture dependencies listed (what QA needs from other teams)
- [ ] QA infrastructure setup listed (factories, fixtures, environments)
- [ ] Code example with playwright-utils if config.tea_use_playwright_utils is true
- [ ] Test from '@seontechnologies/playwright-utils/api-request/fixtures'
- [ ] Expect from '@playwright/test' (playwright-utils does not re-export expect)
- [ ] Code examples include assertions (no unused imports)
- [ ] **Risk Assessment** section (brief, references Architecture doc)
- [ ] High-priority risks table
- [ ] Medium/low-priority risks table
- [ ] Each risk shows "QA Test Coverage" column (how QA validates)
- [ ] **Test Coverage Plan** with P0/P1/P2/P3 sections
- [ ] Priority sections have ONLY "Criteria" (no execution context)
- [ ] Note at top: "P0/P1/P2/P3 = priority, NOT execution timing"
- [ ] Test tables with columns: Test ID | Requirement | Test Level | Risk Link | Notes
- [ ] **Execution Strategy** section (organized by TOOL TYPE)
- [ ] Every PR: Playwright tests (~10-15 min)
- [ ] Nightly: k6 performance tests (~30-60 min)
- [ ] Weekly: Chaos & long-running (~hours)
- [ ] Philosophy: "Run everything in PRs unless expensive/long-running"
- [ ] **QA Effort Estimate** section (QA effort ONLY)
- [ ] Interval-based estimates (e.g., "~1-2 weeks" NOT "36 hours")
- [ ] NO DevOps, Backend, Data Eng, Finance effort
- [ ] NO Sprint breakdowns (too prescriptive)
- [ ] **Appendix A: Code Examples & Tagging**
- [ ] **Appendix B: Knowledge Base References**
**REMOVED SECTIONS (bloat):**
- [ ] ❌ NO Quick Reference section (bloat)
- [ ] ❌ NO System Architecture Summary (bloat)
- [ ] ❌ NO Test Environment Requirements as separate section (integrated into Dependencies)
- [ ] ❌ NO Testability Assessment section (bloat - covered in Dependencies)
- [ ] ❌ NO Test Levels Strategy section (bloat - obvious from test scenarios)
- [ ] ❌ NO NFR Readiness Summary (bloat)
- [ ] ❌ NO Quality Gate Criteria section (teams decide for themselves)
- [ ] ❌ NO Follow-on Workflows section (bloat - BMAD commands self-explanatory)
- [ ] ❌ NO Approval section (unnecessary formality)
- [ ] ❌ NO Infrastructure/DevOps/Finance effort tables (out of scope)
- [ ] ❌ NO Sprint 0/1/2/3 breakdown tables (too prescriptive)
- [ ] ❌ NO Next Steps section (bloat)
### Cross-Document Consistency
@ -238,6 +281,40 @@
- [ ] Dates and authors match across documents
- [ ] ADR and PRD references consistent
### Document Quality (Anti-Bloat Check)
**CRITICAL: Check for bloat and repetition across BOTH documents**
- [ ] **No note repeated 10+ times** (e.g., "Timing is pessimistic until R-005 fixed" on every section)
- [ ] **Repeated information consolidated** (write once at top, reference briefly if needed)
- [ ] **No excessive detail** that doesn't add value (obvious concepts, redundant examples)
- [ ] **Focus on unique/critical info** (only document what's different from standard practice)
- [ ] **Architecture doc**: Concerns-focused, NOT implementation-focused
- [ ] **QA doc**: Implementation-focused, NOT theory-focused
- [ ] **Clear separation**: Architecture = WHAT and WHY, QA = HOW
- [ ] **Professional tone**: No AI slop markers
- [ ] Avoid excessive ✅/❌ emojis (use sparingly, only when adding clarity)
- [ ] Avoid "absolutely", "excellent", "fantastic", overly enthusiastic language
- [ ] Write professionally and directly
- [ ] **Architecture doc length**: Target ~150-200 lines max (focus on actionable concerns only)
- [ ] **QA doc length**: Keep concise, remove bloat sections
### Architecture Doc Structure (Actionable-First Principle)
**CRITICAL: Validate structure follows actionable-first, FYI-last principle**
- [ ] **Actionable sections at TOP:**
- [ ] Quick Guide (🚨 BLOCKERS first, then ⚠️ HIGH PRIORITY, then 📋 INFO ONLY last)
- [ ] Risk Assessment (high-priority risks ≥6 at top)
- [ ] Testability Concerns (concerns/blockers at top, passing items at bottom)
- [ ] Risk Mitigation Plans (for high-priority risks ≥6)
- [ ] **FYI sections at BOTTOM:**
- [ ] Testability Assessment Summary (what works well - only if worth mentioning)
- [ ] Assumptions and Dependencies
- [ ] **ASRs categorized correctly:**
- [ ] Actionable ASRs included in 🚨 or ⚠️ sections
- [ ] FYI ASRs included in 📋 section or omitted if obvious
## Completion Criteria

**All must be true:**
@ -295,9 +372,20 @@ If workflow fails:
- **Solution**: Use test pyramid - E2E for critical paths only

**Issue**: Resource estimates too high or too precise
- **Solution**:
- Invest in fixtures/factories to reduce per-test setup time
- Use interval ranges (e.g., "~55-110 hours") instead of exact numbers (e.g., "81 hours")
- Widen intervals if high uncertainty exists
**Issue**: Execution order section too complex or redundant
- **Solution**:
- Default: Run everything in PRs (<15 min with Playwright parallelization)
- Only defer to nightly/weekly if expensive (k6, chaos, 4+ hour tests)
- Don't create smoke/P0/P1/P2/P3 tier structure
- Don't re-list all tests (already in coverage plan)
### Best Practices
@ -305,7 +393,9 @@ If workflow fails:
- High-priority risks (≥6) require immediate mitigation
- P0 tests should cover <10% of total scenarios
- Avoid testing same behavior at multiple levels
- **Use interval-based estimates** (e.g., "~25-40 hours") instead of exact numbers to avoid false precision and provide flexibility
- **Keep execution strategy simple**: Default to "run everything in PRs" (<15 min with Playwright), only defer if expensive/long-running
- **Avoid execution order redundancy**: Don't create complex tier structures or re-list tests
---

View File

@ -157,7 +157,13 @@ TEA test-design workflow supports TWO modes, detected automatically:
1. **Review Architecture for Testability**

**STRUCTURE PRINCIPLE: CONCERNS FIRST, PASSING ITEMS LAST**
Evaluate architecture against these criteria and structure output as:
1. **Testability Concerns** (ACTIONABLE - what's broken/missing)
2. **Testability Assessment Summary** (FYI - what works well)
**Testability Criteria:**
**Controllability:**
- Can we control system state for testing? (API seeding, factories, database reset)
@ -174,8 +180,18 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Can we reproduce failures? (deterministic waits, HAR capture, seed data)
- Are components loosely coupled? (mockable, testable boundaries)
**In Architecture Doc Output:**
- **Section A: Testability Concerns** (TOP) - List what's BROKEN or MISSING
- Example: "No API for test data seeding → Cannot parallelize tests"
- Example: "Hardcoded DB connection → Cannot test in CI"
- **Section B: Testability Assessment Summary** (BOTTOM) - List what PASSES
- Example: "✅ API-first design supports test isolation"
- Only include if worth mentioning; otherwise omit this section entirely
2. **Identify Architecturally Significant Requirements (ASRs)**
**CRITICAL: ASRs must indicate if ACTIONABLE or FYI**
From PRD NFRs and architecture decisions, identify quality requirements that:
- Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture)
- Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure)
@ -183,21 +199,60 @@ TEA test-design workflow supports TWO modes, detected automatically:
Score each ASR using risk matrix (probability × impact).
**In Architecture Doc, categorize ASRs:**
- **ACTIONABLE ASRs** (require architecture changes): Include in "Quick Guide" 🚨 or ⚠️ sections
- **FYI ASRs** (already satisfied by architecture): Include in "Quick Guide" 📋 section OR omit if obvious
**Example:**
- ASR-001 (Score 9): "Multi-region deployment requires region-specific test infrastructure" → **ACTIONABLE** (goes in 🚨 BLOCKERS)
- ASR-002 (Score 4): "OAuth 2.1 authentication already implemented in ADR-5" → **FYI** (goes in 📋 INFO ONLY or omit)
**Structure Principle:** Actionable ASRs at TOP, FYI ASRs at BOTTOM (or omit)
3. **Define Test Levels Strategy**
**IMPORTANT: This section goes in QA doc ONLY, NOT in Architecture doc**
Based on architecture (mobile, web, API, microservices, monolith):
- Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy)
- Identify test environment needs (local, staging, ephemeral, production-like)
- Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance)
**In Architecture doc:** Only mention test level split if it's an ACTIONABLE concern
- Example: "API response time <100ms requires load testing infrastructure" (concern)
- DO NOT include full test level strategy table in Architecture doc

4. **Assess NFR Requirements (MINIMAL in Architecture Doc)**

**CRITICAL: NFR testing approach is a RECIPE - belongs in QA doc ONLY**
**In Architecture Doc:**
- Only mention NFRs if they create testability CONCERNS
- Focus on WHAT architecture must provide, not HOW to test
- Keep it brief - 1-2 sentences per NFR category at most
**Example - Security NFR in Architecture doc (if there's a concern):**
✅ CORRECT (concern-focused, brief, WHAT/WHY only):
- "System must prevent cross-customer data access (GDPR requirement). Requires test infrastructure for multi-tenant isolation in Sprint 0."
- "OAuth tokens must expire after 1 hour (ADR-5). Requires test harness for token expiration validation."
❌ INCORRECT (too detailed, belongs in QA doc):
- Full table of security test scenarios
- Test scripts with code examples
- Detailed test procedures
- Tool selection (e.g., "use Playwright E2E + OWASP ZAP")
- Specific test approaches (e.g., "Test approach: Playwright E2E for auth/authz")
**In QA Doc (full NFR testing approach):**
- **Security**: Full test scenarios, tooling (Playwright + OWASP ZAP), test procedures
- **Performance**: Load/stress/spike test scenarios, k6 scripts, SLO thresholds
- **Reliability**: Error handling tests, retry logic validation, circuit breaker tests
- **Maintainability**: Coverage targets, code quality gates, observability validation
**Rule of Thumb:**
- Architecture doc: "What NFRs exist and what concerns they create" (1-2 sentences)
- QA doc: "How to test those NFRs" (full sections with tables, code, procedures)
5. **Flag Testability Concerns**

Identify architecture decisions that harm testability:
@ -228,22 +283,54 @@ TEA test-design workflow supports TWO modes, detected automatically:
**Standard Structures (REQUIRED):**

**test-design-architecture.md sections (in this order):**
**STRUCTURE PRINCIPLE: Actionable items FIRST, FYI items LAST**
1. Executive Summary (scope, business context, architecture, risk summary)
2. Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
3. Risk Assessment (high/medium/low-priority risks with scoring) - **ACTIONABLE**
4. Testability Concerns and Architectural Gaps - **ACTIONABLE** (what arch team must do)
- Sub-section: Blockers to Fast Feedback (ACTIONABLE - concerns FIRST)
- Sub-section: Architectural Improvements Needed (ACTIONABLE)
- Sub-section: Testability Assessment Summary (FYI - passing items LAST, only if worth mentioning)
5. Risk Mitigation Plans (detailed for high-priority risks ≥6) - **ACTIONABLE**
6. Assumptions and Dependencies - **FYI**
**SECTIONS THAT DO NOT BELONG IN ARCHITECTURE DOC:**
- ❌ Test Levels Strategy (unit/integration/E2E split) - This is a RECIPE, belongs in QA doc ONLY
- ❌ NFR Testing Approach with test examples - This is a RECIPE, belongs in QA doc ONLY
- ❌ Test Environment Requirements - This is a RECIPE, belongs in QA doc ONLY
- ❌ Recommendations for Sprint 0 (test framework setup, factories) - This is a RECIPE, belongs in QA doc ONLY
- ❌ Quality Gate Criteria (pass rates, coverage targets) - This is a RECIPE, belongs in QA doc ONLY
- ❌ Tool Selection (Playwright, k6, etc.) - This is a RECIPE, belongs in QA doc ONLY
**WHAT BELONGS IN ARCHITECTURE DOC:**
- ✅ Testability CONCERNS (what makes it hard to test)
- ✅ Architecture GAPS (what's missing for testability)
- ✅ What architecture team must DO (blockers, improvements)
- ✅ Risks and mitigation plans
- ✅ ASRs (Architecturally Significant Requirements) - but clarify if FYI or actionable
**test-design-qa.md sections (in this order):**
1. Executive Summary (risk summary, coverage summary)
2. **Dependencies & Test Blockers** (CRITICAL: RIGHT AFTER SUMMARY - what QA needs from other teams)
3. Risk Assessment (scored risks with categories - reference Arch doc, don't duplicate)
4. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
5. **Execution Strategy** (SIMPLE: Organized by TOOL TYPE: PR (Playwright) / Nightly (k6) / Weekly (chaos/manual))
6. QA Effort Estimate (QA effort ONLY - no DevOps, Data Eng, Finance, Backend)
7. Appendices (code examples with playwright-utils, tagging strategy, knowledge base refs)
**SECTIONS TO EXCLUDE FROM QA DOC:**
- ❌ Quality Gate Criteria (pass/fail thresholds - teams decide for themselves)
- ❌ Follow-on Workflows (bloat - BMAD commands are self-explanatory)
- ❌ Approval section (unnecessary formality)
- ❌ Test Environment Requirements (remove as separate section - integrate into Dependencies if needed)
- ❌ NFR Readiness Summary (bloat - covered in Risk Assessment)
- ❌ Testability Assessment (bloat - covered in Dependencies)
- ❌ Test Levels Strategy (bloat - obvious from test scenarios)
- ❌ Sprint breakdowns (too prescriptive)
- ❌ Infrastructure/DevOps/Data Eng effort tables (out of scope)
- ❌ Mitigation plans for non-QA work (belongs in Arch doc)
**Content Guidelines:**
@ -252,26 +339,46 @@ TEA test-design workflow supports TWO modes, detected automatically:
- ✅ Clear ownership (each blocker/ASR has owner + timeline)
- ✅ Testability requirements (what architecture must support)
- ✅ Mitigation plans (for each high-risk item ≥6)
- ✅ Brief conceptual examples ONLY if needed to clarify architecture concerns (5-10 lines max)
- ✅ **Target length**: ~150-200 lines max (focus on actionable concerns only)
- ✅ **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)
**Architecture doc (DON'T) - CRITICAL:**
- ❌ NO test scripts or test implementation code AT ALL - This is a communication doc for architects, not a testing guide
- ❌ NO Playwright test examples (e.g., test('...', async ({ request }) => ...))
- ❌ NO assertion logic (e.g., expect(...).toBe(...))
- ❌ NO test scenario checklists with checkboxes (belongs in QA doc)
- ❌ NO implementation details about HOW QA will test
- ❌ Focus on CONCERNS, not IMPLEMENTATION
**QA doc (DO):**
- ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
- ✅ Full test implementation code samples when helpful
- ✅ **IMPORTANT: If config.tea_use_playwright_utils is true, ALL code samples MUST use @seontechnologies/playwright-utils fixtures and utilities**
- ✅ Import test fixtures from '@seontechnologies/playwright-utils/api-request/fixtures'
- ✅ Import expect from '@playwright/test' (playwright-utils does not re-export expect)
- ✅ Use apiRequest fixture with schema validation, retry logic, and structured responses
- ✅ Dependencies & Test Blockers section RIGHT AFTER Executive Summary (what QA needs from other teams)
- ✅ **QA effort estimates ONLY** (no DevOps, Data Eng, Finance, Backend effort - out of scope)
- ✅ Cross-references to Architecture doc (not duplication)
- ✅ **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language)
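
As a minimal sketch of the playwright-utils import pattern required above (the fixture path and response shape follow the example used later in this workflow; the endpoint is a placeholder):

```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test'; // playwright-utils does not re-export expect

test('reads a resource @p0', async ({ apiRequest }) => {
  const { status } = await apiRequest({
    method: 'GET',
    path: '/api/resource/123', // placeholder endpoint
  });

  expect(status).toBe(200); // assertion included so no import is unused
});
```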
**QA doc (DON'T):**
- ❌ NO architectural theory (just reference Architecture doc)
- ❌ NO ASR explanations (link to Architecture doc instead)
- ❌ NO duplicate risk assessments (reference Architecture doc)
- ❌ NO Quality Gate Criteria section (teams decide pass/fail thresholds for themselves)
- ❌ NO Follow-on Workflows section (bloat - BMAD commands are self-explanatory)
- ❌ NO Approval section (unnecessary formality)
- ❌ NO effort estimates for other teams (DevOps, Backend, Data Eng, Finance - out of scope, QA effort only)
- ❌ NO Sprint breakdowns (too prescriptive - e.g., "Sprint 0: 40 hours, Sprint 1: 48 hours")
- ❌ NO mitigation plans for Backend/Arch/DevOps work (those belong in Architecture doc)
- ❌ NO architectural assumptions or debates (those belong in Architecture doc)
**Anti-Patterns to Avoid (Cross-Document Redundancy):**
**CRITICAL: NO BLOAT, NO REPETITION, NO OVERINFO**
❌ **DON'T duplicate OAuth requirements:**
- Architecture doc: Explain OAuth 2.1 flow in detail
- QA doc: Re-explain why OAuth 2.1 is required
@ -280,6 +387,24 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)" - Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)" - QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
❌ **DON'T repeat the same note 10+ times:**
- Example: "Timing is pessimistic until R-005 is fixed" repeated on every P0, P1, P2 section
- This creates bloat and makes docs hard to read
✅ **DO consolidate repeated information:**
- Write once at the top: "**Note**: All timing estimates are pessimistic pending R-005 resolution"
- Reference briefly if needed: "(pessimistic timing)"
❌ **DON'T include excessive detail that doesn't add value:**
- Long explanations of obvious concepts
- Redundant examples showing the same pattern
- Over-documentation of standard practices
✅ **DO focus on what's unique or critical:**
- Document only what's different from standard practice
- Highlight critical decisions and risks
- Keep explanations concise and actionable
**Markdown Cross-Reference Syntax Examples:**

```markdown
@ -330,6 +455,24 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Cross-reference between docs (no duplication)
- Validate against checklist.md (System-Level Mode section)
**Common Over-Engineering to Avoid:**
**In QA Doc:**
1. ❌ Quality gate thresholds ("P0 must be 100%, P1 ≥95%") - Let teams decide for themselves
2. ❌ Effort estimates for other teams - QA doc should only estimate QA effort
3. ❌ Sprint breakdowns ("Sprint 0: 40 hours, Sprint 1: 48 hours") - Too prescriptive
4. ❌ Approval sections - Unnecessary formality
5. ❌ Assumptions about architecture (SLO targets, replication lag) - These are architectural concerns, belong in Arch doc
6. ❌ Mitigation plans for Backend/Arch/DevOps - Those belong in Arch doc
7. ❌ Follow-on workflows section - Bloat, BMAD commands are self-explanatory
8. ❌ NFR Readiness Summary - Bloat, covered in Risk Assessment
**Test Coverage Numbers Reality Check:**
- With Playwright parallelization, running ALL Playwright tests is as fast as running just P0
- Don't split Playwright tests by priority into different CI gates - it adds no value
- Tool type matters, not priority labels
- Defer based on infrastructure cost, not importance
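
To make "run everything in PRs" concrete, a minimal playwright.config.ts sketch (worker count, retry policy, and base URL are assumptions for illustration):

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,                     // let every test file run concurrently
  workers: process.env.CI ? 8 : undefined, // assumed CI worker count
  retries: process.env.CI ? 1 : 0,
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000', // placeholder app URL
    trace: 'on-first-retry',
  },
});
```

Sharding across CI machines (e.g., `--shard=1/4` on the CLI) is what keeps hundreds of tests inside the ~10-15 minute PR budget.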
**After System-Level Mode:** Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.

---
@ -540,12 +683,51 @@ TEA test-design workflow supports TWO modes, detected automatically:
8. **Plan Mitigations**
**CRITICAL: Mitigation placement depends on WHO does the work**
For each high-priority risk:
- Define mitigation strategy
- Assign owner (dev, QA, ops)
- Set timeline
- Update residual risk expectation
**Mitigation Plan Placement:**
**Architecture Doc:**
- Mitigations owned by Backend, DevOps, Architecture, Security, Data Eng
- Example: "Add authorization layer for customer-scoped access" (Backend work)
- Example: "Configure AWS Fault Injection Simulator" (DevOps work)
- Example: "Define CloudWatch log schema for backfill events" (Architecture work)
**QA Doc:**
- Mitigations owned by QA (test development work)
- Example: "Create factories for test data with randomization" (QA work)
- Example: "Implement polling with retry for async validation" (QA test code)
- Brief reference to Architecture doc mitigations (don't duplicate)
**Rule of Thumb:**
- If mitigation requires production code changes → Architecture doc
- If mitigation is test infrastructure/code → QA doc
- If mitigation involves multiple teams → Architecture doc with QA validation approach
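
For the QA-owned "polling with retry" mitigation mentioned above, a minimal Playwright sketch (the endpoint and status field are hypothetical):

```typescript
import { test, expect } from '@playwright/test';

test('async backfill eventually completes @p1', async ({ request }) => {
  // Poll the (hypothetical) status endpoint until the async job settles,
  // instead of relying on fixed sleeps.
  await expect
    .poll(async () => {
      const res = await request.get('/api/backfill/status'); // placeholder path
      return (await res.json()).state;
    }, { timeout: 30_000, intervals: [1_000, 2_000, 5_000] })
    .toBe('completed');
});
```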
**Assumptions Placement:**
**Architecture Doc:**
- Architectural assumptions (SLO targets, replication lag, system design assumptions)
- Example: "P95 <500ms inferred from <2s timeout (requires Product approval)"
- Example: "Multi-region replication lag <1s assumed (ADR doesn't specify SLA)"
- Example: "Recent Cache hit ratio >80% assumed (not in PRD/ADR)"
**QA Doc:**
- Test execution assumptions (test infrastructure readiness, test data availability)
- Example: "Assumes test factories already created"
- Example: "Assumes CI/CD pipeline configured"
- Brief reference to Architecture doc for architectural assumptions
**Rule of Thumb:**
- If assumption is about system architecture/design → Architecture doc
- If assumption is about test infrastructure/execution → QA doc
---

## Step 3: Design Test Coverage
@ -594,6 +776,8 @@ TEA test-design workflow supports TWO modes, detected automatically:
3. **Assign Priority Levels**
**CRITICAL: P0/P1/P2/P3 indicates priority and risk level, NOT execution timing**
**Knowledge Base Reference**: `test-priorities-matrix.md`

**P0 (Critical)**:
@ -601,25 +785,28 @@ TEA test-design workflow supports TWO modes, detected automatically:
- High-risk areas (score ≥6)
- Revenue-impacting
- Security-critical
- No workaround exists
- Affects majority of users
**P1 (High)**:
- Important user features
- Medium-risk areas (score 3-4)
- Common workflows
- Workaround exists but difficult
**P2 (Medium)**:
- Secondary features
- Low-risk areas (score 1-2)
- Edge cases
- Regression prevention
**P3 (Low)**:
- Nice-to-have
- Exploratory
- Performance benchmarks
- Documentation validation
**NOTE:** Priority classification is separate from execution timing. A P1 test might run in PRs if it's fast, or nightly if it requires expensive infrastructure (e.g., k6 performance test). See "Execution Strategy" section for timing guidance.
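
As a small illustration of that separation, priority lives on the test as a tag while timing is decided by the pipeline (tag name and scenario are hypothetical):

```typescript
import { test, expect } from '@playwright/test';

// Tagged @p1 for priority/risk tracking; because it is a cheap Playwright test,
// it still runs on every PR - the tag says nothing about when it runs.
test('user can update notification preferences', { tag: '@p1' }, async ({ page }) => {
  await page.goto('/settings/notifications'); // placeholder route
  await page.getByLabel('Email alerts').check();
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Preferences saved')).toBeVisible();
});
```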
4. **Outline Data and Tooling Prerequisites**
@ -629,13 +816,55 @@ TEA test-design workflow supports TWO modes, detected automatically:
- Environment setup
- Tools and dependencies

5. **Define Execution Strategy** (Keep It Simple)

**IMPORTANT: Avoid over-engineering execution order**

**Default Philosophy:**
- Run **everything** in PRs if total duration <15 minutes
- Playwright is fast with parallelization (100s of tests in ~10-15 min)
- Only defer to nightly/weekly if there's significant overhead:
- Performance tests (k6, load testing) - expensive infrastructure
- Chaos engineering - requires special setup (AWS FIS)
- Long-running tests - endurance (4+ hours), disaster recovery
- Manual tests - require human intervention
**Simple Execution Strategy (Organized by TOOL TYPE):**
```markdown
## Execution Strategy
**Philosophy**: Run everything in PRs unless significant infrastructure overhead.
Playwright with parallelization is extremely fast (100s of tests in ~10-15 min).
**Organized by TOOL TYPE:**
### Every PR: Playwright Tests (~10-15 min)
All functional tests (from any priority level):
- All E2E, API, integration, unit tests using Playwright
- Parallelized across {N} shards
- Total: ~{N} tests (includes P0, P1, P2, P3)
### Nightly: k6 Performance Tests (~30-60 min)
All performance tests (from any priority level):
- Load, stress, spike, endurance
- Reason: Expensive infrastructure, long-running (10-40 min per test)
### Weekly: Chaos & Long-Running (~hours)
Special infrastructure tests (from any priority level):
- Multi-region failover, disaster recovery, endurance
- Reason: Very expensive, very long (4+ hours)
```
**KEY INSIGHT: Organize by TOOL TYPE, not priority**
- Playwright (fast, cheap) → PR
- k6 (expensive, long) → Nightly
- Chaos/Manual (very expensive, very long) → Weekly
**Avoid:**
- ❌ Don't organize by priority (smoke → P0 → P1 → P2 → P3)
- ❌ Don't say "P1 runs on PR to main" (some P1 are Playwright/PR, some are k6/Nightly)
- ❌ Don't create artificial tiers - organize by tool type and infrastructure overhead
---
@ -661,34 +890,66 @@ TEA test-design workflow supports TWO modes, detected automatically:
| Login flow | E2E | P0 | R-001 | 3 | QA |
```

3. **Document Execution Strategy** (Simple, Not Redundant)

**IMPORTANT: Keep execution strategy simple and avoid redundancy**

```markdown
## Execution Strategy

**Default: Run all functional tests in PRs (~10-15 min)**
- All Playwright tests (parallelized across 4 shards)
- Includes E2E, API, integration, unit tests
- Total: ~{N} tests

**Nightly: Performance & Infrastructure tests**
- k6 load/stress/spike tests (~30-60 min)
- Reason: Expensive infrastructure, long-running

**Weekly: Chaos & Disaster Recovery**
- Endurance tests (4+ hours)
- Multi-region failover (requires AWS FIS)
- Backup restore validation
- Reason: Special infrastructure, very long-running
```
**DO NOT:**
- ❌ Create redundant smoke/P0/P1/P2/P3 tier structure
- ❌ List all tests again in execution order (already in coverage plan)
- ❌ Split tests by priority unless there's infrastructure overhead
4. **Include Resource Estimates**
**IMPORTANT: Use intervals/ranges, not exact numbers**
Provide rough estimates with intervals to avoid false precision:
```markdown
### Test Effort Estimates
- P0 scenarios: 15 tests (~1.5-2.5 hours each) = **~25-40 hours**
- P1 scenarios: 25 tests (~0.75-1.5 hours each) = **~20-35 hours**
- P2 scenarios: 40 tests (~0.25-0.75 hours each) = **~10-30 hours**
- **Total:** **~55-105 hours** (~1.5-3 weeks with 1 QA engineer)
```
**Why intervals:**
- Avoids false precision (estimates are never exact)
- Provides flexibility for complexity variations
- Accounts for unknowns and dependencies
- More realistic and less prescriptive
**Guidelines:**
- P0 tests: 1.5-2.5 hours each (complex setup, security, performance)
- P1 tests: 0.75-1.5 hours each (standard integration, API tests)
- P2 tests: 0.25-0.75 hours each (edge cases, simple validation)
- P3 tests: 0.1-0.5 hours each (exploratory, documentation)
**Express totals as:**
- Hour ranges: "~55-105 hours"
- Week ranges: "~1.5-3 weeks"
- Avoid: Exact numbers like "75 hours" or "11 days"
5. **Add Gate Criteria**

```markdown

View File

@ -108,54 +108,51 @@
### Testability Concerns and Architectural Gaps

**🚨 ACTIONABLE CONCERNS - Architecture Team Must Address**

{If system has critical testability concerns, list them here. If architecture supports testing well, state "No critical testability concerns identified" and skip to Testability Assessment Summary}

#### 1. Blockers to Fast Feedback (WHAT WE NEED FROM ARCHITECTURE)

| Concern | Impact | What Architecture Must Provide | Owner | Timeline |
|---------|--------|--------------------------------|-------|----------|
| **{Concern name}** | {Impact on testing} | {Specific architectural change needed} | {Team} | {Sprint} |

**Example:**
- **No API for test data seeding** → Cannot parallelize tests → Provide POST /test/seed endpoint (Backend, Sprint 0)

#### 2. Architectural Improvements Needed (WHAT SHOULD BE CHANGED)

{List specific improvements that would make the system more testable}

1. **{Improvement name}**
   - **Current problem**: {What's wrong}
   - **Required change**: {What architecture must do}
   - **Impact if not fixed**: {Consequences}
   - **Owner**: {Team}
   - **Timeline**: {Sprint}

---

### Testability Assessment Summary

**📊 CURRENT STATE - FYI**
{Only include this section if there are passing items worth mentioning. Otherwise omit.}
#### What Works Well
- ✅ {Passing item 1} (e.g., "API-first design supports parallel test execution")
- ✅ {Passing item 2} (e.g., "Feature flags enable test isolation")
- ✅ {Passing item 3}
#### Accepted Trade-offs (No Action Required)
For {Feature} Phase 1, the following trade-offs are acceptable:
- **{Trade-off 1}** - {Why acceptable for now}
- **{Trade-off 2}** - {Why acceptable for now}
{This is technical debt OR acceptable for Phase 1} that {should be revisited post-GA OR maintained as-is}
---

View File

@ -1,314 +1,286 @@
# Test Design for QA: {Feature Name}

**Purpose:** Test execution recipe for QA team. Defines what to test, how to test it, and what QA needs from other teams.

**Date:** {date}
**Author:** {author}
**Status:** Draft
**Project:** {project_name}

**Related:** See Architecture doc (test-design-architecture.md) for testability concerns and architectural blockers.
---
## Executive Summary

**Scope:** {Brief description of testing scope}

**Risk Summary:**
- Total Risks: {N} ({X} high-priority score ≥6, {Y} medium, {Z} low)
- Critical Categories: {Categories with most high-priority risks}

**Coverage Summary:**
- P0 tests: ~{N} (critical paths, security)
- P1 tests: ~{N} (important features, integration)
- P2 tests: ~{N} (edge cases, regression)
- P3 tests: ~{N} (exploratory, benchmarks)
- **Total**: ~{N} tests (~{X}-{Y} weeks with 1 QA)
---
## Dependencies & Test Blockers

**CRITICAL:** QA cannot proceed without these items from other teams.

### Backend/Architecture Dependencies (Sprint 0)

**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans

1. **{Dependency 1}** - {Team} - {Timeline}
   - {What QA needs}
   - {Why it blocks testing}

2. **{Dependency 2}** - {Team} - {Timeline}
   - {What QA needs}
   - {Why it blocks testing}

### QA Infrastructure Setup (Sprint 0)

1. **Test Data Factories** - QA
   - {Entity} factory with faker-based randomization
   - Auto-cleanup fixtures for parallel safety

2. **Test Environments** - QA
   - Local: {Setup details}
   - CI/CD: {Setup details}
   - Staging: {Setup details}

**Example factory pattern:**
```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test';
import { faker } from '@faker-js/faker';

test('example test @p0', async ({ apiRequest }) => {
  const testData = {
    id: `test-${faker.string.uuid()}`,
    email: faker.internet.email(),
  };

  const { status } = await apiRequest({
    method: 'POST',
    path: '/api/resource',
    body: testData,
  });

  expect(status).toBe(201);
});
```
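
A companion sketch for the "Test Data Factories" item above (the {Entity} fields and createEntity helper are hypothetical placeholders):

```typescript
import { faker } from '@faker-js/faker';

// Hypothetical factory: every call returns a unique, parallel-safe payload,
// so tests sharing a database do not collide.
export function createEntity(overrides: Partial<{ id: string; email: string; name: string }> = {}) {
  return {
    id: `test-${faker.string.uuid()}`,
    email: faker.internet.email(),
    name: faker.person.fullName(),
    ...overrides,
  };
}
```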
---
## Risk Assessment

**Note:** Full risk details in Architecture doc. This section summarizes risks relevant to QA test planning.

### High-Priority Risks (Score ≥6)

| Risk ID | Category | Description | Score | QA Test Coverage |
|---------|----------|-------------|-------|------------------|
| **{R-ID}** | {CAT} | {Brief description} | **{Score}** | {How QA validates this risk} |

### Medium/Low-Priority Risks

| Risk ID | Category | Description | Score | QA Test Coverage |
|---------|----------|-------------|-------|------------------|
| {R-ID} | {CAT} | {Brief description} | {Score} | {How QA validates this risk} |

---
## Test Coverage Plan

**IMPORTANT:** P0/P1/P2/P3 = **priority and risk level** (what to focus on if time-constrained), NOT execution timing. See "Execution Strategy" for when tests run.

### P0 (Critical)

**Criteria:** Blocks core functionality + High risk (≥6) + No workaround + Affects majority of users

| Test ID | Requirement | Test Level | Risk Link | Notes |
|---------|-------------|------------|-----------|-------|
| **P0-001** | {Requirement} | {Level} | {R-ID} | {Notes} |
| **P0-002** | {Requirement} | {Level} | {R-ID} | {Notes} |

**Total P0:** ~{N} tests

---
### P1 (High)

**Criteria:** Important features + Medium risk (3-4) + Common workflows + Workaround exists but difficult

| Test ID | Requirement | Test Level | Risk Link | Notes |
|---------|-------------|------------|-----------|-------|
| **P1-001** | {Requirement} | {Level} | {R-ID} | {Notes} |
| **P1-002** | {Requirement} | {Level} | {R-ID} | {Notes} |

**Total P1:** ~{N} tests

---
### P2 (Medium)

**Criteria:** Secondary features + Low risk (1-2) + Edge cases + Regression prevention

| Test ID | Requirement | Test Level | Risk Link | Notes |
|---------|-------------|------------|-----------|-------|
| **P2-001** | {Requirement} | {Level} | {R-ID} | {Notes} |

**Total P2:** ~{N} tests

---
### P3 (Low)

**Criteria:** Nice-to-have + Exploratory + Performance benchmarks + Documentation validation

| Test ID | Requirement | Test Level | Notes |
|---------|-------------|------------|-------|
| **P3-001** | {Requirement} | {Level} | {Notes} |

**Total P3:** ~{N} tests

---
### Coverage Matrix (Requirements → Tests)
| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
|-------------|------------|----------|-----------|------------|-------|
| {Requirement 1} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |
| {Requirement 2} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |

## Execution Strategy
**Philosophy:** Run everything in PRs unless there's significant infrastructure overhead. Playwright with parallelization is extremely fast (100s of tests in ~10-15 min).
**Organized by TOOL TYPE:**
### Every PR: Playwright Tests (~10-15 min)
**All functional tests** (from any priority level):
- All E2E, API, integration, unit tests using Playwright
- Parallelized across {N} shards
- Total: ~{N} Playwright tests (includes P0, P1, P2, P3)
**Why run in PRs:** Fast feedback, no expensive infrastructure
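
A minimal configuration sketch of this PR gate is shown below. The tag names (`@Performance`, `@Chaos`), worker count, and base URL are assumptions this plan has not fixed; treat every value as a placeholder to be tuned per repository.

```typescript
// playwright.config.ts - hypothetical PR-gate configuration; all values are placeholders
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,                      // run test files in parallel
  workers: process.env.CI ? 4 : undefined,  // assumed CI worker count; tune per runner
  // Keep performance/chaos suites out of the PR gate; they run nightly/weekly instead
  grepInvert: /@Performance|@Chaos/,
  reporter: [['list'], ['html', { open: 'never' }]],
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000', // assumed environment variable
    trace: 'on-first-retry',
  },
});
```

Sharding the same suite across CI machines (for example `npx playwright test --shard=1/4` on each) is what typically keeps a few hundred tests inside the ~10-15 minute window.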
### Nightly: k6 Performance Tests (~30-60 min)
**All performance tests** (from any priority level):
- Load, stress, spike, endurance tests
- Total: ~{N} k6 tests (may include P0, P1, P2)
**Why defer to nightly:** Expensive infrastructure (k6 Cloud), long-running (10-40 min per test)
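
As a rough sketch, one of these nightly k6 scripts could look like the example below. The endpoint, stage durations, and thresholds are illustrative assumptions rather than agreed SLOs; recent k6 releases can run TypeScript directly, while older ones need a bundling step.

```typescript
// nightly-load.ts - hypothetical k6 load profile; URL, stages, and thresholds are placeholders
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // ramp up
    { duration: '10m', target: 50 },  // steady state
    { duration: '2m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // example latency goal, not a committed target
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/api/health`); // assumed endpoint and env var
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```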
### Weekly: Chaos & Long-Running (~hours)
**Special infrastructure tests** (from any priority level):
- Multi-region failover (requires AWS Fault Injection Simulator; see the sketch below)
- Disaster recovery (backup restore, 4+ hours)
- Endurance tests (4+ hours runtime)
**Why defer to weekly:** Very expensive infrastructure, very long-running, infrequent validation sufficient
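
The sketch referenced above is one possible shape for the weekly failover trigger. It assumes the AWS SDK v3 FIS client and a pre-built experiment template; the template ID, region, and polling behaviour are placeholders, not part of this plan.

```typescript
// chaos-failover.ts - hypothetical trigger for a weekly FIS failover drill; IDs and region are placeholders
import { FISClient, StartExperimentCommand, GetExperimentCommand } from '@aws-sdk/client-fis';
import { randomUUID } from 'node:crypto';

const client = new FISClient({ region: 'eu-west-1' }); // assumed region

export async function runFailoverDrill(): Promise<string> {
  const { experiment } = await client.send(
    new StartExperimentCommand({
      clientToken: randomUUID(),
      experimentTemplateId: 'EXT_REPLACE_ME', // assumed: a pre-built FIS experiment template
    }),
  );
  if (!experiment?.id) throw new Error('FIS did not return an experiment id');

  // Single status check for illustration; a real runner would poll until completed/stopped
  const current = await client.send(new GetExperimentCommand({ id: experiment.id }));
  return current.experiment?.state?.status ?? 'unknown';
}
```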
**Manual tests** (excluded from automation):
- DevOps validation (deployment, monitoring)
- Finance validation (cost alerts)
- Documentation validation
---
## QA Effort Estimate
**QA test development effort only** (excludes DevOps, Backend, Data Eng, Finance work):
| Priority | Count | Effort Range | Notes |
|----------|-------|--------------|-------|
| P0 | ~{N} | ~{X}-{Y} weeks | Complex setup (security, performance, multi-step) |
| P1 | ~{N} | ~{X}-{Y} weeks | Standard coverage (integration, API tests) |
| P2 | ~{N} | ~{X}-{Y} days | Edge cases, simple validation |
| P3 | ~{N} | ~{X}-{Y} days | Exploratory, benchmarks |
| **Total** | ~{N} | **~{X}-{Y} weeks** | **1 QA engineer, full-time** |
**Assumptions:**
- Includes test design, implementation, debugging, CI integration
- Excludes ongoing maintenance (~10% effort)
- Assumes test infrastructure (factories, fixtures) ready
**Dependencies from other teams:**
- See "Dependencies & Test Blockers" section for what QA needs from Backend, DevOps, Data Eng

## Sprint 0 Setup Requirements
**IMPORTANT:** These items **BLOCK test development**. Complete in Sprint 0 before QA can write tests.
### Architecture/Backend Blockers (from Architecture doc)
**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans
1. **{Blocker 1}** 🚨 **BLOCKER** - {Owner}
   - {What needs to be provided}
   - **Details:** Architecture doc {Risk-ID} mitigation plan
2. **{Blocker 2}** 🚨 **BLOCKER** - {Owner}
   - {What needs to be provided}
   - **Details:** Architecture doc {Risk-ID} mitigation plan
### QA Test Infrastructure
1. **{Factory/Fixture Name}** - QA
- Faker-based generator: `{function_signature}`
- Auto-cleanup after tests (see the sketch after this list)
2. **{Entity} Fixtures** - QA
- Seed scripts for {states/scenarios}
- Isolated {id_pattern} per test
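
One way the factory and fixture items above could look in practice is sketched below; the entity shape, API paths, and fixture name are invented for illustration and would need to match the real service.

```typescript
// user-factory.ts - hypothetical Faker-based factory plus auto-cleanup fixture; all names are placeholders
import { faker } from '@faker-js/faker';
import { test as base } from '@playwright/test';

type User = { id: string; email: string; name: string };

export function buildUser(overrides: Partial<User> = {}): User {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
    ...overrides,
  };
}

// Fixture seeds an isolated user before each test and removes it afterwards
export const test = base.extend<{ seededUser: User }>({
  seededUser: async ({ request }, use) => {
    const user = buildUser();
    await request.post('/api/users', { data: user }); // assumed seeding endpoint
    await use(user);
    await request.delete(`/api/users/${user.id}`);     // auto-cleanup after the test
  },
});
```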
### Test Environments
**Local:** {Setup details - Docker, LocalStack, etc.}
**CI/CD:** {Setup details - shared infrastructure, parallel workers, artifacts}
**Staging:** {Setup details - shared multi-tenant, nightly E2E}
**Production:** {Setup details - feature flags, canary transactions}
**Sprint 0 NFR Gates** (MUST complete before integration testing):
- [ ] {Gate 1}: {Description} (Owner) 🚨
- [ ] {Gate 2}: {Description} (Owner) 🚨
- [ ] {Gate 3}: {Description} (Owner) 🚨
### Sprint 1 Items (Not Sprint 0)
- **{Item 1}** ({Owner}): {Description}
- **{Item 2}** ({Owner}): {Description}
**Sprint 1 NFR Gates** (MUST complete before GA):
- [ ] {Gate 1}: {Description} (Owner)
- [ ] {Gate 2}: {Description} (Owner)
---
## NFR Readiness Summary
**Based on Architecture Doc Risk Assessment**
| NFR Category | Status | Evidence Status | Blocker | Next Action |
|--------------|--------|-----------------|---------|-------------|
| **Testability & Automation** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Test Data Strategy** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Scalability & Availability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Disaster Recovery** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Security** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Monitorability, Debuggability & Manageability** | {Status} | {Evidence} | {Sprint} | {Action} |
| **QoS & QoE** | {Status} | {Evidence} | {Sprint} | {Action} |
| **Deployability** | {Status} | {Evidence} | {Sprint} | {Action} |
**Total:** {N} PASS, {N} CONCERNS across {N} categories

## Appendix A: Code Examples & Tagging
**Playwright Tags for Selective Execution:**
```typescript
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test';

// P0 critical test
test('@P0 @API @Security unauthenticated request returns 401', async ({ apiRequest }) => {
  const { status, body } = await apiRequest({
    method: 'POST',
    path: '/api/endpoint',
    body: { data: 'test' },
    skipAuth: true,
  });
  expect(status).toBe(401);
  expect(body.error).toContain('unauthorized');
});

// P1 integration test
test('@P1 @Integration data syncs correctly', async ({ apiRequest }) => {
  // Seed data
  await apiRequest({
    method: 'POST',
    path: '/api/seed',
    body: { /* test data */ },
  });

  // Validate
  const { status, body } = await apiRequest({
    method: 'GET',
    path: '/api/resource',
  });
  expect(status).toBe(200);
  expect(body).toHaveProperty('data');
});
```
**Run specific tags:**
```bash
# Run only P0 tests
npx playwright test --grep @P0
# Run P0 + P1 tests
npx playwright test --grep "@P0|@P1"
# Run only security tests
npx playwright test --grep @Security
# Run all Playwright tests in PR (default)
npx playwright test
```
---
**End of QA Document**
**Next Steps for QA Team:**
1. Verify Sprint 0 blockers resolved (coordinate with Architecture team if not)
2. Set up test infrastructure (factories, fixtures, environments)
3. Begin test implementation following priority order (P0 → P1 → P2 → P3)
4. Run smoke tests first for fast feedback
5. Track progress using test scenario checklists above
**Next Steps for Architecture Team:**
1. Monitor Sprint 0 blocker resolution
2. Provide support for QA infrastructure setup if needed
3. Review test results and address any newly discovered testability gaps

## Appendix B: Knowledge Base References
- **Risk Governance**: `risk-governance.md` - Risk scoring methodology
- **Test Priorities Matrix**: `test-priorities-matrix.md` - P0-P3 criteria
- **Test Levels Framework**: `test-levels-framework.md` - E2E vs API vs Unit selection
- **Test Quality**: `test-quality.md` - Definition of Done (no hard waits, <300 lines, <1.5 min)

---

**Generated by:** BMad TEA Agent
**Workflow:** `_bmad/bmm/testarch/test-design`
**Version:** 4.0 (BMad v6)

View File

@@ -7,6 +7,10 @@
<inputs>
<input name="content" required="true" desc="Cohesive unit of text to review (markdown, plain text, or text-heavy XML)" />
<input name="style_guide" required="false"
desc="Project-specific style guide. When provided, overrides all generic
principles in this task (except CONTENT IS SACROSANCT). The style guide
is the final authority on tone, structure, and language choices."/>
<input name="reader_type" required="false" default="humans" desc="'humans' (default) for standard editorial, 'llm' for precision focus" /> <input name="reader_type" required="false" default="humans" desc="'humans' (default) for standard editorial, 'llm' for precision focus" />
</inputs> </inputs>
@@ -32,7 +36,11 @@
<i>No conflicts: Merge overlapping fixes into single entries</i>
<i>Respect author voice: Preserve intentional stylistic choices</i>
</principles>
<i critical="true">STYLE GUIDE OVERRIDE: If a style_guide input is provided,
it overrides ALL generic principles in this task (including the Microsoft
Writing Style Guide baseline and reader_type-specific priorities). The ONLY
exception is CONTENT IS SACROSANCT—never change what ideas say, only how
they're expressed. When style guide conflicts with this task, style guide wins.</i>
</llm>
<flow>
@@ -54,6 +62,7 @@
</step>
<step n="3" title="Editorial Review" critical="true">
<action if="style_guide provided">Consult style_guide now and note its key requirements—these override default principles for this review</action>
<action>Review all prose sections (skip code blocks, frontmatter, structural markup)</action>
<action>Identify communication issues that impede comprehension</action>
<action>For each issue, determine the minimal fix that achieves clarity</action>

View File

@ -11,6 +11,10 @@
<inputs>
<input name="content" required="true"
desc="Document to review (markdown, plain text, or structured content)"/>
<input name="style_guide" required="false"
desc="Project-specific style guide. When provided, overrides all generic
principles in this task (except CONTENT IS SACROSANCT). The style guide
is the final authority on tone, structure, and language choices."/>
<input name="purpose" required="false" <input name="purpose" required="false"
desc="Document's intended purpose (e.g., 'quickstart tutorial', desc="Document's intended purpose (e.g., 'quickstart tutorial',
'API reference', 'conceptual overview')"/> 'API reference', 'conceptual overview')"/>
@@ -41,6 +45,12 @@
<i>Propose, don't execute: Output recommendations-user decides what to accept</i>
<i critical="true">CONTENT IS SACROSANCT: Never challenge ideas—only optimize how they're organized.</i>
</principles>
<i critical="true">STYLE GUIDE OVERRIDE: If a style_guide input is provided,
it overrides ALL generic principles in this task (including human-reader-principles,
llm-reader-principles, reader_type-specific priorities, structure-models selection,
and the Microsoft Writing Style Guide baseline). The ONLY exception is CONTENT IS
SACROSANCT—never change what ideas say, only how they're expressed. When style
guide conflicts with this task, style guide wins.</i>
<human-reader-principles>
<i>These elements serve human comprehension and engagement-preserve unless clearly wasteful:</i>
<i>Visual aids: Diagrams, images, and flowcharts anchor understanding</i>
@@ -122,6 +132,7 @@
<action>Note reader_type and which principles apply (human-reader-principles or llm-reader-principles)</action>
</step>
<step n="3" title="Structural Analysis" critical="true">
<action if="style_guide provided">Consult style_guide now and note its key requirements—these override default principles for this analysis</action>
<action>Map the document structure: list each major section with its word count</action>
<action>Evaluate structure against the selected model's primary rules
(e.g., 'Does recommendation come first?' for Pyramid)</action>