From 48881f86a6e862024ad513e726ab6c5987f4b5cb Mon Sep 17 00:00:00 2001 From: Murat K Ozcan <34237651+muratkeremozcan@users.noreply.github.com> Date: Fri, 23 Jan 2026 13:00:48 -0600 Subject: [PATCH] doc: test design refinements (#1382) --- .../testarch/test-design/checklist.md | 204 ++++++--- .../testarch/test-design/instructions.md | 359 +++++++++++++-- .../test-design-architecture-template.md | 71 ++- .../test-design/test-design-qa-template.md | 420 ++++++++---------- 4 files changed, 687 insertions(+), 367 deletions(-) diff --git a/src/bmm/workflows/testarch/test-design/checklist.md b/src/bmm/workflows/testarch/test-design/checklist.md index 3dadfbbb..8ed106ec 100644 --- a/src/bmm/workflows/testarch/test-design/checklist.md +++ b/src/bmm/workflows/testarch/test-design/checklist.md @@ -80,23 +80,29 @@ - [ ] Owners assigned where applicable - [ ] No duplicate coverage (same behavior at multiple levels) -### Execution Order +### Execution Strategy -- [ ] Smoke tests defined (<5 min target) -- [ ] P0 tests listed (<10 min target) -- [ ] P1 tests listed (<30 min target) -- [ ] P2/P3 tests listed (<60 min target) -- [ ] Order optimizes for fast feedback +**CRITICAL: Keep execution strategy simple, avoid redundancy** + +- [ ] **Simple structure**: PR / Nightly / Weekly (NOT complex smoke/P0/P1/P2 tiers) +- [ ] **PR execution**: All functional tests unless significant infrastructure overhead +- [ ] **Nightly/Weekly**: Only performance, chaos, long-running, manual tests +- [ ] **No redundancy**: Don't re-list all tests (already in coverage plan) +- [ ] **Philosophy stated**: "Run everything in PRs if <15 min, defer only if expensive/long" +- [ ] **Playwright parallelization noted**: 100s of tests in 10-15 min ### Resource Estimates -- [ ] P0 hours calculated (count × 2 hours) -- [ ] P1 hours calculated (count × 1 hour) -- [ ] P2 hours calculated (count × 0.5 hours) -- [ ] P3 hours calculated (count × 0.25 hours) -- [ ] Total hours summed -- [ ] Days estimate provided (hours / 8) -- [ ] Estimates include setup time +**CRITICAL: Use intervals/ranges, NOT exact numbers** + +- [ ] P0 effort provided as interval range (e.g., "~25-40 hours" NOT "36 hours") +- [ ] P1 effort provided as interval range (e.g., "~20-35 hours" NOT "27 hours") +- [ ] P2 effort provided as interval range (e.g., "~10-30 hours" NOT "15.5 hours") +- [ ] P3 effort provided as interval range (e.g., "~2-5 hours" NOT "2.5 hours") +- [ ] Total effort provided as interval range (e.g., "~55-110 hours" NOT "81 hours") +- [ ] Timeline provided as week range (e.g., "~1.5-3 weeks" NOT "11 days") +- [ ] Estimates include setup time and account for complexity variations +- [ ] **No false precision**: Avoid exact calculations like "18 tests × 2 hours = 36 hours" ### Quality Gate Criteria @@ -126,11 +132,16 @@ ### Priority Assignment Accuracy -- [ ] P0: Truly blocks core functionality -- [ ] P0: High-risk (score ≥6) -- [ ] P0: No workaround exists -- [ ] P1: Important but not blocking -- [ ] P2/P3: Nice-to-have or edge cases +**CRITICAL: Priority classification is separate from execution timing** + +- [ ] **Priority sections (P0/P1/P2/P3) do NOT include execution context** (e.g., no "Run on every commit" in headers) +- [ ] **Priority sections have only "Criteria" and "Purpose"** (no "Execution:" field) +- [ ] **Execution Strategy section** is separate and handles timing based on infrastructure overhead +- [ ] P0: Truly blocks core functionality + High-risk (≥6) + No workaround +- [ ] P1: Important features + Medium-risk (3-4) + Common workflows 
+- [ ] P2: Secondary features + Low-risk (1-2) + Edge cases +- [ ] P3: Nice-to-have + Exploratory + Benchmarks +- [ ] **Note at top of Test Coverage Plan**: Clarifies P0/P1/P2/P3 = priority/risk, NOT execution timing ### Test Level Selection @@ -176,58 +187,90 @@ - [ ] 🚨 BLOCKERS - Team Must Decide (Sprint 0 critical path items) - [ ] ⚠️ HIGH PRIORITY - Team Should Validate (recommendations for approval) - [ ] 📋 INFO ONLY - Solutions Provided (no decisions needed) -- [ ] **Risk Assessment** section +- [ ] **Risk Assessment** section - **ACTIONABLE** - [ ] Total risks identified count - [ ] High-priority risks table (score ≥6) with all columns: Risk ID, Category, Description, Probability, Impact, Score, Mitigation, Owner, Timeline - [ ] Medium and low-priority risks tables - [ ] Risk category legend included -- [ ] **Testability Concerns** section (if system has architectural constraints) - - [ ] Blockers to fast feedback table - - [ ] Explanation of why standard CI/CD may not apply (if applicable) - - [ ] Tiered testing strategy table (if forced by architecture) - - [ ] Architectural improvements needed (or acknowledgment system supports testing well) +- [ ] **Testability Concerns and Architectural Gaps** section - **ACTIONABLE** + - [ ] **Sub-section: 🚨 ACTIONABLE CONCERNS** at TOP + - [ ] Blockers to Fast Feedback table (WHAT architecture must provide) + - [ ] Architectural Improvements Needed (WHAT must be changed) + - [ ] Each concern has: Owner, Timeline, Impact + - [ ] **Sub-section: Testability Assessment Summary** at BOTTOM (FYI) + - [ ] What Works Well (passing items) + - [ ] Accepted Trade-offs (no action required) + - [ ] This section only included if worth mentioning; otherwise omitted - [ ] **Risk Mitigation Plans** for all high-priority risks (≥6) - [ ] Each plan has: Strategy (numbered steps), Owner, Timeline, Status, Verification + - [ ] **Only Backend/DevOps/Arch/Security mitigations** (production code changes) + - [ ] QA-owned mitigations belong in QA doc instead - [ ] **Assumptions and Dependencies** section + - [ ] **Architectural assumptions only** (SLO targets, replication lag, system design) - [ ] Assumptions list (numbered) - [ ] Dependencies list with required dates - [ ] Risks to plan with impact and contingency + - [ ] QA execution assumptions belong in QA doc instead - [ ] **NO test implementation code** (long examples belong in QA doc) +- [ ] **NO test scripts** (no Playwright test(...) blocks, no assertions, no test setup code) +- [ ] **NO NFR test examples** (NFR sections describe WHAT to test, not HOW to test) - [ ] **NO test scenario checklists** (belong in QA doc) -- [ ] **Cross-references to QA doc** where appropriate +- [ ] **NO bloat or repetition** (consolidate repeated notes, avoid over-explanation) +- [ ] **Cross-references to QA doc** where appropriate (instead of duplication) +- [ ] **RECIPE SECTIONS NOT IN ARCHITECTURE DOC:** + - [ ] NO "Test Levels Strategy" section (unit/integration/E2E split belongs in QA doc only) + - [ ] NO "NFR Testing Approach" section with detailed test procedures (belongs in QA doc only) + - [ ] NO "Test Environment Requirements" section (belongs in QA doc only) + - [ ] NO "Recommendations for Sprint 0" section with test framework setup (belongs in QA doc only) + - [ ] NO "Quality Gate Criteria" section (pass rates, coverage targets belong in QA doc only) + - [ ] NO "Tool Selection" section (Playwright, k6, etc. 
belongs in QA doc only) ### test-design-qa.md -- [ ] **Purpose statement** at top (execution recipe for QA team) -- [ ] **Quick Reference for QA** section - - [ ] Before You Start checklist - - [ ] Test Execution Order - - [ ] Need Help? guidance -- [ ] **System Architecture Summary** (brief overview of services and data flow) -- [ ] **Test Environment Requirements** in early section (section 1-3, NOT buried at end) - - [ ] Table with Local/Dev/Staging environments - - [ ] Key principles listed (shared DB, randomization, parallel-safe, self-cleaning, shift-left) - - [ ] Code example provided -- [ ] **Testability Assessment** with prerequisites checklist - - [ ] References Architecture doc blockers (not duplication) -- [ ] **Test Levels Strategy** with unit/integration/E2E split - - [ ] System type identified - - [ ] Recommended split percentages with rationale - - [ ] Test count summary (P0/P1/P2/P3 totals) +**NEW STRUCTURE (streamlined from 375 to ~287 lines):** + +- [ ] **Purpose statement** at top (test execution recipe) +- [ ] **Executive Summary** with risk summary and coverage summary +- [ ] **Dependencies & Test Blockers** section in POSITION 2 (right after Executive Summary) + - [ ] Backend/Architecture dependencies listed (what QA needs from other teams) + - [ ] QA infrastructure setup listed (factories, fixtures, environments) + - [ ] Code example with playwright-utils if config.tea_use_playwright_utils is true + - [ ] Test from '@seontechnologies/playwright-utils/api-request/fixtures' + - [ ] Expect from '@playwright/test' (playwright-utils does not re-export expect) + - [ ] Code examples include assertions (no unused imports) +- [ ] **Risk Assessment** section (brief, references Architecture doc) + - [ ] High-priority risks table + - [ ] Medium/low-priority risks table + - [ ] Each risk shows "QA Test Coverage" column (how QA validates) - [ ] **Test Coverage Plan** with P0/P1/P2/P3 sections - - [ ] Each priority has: Execution details, Purpose, Criteria, Test Count - - [ ] Detailed test scenarios WITH CHECKBOXES - - [ ] Coverage table with columns: Requirement | Test Level | Risk Link | Test Count | Owner | Notes -- [ ] **Sprint 0 Setup Requirements** - - [ ] Architecture/Backend blockers listed with cross-references to Architecture doc - - [ ] QA Test Infrastructure section (factories, fixtures) - - [ ] Test Environments section (Local, CI/CD, Staging, Production) - - [ ] Sprint 0 NFR Gates checklist - - [ ] Sprint 1 Items clearly separated -- [ ] **NFR Readiness Summary** (reference to Architecture doc, not duplication) - - [ ] Table with NFR categories, status, evidence, blocker, next action -- [ ] **Cross-references to Architecture doc** (not duplication) -- [ ] **NO architectural theory** (just reference Architecture doc) + - [ ] Priority sections have ONLY "Criteria" (no execution context) + - [ ] Note at top: "P0/P1/P2/P3 = priority, NOT execution timing" + - [ ] Test tables with columns: Test ID | Requirement | Test Level | Risk Link | Notes +- [ ] **Execution Strategy** section (organized by TOOL TYPE) + - [ ] Every PR: Playwright tests (~10-15 min) + - [ ] Nightly: k6 performance tests (~30-60 min) + - [ ] Weekly: Chaos & long-running (~hours) + - [ ] Philosophy: "Run everything in PRs unless expensive/long-running" +- [ ] **QA Effort Estimate** section (QA effort ONLY) + - [ ] Interval-based estimates (e.g., "~1-2 weeks" NOT "36 hours") + - [ ] NO DevOps, Backend, Data Eng, Finance effort + - [ ] NO Sprint breakdowns (too prescriptive) +- [ ] **Appendix A: Code 
Examples & Tagging** +- [ ] **Appendix B: Knowledge Base References** + +**REMOVED SECTIONS (bloat):** +- [ ] ❌ NO Quick Reference section (bloat) +- [ ] ❌ NO System Architecture Summary (bloat) +- [ ] ❌ NO Test Environment Requirements as separate section (integrated into Dependencies) +- [ ] ❌ NO Testability Assessment section (bloat - covered in Dependencies) +- [ ] ❌ NO Test Levels Strategy section (bloat - obvious from test scenarios) +- [ ] ❌ NO NFR Readiness Summary (bloat) +- [ ] ❌ NO Quality Gate Criteria section (teams decide for themselves) +- [ ] ❌ NO Follow-on Workflows section (bloat - BMAD commands self-explanatory) +- [ ] ❌ NO Approval section (unnecessary formality) +- [ ] ❌ NO Infrastructure/DevOps/Finance effort tables (out of scope) +- [ ] ❌ NO Sprint 0/1/2/3 breakdown tables (too prescriptive) +- [ ] ❌ NO Next Steps section (bloat) ### Cross-Document Consistency @@ -238,6 +281,40 @@ - [ ] Dates and authors match across documents - [ ] ADR and PRD references consistent +### Document Quality (Anti-Bloat Check) + +**CRITICAL: Check for bloat and repetition across BOTH documents** + +- [ ] **No repeated notes 10+ times** (e.g., "Timing is pessimistic until R-005 fixed" on every section) +- [ ] **Repeated information consolidated** (write once at top, reference briefly if needed) +- [ ] **No excessive detail** that doesn't add value (obvious concepts, redundant examples) +- [ ] **Focus on unique/critical info** (only document what's different from standard practice) +- [ ] **Architecture doc**: Concerns-focused, NOT implementation-focused +- [ ] **QA doc**: Implementation-focused, NOT theory-focused +- [ ] **Clear separation**: Architecture = WHAT and WHY, QA = HOW +- [ ] **Professional tone**: No AI slop markers + - [ ] Avoid excessive ✅/❌ emojis (use sparingly, only when adding clarity) + - [ ] Avoid "absolutely", "excellent", "fantastic", overly enthusiastic language + - [ ] Write professionally and directly +- [ ] **Architecture doc length**: Target ~150-200 lines max (focus on actionable concerns only) +- [ ] **QA doc length**: Keep concise, remove bloat sections + +### Architecture Doc Structure (Actionable-First Principle) + +**CRITICAL: Validate structure follows actionable-first, FYI-last principle** + +- [ ] **Actionable sections at TOP:** + - [ ] Quick Guide (🚨 BLOCKERS first, then ⚠️ HIGH PRIORITY, then 📋 INFO ONLY last) + - [ ] Risk Assessment (high-priority risks ≥6 at top) + - [ ] Testability Concerns (concerns/blockers at top, passing items at bottom) + - [ ] Risk Mitigation Plans (for high-priority risks ≥6) +- [ ] **FYI sections at BOTTOM:** + - [ ] Testability Assessment Summary (what works well - only if worth mentioning) + - [ ] Assumptions and Dependencies +- [ ] **ASRs categorized correctly:** + - [ ] Actionable ASRs included in 🚨 or ⚠️ sections + - [ ] FYI ASRs included in 📋 section or omitted if obvious + ## Completion Criteria **All must be true:** @@ -295,9 +372,20 @@ If workflow fails: - **Solution**: Use test pyramid - E2E for critical paths only -**Issue**: Resource estimates too high +**Issue**: Resource estimates too high or too precise -- **Solution**: Invest in fixtures/factories to reduce per-test setup time +- **Solution**: + - Invest in fixtures/factories to reduce per-test setup time + - Use interval ranges (e.g., "~55-110 hours") instead of exact numbers (e.g., "81 hours") + - Widen intervals if high uncertainty exists + +**Issue**: Execution order section too complex or redundant + +- **Solution**: + - Default: Run everything in PRs 
(<15 min with Playwright parallelization) + - Only defer to nightly/weekly if expensive (k6, chaos, 4+ hour tests) + - Don't create smoke/P0/P1/P2/P3 tier structure + - Don't re-list all tests (already in coverage plan) ### Best Practices @@ -305,7 +393,9 @@ If workflow fails: - High-priority risks (≥6) require immediate mitigation - P0 tests should cover <10% of total scenarios - Avoid testing same behavior at multiple levels -- Include smoke tests (P0 subset) for fast feedback +- **Use interval-based estimates** (e.g., "~25-40 hours") instead of exact numbers to avoid false precision and provide flexibility +- **Keep execution strategy simple**: Default to "run everything in PRs" (<15 min with Playwright), only defer if expensive/long-running +- **Avoid execution order redundancy**: Don't create complex tier structures or re-list tests --- diff --git a/src/bmm/workflows/testarch/test-design/instructions.md b/src/bmm/workflows/testarch/test-design/instructions.md index fbee3103..1eae05be 100644 --- a/src/bmm/workflows/testarch/test-design/instructions.md +++ b/src/bmm/workflows/testarch/test-design/instructions.md @@ -157,7 +157,13 @@ TEA test-design workflow supports TWO modes, detected automatically: 1. **Review Architecture for Testability** - Evaluate architecture against these criteria: + **STRUCTURE PRINCIPLE: CONCERNS FIRST, PASSING ITEMS LAST** + + Evaluate architecture against these criteria and structure output as: + 1. **Testability Concerns** (ACTIONABLE - what's broken/missing) + 2. **Testability Assessment Summary** (FYI - what works well) + + **Testability Criteria:** **Controllability:** - Can we control system state for testing? (API seeding, factories, database reset) @@ -174,8 +180,18 @@ TEA test-design workflow supports TWO modes, detected automatically: - Can we reproduce failures? (deterministic waits, HAR capture, seed data) - Are components loosely coupled? (mockable, testable boundaries) + **In Architecture Doc Output:** + - **Section A: Testability Concerns** (TOP) - List what's BROKEN or MISSING + - Example: "No API for test data seeding → Cannot parallelize tests" + - Example: "Hardcoded DB connection → Cannot test in CI" + - **Section B: Testability Assessment Summary** (BOTTOM) - List what PASSES + - Example: "✅ API-first design supports test isolation" + - Only include if worth mentioning; otherwise omit this section entirely + 2. **Identify Architecturally Significant Requirements (ASRs)** + **CRITICAL: ASRs must indicate if ACTIONABLE or FYI** + From PRD NFRs and architecture decisions, identify quality requirements that: - Drive architecture decisions (e.g., "Must handle 10K concurrent users" → caching architecture) - Pose testability challenges (e.g., "Sub-second response time" → performance test infrastructure) @@ -183,21 +199,60 @@ TEA test-design workflow supports TWO modes, detected automatically: Score each ASR using risk matrix (probability × impact). + **In Architecture Doc, categorize ASRs:** + - **ACTIONABLE ASRs** (require architecture changes): Include in "Quick Guide" 🚨 or ⚠️ sections + - **FYI ASRs** (already satisfied by architecture): Include in "Quick Guide" 📋 section OR omit if obvious + + **Example:** + - ASR-001 (Score 9): "Multi-region deployment requires region-specific test infrastructure" → **ACTIONABLE** (goes in 🚨 BLOCKERS) + - ASR-002 (Score 4): "OAuth 2.1 authentication already implemented in ADR-5" → **FYI** (goes in 📋 INFO ONLY or omit) + + **Structure Principle:** Actionable ASRs at TOP, FYI ASRs at BOTTOM (or omit) + 3. 
**Define Test Levels Strategy** + **IMPORTANT: This section goes in QA doc ONLY, NOT in Architecture doc** + Based on architecture (mobile, web, API, microservices, monolith): - Recommend unit/integration/E2E split (e.g., 70/20/10 for API-heavy, 40/30/30 for UI-heavy) - Identify test environment needs (local, staging, ephemeral, production-like) - Define testing approach per technology (Playwright for web, Maestro for mobile, k6 for performance) -4. **Assess NFR Testing Approach** + **In Architecture doc:** Only mention test level split if it's an ACTIONABLE concern + - Example: "API response time <100ms requires load testing infrastructure" (concern) + - DO NOT include full test level strategy table in Architecture doc - For each NFR category: - - **Security**: Auth/authz tests, OWASP validation, secret handling (Playwright E2E + security tools) - - **Performance**: Load/stress/spike testing with k6, SLO/SLA thresholds - - **Reliability**: Error handling, retries, circuit breakers, health checks (Playwright + API tests) +4. **Assess NFR Requirements (MINIMAL in Architecture Doc)** + + **CRITICAL: NFR testing approach is a RECIPE - belongs in QA doc ONLY** + + **In Architecture Doc:** + - Only mention NFRs if they create testability CONCERNS + - Focus on WHAT architecture must provide, not HOW to test + - Keep it brief - 1-2 sentences per NFR category at most + + **Example - Security NFR in Architecture doc (if there's a concern):** + ✅ CORRECT (concern-focused, brief, WHAT/WHY only): + - "System must prevent cross-customer data access (GDPR requirement). Requires test infrastructure for multi-tenant isolation in Sprint 0." + - "OAuth tokens must expire after 1 hour (ADR-5). Requires test harness for token expiration validation." + + ❌ INCORRECT (too detailed, belongs in QA doc): + - Full table of security test scenarios + - Test scripts with code examples + - Detailed test procedures + - Tool selection (e.g., "use Playwright E2E + OWASP ZAP") + - Specific test approaches (e.g., "Test approach: Playwright E2E for auth/authz") + + **In QA Doc (full NFR testing approach):** + - **Security**: Full test scenarios, tooling (Playwright + OWASP ZAP), test procedures + - **Performance**: Load/stress/spike test scenarios, k6 scripts, SLO thresholds + - **Reliability**: Error handling tests, retry logic validation, circuit breaker tests - **Maintainability**: Coverage targets, code quality gates, observability validation + **Rule of Thumb:** + - Architecture doc: "What NFRs exist and what concerns they create" (1-2 sentences) + - QA doc: "How to test those NFRs" (full sections with tables, code, procedures) + 5. **Flag Testability Concerns** Identify architecture decisions that harm testability: @@ -228,22 +283,54 @@ TEA test-design workflow supports TWO modes, detected automatically: **Standard Structures (REQUIRED):** **test-design-architecture.md sections (in this order):** + + **STRUCTURE PRINCIPLE: Actionable items FIRST, FYI items LAST** + 1. Executive Summary (scope, business context, architecture, risk summary) 2. Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY) - 3. Risk Assessment (high/medium/low-priority risks with scoring) - 4. Testability Concerns and Architectural Gaps (if system has constraints) - 5. Risk Mitigation Plans (detailed for high-priority risks ≥6) - 6. Assumptions and Dependencies + 3. Risk Assessment (high/medium/low-priority risks with scoring) - **ACTIONABLE** + 4. 
Testability Concerns and Architectural Gaps - **ACTIONABLE** (what arch team must do) + - Sub-section: Blockers to Fast Feedback (ACTIONABLE - concerns FIRST) + - Sub-section: Architectural Improvements Needed (ACTIONABLE) + - Sub-section: Testability Assessment Summary (FYI - passing items LAST, only if worth mentioning) + 5. Risk Mitigation Plans (detailed for high-priority risks ≥6) - **ACTIONABLE** + 6. Assumptions and Dependencies - **FYI** + + **SECTIONS THAT DO NOT BELONG IN ARCHITECTURE DOC:** + - ❌ Test Levels Strategy (unit/integration/E2E split) - This is a RECIPE, belongs in QA doc ONLY + - ❌ NFR Testing Approach with test examples - This is a RECIPE, belongs in QA doc ONLY + - ❌ Test Environment Requirements - This is a RECIPE, belongs in QA doc ONLY + - ❌ Recommendations for Sprint 0 (test framework setup, factories) - This is a RECIPE, belongs in QA doc ONLY + - ❌ Quality Gate Criteria (pass rates, coverage targets) - This is a RECIPE, belongs in QA doc ONLY + - ❌ Tool Selection (Playwright, k6, etc.) - This is a RECIPE, belongs in QA doc ONLY + + **WHAT BELONGS IN ARCHITECTURE DOC:** + - ✅ Testability CONCERNS (what makes it hard to test) + - ✅ Architecture GAPS (what's missing for testability) + - ✅ What architecture team must DO (blockers, improvements) + - ✅ Risks and mitigation plans + - ✅ ASRs (Architecturally Significant Requirements) - but clarify if FYI or actionable **test-design-qa.md sections (in this order):** - 1. Quick Reference for QA (Before You Start, Execution Order, Need Help) - 2. System Architecture Summary (brief overview) - 3. Test Environment Requirements (MOVE UP - section 3, NOT buried at end) - 4. Testability Assessment (lightweight prerequisites checklist) - 5. Test Levels Strategy (unit/integration/E2E split with rationale) - 6. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes) - 7. Sprint 0 Setup Requirements (blockers, infrastructure, environments) - 8. NFR Readiness Summary (reference to Architecture doc) + 1. Executive Summary (risk summary, coverage summary) + 2. **Dependencies & Test Blockers** (CRITICAL: RIGHT AFTER SUMMARY - what QA needs from other teams) + 3. Risk Assessment (scored risks with categories - reference Arch doc, don't duplicate) + 4. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes) + 5. **Execution Strategy** (SIMPLE: Organized by TOOL TYPE: PR (Playwright) / Nightly (k6) / Weekly (chaos/manual)) + 6. QA Effort Estimate (QA effort ONLY - no DevOps, Data Eng, Finance, Backend) + 7. 
Appendices (code examples with playwright-utils, tagging strategy, knowledge base refs) + + **SECTIONS TO EXCLUDE FROM QA DOC:** + - ❌ Quality Gate Criteria (pass/fail thresholds - teams decide for themselves) + - ❌ Follow-on Workflows (bloat - BMAD commands are self-explanatory) + - ❌ Approval section (unnecessary formality) + - ❌ Test Environment Requirements (remove as separate section - integrate into Dependencies if needed) + - ❌ NFR Readiness Summary (bloat - covered in Risk Assessment) + - ❌ Testability Assessment (bloat - covered in Dependencies) + - ❌ Test Levels Strategy (bloat - obvious from test scenarios) + - ❌ Sprint breakdowns (too prescriptive) + - ❌ Infrastructure/DevOps/Data Eng effort tables (out of scope) + - ❌ Mitigation plans for non-QA work (belongs in Arch doc) **Content Guidelines:** @@ -252,26 +339,46 @@ TEA test-design workflow supports TWO modes, detected automatically: - ✅ Clear ownership (each blocker/ASR has owner + timeline) - ✅ Testability requirements (what architecture must support) - ✅ Mitigation plans (for each high-risk item ≥6) - - ✅ Short code examples (5-10 lines max showing what to support) + - ✅ Brief conceptual examples ONLY if needed to clarify architecture concerns (5-10 lines max) + - ✅ **Target length**: ~150-200 lines max (focus on actionable concerns only) + - ✅ **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language) - **Architecture doc (DON'T):** - - ❌ NO long test code examples (belongs in QA doc) - - ❌ NO test scenario checklists (belongs in QA doc) - - ❌ NO implementation details (how QA will test) + **Architecture doc (DON'T) - CRITICAL:** + - ❌ NO test scripts or test implementation code AT ALL - This is a communication doc for architects, not a testing guide + - ❌ NO Playwright test examples (e.g., test('...', async ({ request }) => ...)) + - ❌ NO assertion logic (e.g., expect(...).toBe(...)) + - ❌ NO test scenario checklists with checkboxes (belongs in QA doc) + - ❌ NO implementation details about HOW QA will test + - ❌ Focus on CONCERNS, not IMPLEMENTATION **QA doc (DO):** - ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes) - - ✅ Environment setup (Sprint 0 checklist with blockers) - - ✅ Tool setup (factories, fixtures, frameworks) + - ✅ Full test implementation code samples when helpful + - ✅ **IMPORTANT: If config.tea_use_playwright_utils is true, ALL code samples MUST use @seontechnologies/playwright-utils fixtures and utilities** + - ✅ Import test fixtures from '@seontechnologies/playwright-utils/api-request/fixtures' + - ✅ Import expect from '@playwright/test' (playwright-utils does not re-export expect) + - ✅ Use apiRequest fixture with schema validation, retry logic, and structured responses + - ✅ Dependencies & Test Blockers section RIGHT AFTER Executive Summary (what QA needs from other teams) + - ✅ **QA effort estimates ONLY** (no DevOps, Data Eng, Finance, Backend effort - out of scope) - ✅ Cross-references to Architecture doc (not duplication) + - ✅ **Professional tone**: Avoid AI slop (excessive ✅/❌ emojis, "absolutely", "excellent", overly enthusiastic language) **QA doc (DON'T):** - ❌ NO architectural theory (just reference Architecture doc) - ❌ NO ASR explanations (link to Architecture doc instead) - ❌ NO duplicate risk assessments (reference Architecture doc) + - ❌ NO Quality Gate Criteria section (teams decide pass/fail thresholds for themselves) + - ❌ NO Follow-on Workflows section (bloat - BMAD commands are self-explanatory) + - ❌ NO 
Approval section (unnecessary formality) + - ❌ NO effort estimates for other teams (DevOps, Backend, Data Eng, Finance - out of scope, QA effort only) + - ❌ NO Sprint breakdowns (too prescriptive - e.g., "Sprint 0: 40 hours, Sprint 1: 48 hours") + - ❌ NO mitigation plans for Backend/Arch/DevOps work (those belong in Architecture doc) + - ❌ NO architectural assumptions or debates (those belong in Architecture doc) **Anti-Patterns to Avoid (Cross-Document Redundancy):** + **CRITICAL: NO BLOAT, NO REPETITION, NO OVERINFO** + ❌ **DON'T duplicate OAuth requirements:** - Architecture doc: Explain OAuth 2.1 flow in detail - QA doc: Re-explain why OAuth 2.1 is required @@ -280,6 +387,24 @@ TEA test-design workflow supports TWO modes, detected automatically: - Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)" - QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)" + ❌ **DON'T repeat the same note 10+ times:** + - Example: "Timing is pessimistic until R-005 is fixed" repeated on every P0, P1, P2 section + - This creates bloat and makes docs hard to read + + ✅ **DO consolidate repeated information:** + - Write once at the top: "**Note**: All timing estimates are pessimistic pending R-005 resolution" + - Reference briefly if needed: "(pessimistic timing)" + + ❌ **DON'T include excessive detail that doesn't add value:** + - Long explanations of obvious concepts + - Redundant examples showing the same pattern + - Over-documentation of standard practices + + ✅ **DO focus on what's unique or critical:** + - Document only what's different from standard practice + - Highlight critical decisions and risks + - Keep explanations concise and actionable + **Markdown Cross-Reference Syntax Examples:** ```markdown @@ -330,6 +455,24 @@ TEA test-design workflow supports TWO modes, detected automatically: - Cross-reference between docs (no duplication) - Validate against checklist.md (System-Level Mode section) +**Common Over-Engineering to Avoid:** + + **In QA Doc:** + 1. ❌ Quality gate thresholds ("P0 must be 100%, P1 ≥95%") - Let teams decide for themselves + 2. ❌ Effort estimates for other teams - QA doc should only estimate QA effort + 3. ❌ Sprint breakdowns ("Sprint 0: 40 hours, Sprint 1: 48 hours") - Too prescriptive + 4. ❌ Approval sections - Unnecessary formality + 5. ❌ Assumptions about architecture (SLO targets, replication lag) - These are architectural concerns, belong in Arch doc + 6. ❌ Mitigation plans for Backend/Arch/DevOps - Those belong in Arch doc + 7. ❌ Follow-on workflows section - Bloat, BMAD commands are self-explanatory + 8. ❌ NFR Readiness Summary - Bloat, covered in Risk Assessment + + **Test Coverage Numbers Reality Check:** + - With Playwright parallelization, running ALL Playwright tests is as fast as running just P0 + - Don't split Playwright tests by priority into different CI gates - it adds no value + - Tool type matters, not priority labels + - Defer based on infrastructure cost, not importance + **After System-Level Mode:** Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode. --- @@ -540,12 +683,51 @@ TEA test-design workflow supports TWO modes, detected automatically: 8. 
**Plan Mitigations** + **CRITICAL: Mitigation placement depends on WHO does the work** + For each high-priority risk: - Define mitigation strategy - Assign owner (dev, QA, ops) - Set timeline - Update residual risk expectation + **Mitigation Plan Placement:** + + **Architecture Doc:** + - Mitigations owned by Backend, DevOps, Architecture, Security, Data Eng + - Example: "Add authorization layer for customer-scoped access" (Backend work) + - Example: "Configure AWS Fault Injection Simulator" (DevOps work) + - Example: "Define CloudWatch log schema for backfill events" (Architecture work) + + **QA Doc:** + - Mitigations owned by QA (test development work) + - Example: "Create factories for test data with randomization" (QA work) + - Example: "Implement polling with retry for async validation" (QA test code) + - Brief reference to Architecture doc mitigations (don't duplicate) + + **Rule of Thumb:** + - If mitigation requires production code changes → Architecture doc + - If mitigation is test infrastructure/code → QA doc + - If mitigation involves multiple teams → Architecture doc with QA validation approach + + **Assumptions Placement:** + + **Architecture Doc:** + - Architectural assumptions (SLO targets, replication lag, system design assumptions) + - Example: "P95 <500ms inferred from <2s timeout (requires Product approval)" + - Example: "Multi-region replication lag <1s assumed (ADR doesn't specify SLA)" + - Example: "Recent Cache hit ratio >80% assumed (not in PRD/ADR)" + + **QA Doc:** + - Test execution assumptions (test infrastructure readiness, test data availability) + - Example: "Assumes test factories already created" + - Example: "Assumes CI/CD pipeline configured" + - Brief reference to Architecture doc for architectural assumptions + + **Rule of Thumb:** + - If assumption is about system architecture/design → Architecture doc + - If assumption is about test infrastructure/execution → QA doc + --- ## Step 3: Design Test Coverage @@ -594,6 +776,8 @@ TEA test-design workflow supports TWO modes, detected automatically: 3. **Assign Priority Levels** + **CRITICAL: P0/P1/P2/P3 indicates priority and risk level, NOT execution timing** + **Knowledge Base Reference**: `test-priorities-matrix.md` **P0 (Critical)**: @@ -601,25 +785,28 @@ TEA test-design workflow supports TWO modes, detected automatically: - High-risk areas (score ≥6) - Revenue-impacting - Security-critical - - **Run on every commit** + - No workaround exists + - Affects majority of users **P1 (High)**: - Important user features - Medium-risk areas (score 3-4) - Common workflows - - **Run on PR to main** + - Workaround exists but difficult **P2 (Medium)**: - Secondary features - Low-risk areas (score 1-2) - Edge cases - - **Run nightly or weekly** + - Regression prevention **P3 (Low)**: - Nice-to-have - Exploratory - Performance benchmarks - - **Run on-demand** + - Documentation validation + + **NOTE:** Priority classification is separate from execution timing. A P1 test might run in PRs if it's fast, or nightly if it requires expensive infrastructure (e.g., k6 performance test). See "Execution Strategy" section for timing guidance. 4. **Outline Data and Tooling Prerequisites** @@ -629,13 +816,55 @@ TEA test-design workflow supports TWO modes, detected automatically: - Environment setup - Tools and dependencies -5. **Define Execution Order** +5. **Define Execution Strategy** (Keep It Simple) - Recommend test execution sequence: - 1. **Smoke tests** (P0 subset, <5 min) - 2. **P0 tests** (critical paths, <10 min) - 3. 
**P1 tests** (important features, <30 min) - 4. **P2/P3 tests** (full regression, <60 min) + **IMPORTANT: Avoid over-engineering execution order** + + **Default Philosophy:** + - Run **everything** in PRs if total duration <15 minutes + - Playwright is fast with parallelization (100s of tests in ~10-15 min) + - Only defer to nightly/weekly if there's significant overhead: + - Performance tests (k6, load testing) - expensive infrastructure + - Chaos engineering - requires special setup (AWS FIS) + - Long-running tests - endurance (4+ hours), disaster recovery + - Manual tests - require human intervention + + **Simple Execution Strategy (Organized by TOOL TYPE):** + + ```markdown + ## Execution Strategy + + **Philosophy**: Run everything in PRs unless significant infrastructure overhead. + Playwright with parallelization is extremely fast (100s of tests in ~10-15 min). + + **Organized by TOOL TYPE:** + + ### Every PR: Playwright Tests (~10-15 min) + All functional tests (from any priority level): + - All E2E, API, integration, unit tests using Playwright + - Parallelized across {N} shards + - Total: ~{N} tests (includes P0, P1, P2, P3) + + ### Nightly: k6 Performance Tests (~30-60 min) + All performance tests (from any priority level): + - Load, stress, spike, endurance + - Reason: Expensive infrastructure, long-running (10-40 min per test) + + ### Weekly: Chaos & Long-Running (~hours) + Special infrastructure tests (from any priority level): + - Multi-region failover, disaster recovery, endurance + - Reason: Very expensive, very long (4+ hours) + ``` + + **KEY INSIGHT: Organize by TOOL TYPE, not priority** + - Playwright (fast, cheap) → PR + - k6 (expensive, long) → Nightly + - Chaos/Manual (very expensive, very long) → Weekly + + **Avoid:** + - ❌ Don't organize by priority (smoke → P0 → P1 → P2 → P3) + - ❌ Don't say "P1 runs on PR to main" (some P1 are Playwright/PR, some are k6/Nightly) + - ❌ Don't create artificial tiers - organize by tool type and infrastructure overhead --- @@ -661,34 +890,66 @@ TEA test-design workflow supports TWO modes, detected automatically: | Login flow | E2E | P0 | R-001 | 3 | QA | ``` -3. **Document Execution Order** +3. **Document Execution Strategy** (Simple, Not Redundant) + + **IMPORTANT: Keep execution strategy simple and avoid redundancy** ```markdown - ### Smoke Tests (<5 min) + ## Execution Strategy - - Login successful - - Dashboard loads + **Default: Run all functional tests in PRs (~10-15 min)** + - All Playwright tests (parallelized across 4 shards) + - Includes E2E, API, integration, unit tests + - Total: ~{N} tests - ### P0 Tests (<10 min) + **Nightly: Performance & Infrastructure tests** + - k6 load/stress/spike tests (~30-60 min) + - Reason: Expensive infrastructure, long-running - - [Full P0 list] - - ### P1 Tests (<30 min) - - - [Full P1 list] + **Weekly: Chaos & Disaster Recovery** + - Endurance tests (4+ hours) + - Multi-region failover (requires AWS FIS) + - Backup restore validation + - Reason: Special infrastructure, very long-running ``` + **DO NOT:** + - ❌ Create redundant smoke/P0/P1/P2/P3 tier structure + - ❌ List all tests again in execution order (already in coverage plan) + - ❌ Split tests by priority unless there's infrastructure overhead + 4. 
**Include Resource Estimates** + **IMPORTANT: Use intervals/ranges, not exact numbers** + + Provide rough estimates with intervals to avoid false precision: + ```markdown ### Test Effort Estimates - - P0 scenarios: 15 tests × 2 hours = 30 hours - - P1 scenarios: 25 tests × 1 hour = 25 hours - - P2 scenarios: 40 tests × 0.5 hour = 20 hours - - **Total:** 75 hours (~10 days) + - P0 scenarios: 15 tests (~1.5-2.5 hours each) = **~25-40 hours** + - P1 scenarios: 25 tests (~0.75-1.5 hours each) = **~20-35 hours** + - P2 scenarios: 40 tests (~0.25-0.75 hours each) = **~10-30 hours** + - **Total:** **~55-105 hours** (~1.5-3 weeks with 1 QA engineer) ``` + **Why intervals:** + - Avoids false precision (estimates are never exact) + - Provides flexibility for complexity variations + - Accounts for unknowns and dependencies + - More realistic and less prescriptive + + **Guidelines:** + - P0 tests: 1.5-2.5 hours each (complex setup, security, performance) + - P1 tests: 0.75-1.5 hours each (standard integration, API tests) + - P2 tests: 0.25-0.75 hours each (edge cases, simple validation) + - P3 tests: 0.1-0.5 hours each (exploratory, documentation) + + **Express totals as:** + - Hour ranges: "~55-105 hours" + - Week ranges: "~1.5-3 weeks" + - Avoid: Exact numbers like "75 hours" or "11 days" + 5. **Add Gate Criteria** ```markdown diff --git a/src/bmm/workflows/testarch/test-design/test-design-architecture-template.md b/src/bmm/workflows/testarch/test-design/test-design-architecture-template.md index 3cf8be46..571f6f20 100644 --- a/src/bmm/workflows/testarch/test-design/test-design-architecture-template.md +++ b/src/bmm/workflows/testarch/test-design/test-design-architecture-template.md @@ -108,54 +108,51 @@ ### Testability Concerns and Architectural Gaps -**IMPORTANT**: {If system has constraints, explain them. If standard CI/CD achievable, state that.} +**🚨 ACTIONABLE CONCERNS - Architecture Team Must Address** -#### Blockers to Fast Feedback +{If system has critical testability concerns, list them here. If architecture supports testing well, state "No critical testability concerns identified" and skip to Testability Assessment Summary} -| Blocker | Impact | Current Mitigation | Ideal Solution | -|---------|--------|-------------------|----------------| -| **{Blocker name}** | {Impact description} | {How we're working around it} | {What architecture should provide} | +#### 1. Blockers to Fast Feedback (WHAT WE NEED FROM ARCHITECTURE) -#### Why This Matters +| Concern | Impact | What Architecture Must Provide | Owner | Timeline | +|---------|--------|--------------------------------|-------|----------| +| **{Concern name}** | {Impact on testing} | {Specific architectural change needed} | {Team} | {Sprint} | -**Standard CI/CD expectations:** -- Full test suite on every commit (~5-15 min feedback) -- Parallel test execution (isolated test data per worker) -- Ephemeral test environments (spin up → test → tear down) -- Fast feedback loop (devs stay in flow state) +**Example:** +- **No API for test data seeding** → Cannot parallelize tests → Provide POST /test/seed endpoint (Backend, Sprint 0) -**Current reality for {Feature}:** -- {Actual situation - what's different from standard} +#### 2. Architectural Improvements Needed (WHAT SHOULD BE CHANGED) -#### Tiered Testing Strategy - -{If forced by architecture, explain. If standard approach works, state that.} - -| Tier | When | Duration | Coverage | Why Not Full Suite? 
| -|------|------|----------|----------|---------------------| -| **Smoke** | Every commit | <5 min | {N} tests | Fast feedback, catch build-breaking changes | -| **P0** | Every commit | ~{X} min | ~{N} tests | Critical paths, security-critical flows | -| **P1** | PR to main | ~{X} min | ~{N} tests | Important features, algorithm accuracy | -| **P2/P3** | Nightly | ~{X} min | ~{N} tests | Edge cases, performance, NFR | - -**Note**: {Any timing assumptions or constraints} - -#### Architectural Improvements Needed - -{If system has technical debt affecting testing, list improvements. If architecture supports testing well, acknowledge that.} +{List specific improvements that would make the system more testable} 1. **{Improvement name}** - - {What to change} - - **Impact**: {How it improves testing} + - **Current problem**: {What's wrong} + - **Required change**: {What architecture must do} + - **Impact if not fixed**: {Consequences} + - **Owner**: {Team} + - **Timeline**: {Sprint} -#### Acceptance of Trade-offs +--- -For {Feature} Phase 1, the team accepts: -- **{Trade-off 1}** ({Reasoning}) -- **{Trade-off 2}** ({Reasoning}) -- ⚠️ **{Known limitation}** ({Why acceptable for now}) +### Testability Assessment Summary -This is {**technical debt** OR **acceptable for Phase 1**} that should be {revisited post-GA OR maintained as-is}. +**📊 CURRENT STATE - FYI** + +{Only include this section if there are passing items worth mentioning. Otherwise omit.} + +#### What Works Well + +- ✅ {Passing item 1} (e.g., "API-first design supports parallel test execution") +- ✅ {Passing item 2} (e.g., "Feature flags enable test isolation") +- ✅ {Passing item 3} + +#### Accepted Trade-offs (No Action Required) + +For {Feature} Phase 1, the following trade-offs are acceptable: +- **{Trade-off 1}** - {Why acceptable for now} +- **{Trade-off 2}** - {Why acceptable for now} + +{This is technical debt OR acceptable for Phase 1} that {should be revisited post-GA OR maintained as-is} --- diff --git a/src/bmm/workflows/testarch/test-design/test-design-qa-template.md b/src/bmm/workflows/testarch/test-design/test-design-qa-template.md index a055736b..037856b7 100644 --- a/src/bmm/workflows/testarch/test-design/test-design-qa-template.md +++ b/src/bmm/workflows/testarch/test-design/test-design-qa-template.md @@ -1,314 +1,286 @@ # Test Design for QA: {Feature Name} -**Purpose:** Test execution recipe for QA team. Defines test scenarios, coverage plan, tooling, and Sprint 0 setup requirements. Use this as your implementation guide after architectural blockers are resolved. +**Purpose:** Test execution recipe for QA team. Defines what to test, how to test it, and what QA needs from other teams. **Date:** {date} **Author:** {author} -**Status:** Draft / Ready for Implementation +**Status:** Draft **Project:** {project_name} -**PRD Reference:** {prd_link} -**ADR Reference:** {adr_link} + +**Related:** See Architecture doc (test-design-architecture.md) for testability concerns and architectural blockers. --- -## Quick Reference for QA +## Executive Summary -**Before You Start:** -- [ ] Review Architecture doc (test-design-architecture.md) - understand blockers and risks -- [ ] Verify Sprint 0 blockers resolved (see Sprint 0 section below) -- [ ] Confirm test infrastructure ready (factories, fixtures, environments) +**Scope:** {Brief description of testing scope} -**Test Execution Order:** -1. **Smoke tests** (<5 min) - Fast feedback on critical paths -2. **P0 tests** (~{X} min) - Critical paths, security-critical flows -3. 
**P1 tests** (~{X} min) - Important features, algorithm accuracy -4. **P2/P3 tests** (~{X} min) - Edge cases, performance, NFR +**Risk Summary:** +- Total Risks: {N} ({X} high-priority score ≥6, {Y} medium, {Z} low) +- Critical Categories: {Categories with most high-priority risks} -**Need Help?** -- Blockers: See Architecture doc "Quick Guide" for mitigation plans -- Test scenarios: See "Test Coverage Plan" section below -- Sprint 0 setup: See "Sprint 0 Setup Requirements" section +**Coverage Summary:** +- P0 tests: ~{N} (critical paths, security) +- P1 tests: ~{N} (important features, integration) +- P2 tests: ~{N} (edge cases, regression) +- P3 tests: ~{N} (exploratory, benchmarks) +- **Total**: ~{N} tests (~{X}-{Y} weeks with 1 QA) --- -## System Architecture Summary +## Dependencies & Test Blockers -**Data Pipeline:** -{Brief description of system flow} +**CRITICAL:** QA cannot proceed without these items from other teams. -**Key Services:** -- **{Service 1}**: {Purpose and key responsibilities} -- **{Service 2}**: {Purpose and key responsibilities} -- **{Service 3}**: {Purpose and key responsibilities} +### Backend/Architecture Dependencies (Sprint 0) -**Data Stores:** -- **{Database 1}**: {What it stores} -- **{Database 2}**: {What it stores} +**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans -**Expected Scale** (from ADR): -- {Key metrics: RPS, volume, users, etc.} +1. **{Dependency 1}** - {Team} - {Timeline} + - {What QA needs} + - {Why it blocks testing} ---- +2. **{Dependency 2}** - {Team} - {Timeline} + - {What QA needs} + - {Why it blocks testing} -## Test Environment Requirements +### QA Infrastructure Setup (Sprint 0) -**{Company} Standard:** Shared DB per Environment with Randomization (Shift-Left) +1. **Test Data Factories** - QA + - {Entity} factory with faker-based randomization + - Auto-cleanup fixtures for parallel safety -| Environment | Database | Test Data Strategy | Purpose | -|-------------|----------|-------------------|---------| -| **Local** | {DB} (shared) | Randomized (faker), auto-cleanup | Local development | -| **Dev (CI)** | {DB} (shared) | Randomized (faker), auto-cleanup | PR validation | -| **Staging** | {DB} (shared) | Randomized (faker), auto-cleanup | Pre-production, E2E | +2. **Test Environments** - QA + - Local: {Setup details} + - CI/CD: {Setup details} + - Staging: {Setup details} -**Key Principles:** -- **Shared database per environment** (no ephemeral) -- **Randomization for isolation** (faker-based unique IDs) -- **Parallel-safe** (concurrent test runs don't conflict) -- **Self-cleaning** (tests delete their own data) -- **Shift-left** (test against real DBs early) - -**Example:** +**Example factory pattern:** ```typescript -import { faker } from "@faker-js/faker"; +import { test } from '@seontechnologies/playwright-utils/api-request/fixtures'; +import { expect } from '@playwright/test'; +import { faker } from '@faker-js/faker'; -test("example with randomized test data @p0", async ({ apiRequest }) => { +test('example test @p0', async ({ apiRequest }) => { const testData = { id: `test-${faker.string.uuid()}`, - customerId: `test-customer-${faker.string.alphanumeric(8)}`, - // ... 
unique test data + email: faker.internet.email(), }; - // Seed, test, cleanup + const { status } = await apiRequest({ + method: 'POST', + path: '/api/resource', + body: testData, + }); + + expect(status).toBe(201); }); ``` --- -## Testability Assessment +## Risk Assessment -**Prerequisites from Architecture Doc:** +**Note:** Full risk details in Architecture doc. This section summarizes risks relevant to QA test planning. -Verify these blockers are resolved before test development: -- [ ] {Blocker 1} (see Architecture doc Quick Guide → 🚨 BLOCKERS) -- [ ] {Blocker 2} -- [ ] {Blocker 3} +### High-Priority Risks (Score ≥6) -**If Prerequisites Not Met:** Coordinate with Architecture team (see Architecture doc for mitigation plans and owner assignments) +| Risk ID | Category | Description | Score | QA Test Coverage | +|---------|----------|-------------|-------|------------------| +| **{R-ID}** | {CAT} | {Brief description} | **{Score}** | {How QA validates this risk} | ---- +### Medium/Low-Priority Risks -## Test Levels Strategy - -**System Type:** {API-heavy / UI-heavy / Mixed backend system} - -**Recommended Split:** -- **Unit Tests: {X}%** - {What to unit test} -- **Integration/API Tests: {X}%** - ⭐ **PRIMARY FOCUS** - {What to integration test} -- **E2E Tests: {X}%** - {What to E2E test} - -**Rationale:** {Why this split makes sense for this system} - -**Test Count Summary:** -- P0: ~{N} tests - Critical paths, run on every commit -- P1: ~{N} tests - Important features, run on PR to main -- P2: ~{N} tests - Edge cases, run nightly/weekly -- P3: ~{N} tests - Exploratory, run on-demand -- **Total: ~{N} tests** (~{X} weeks for 1 QA, ~{Y} weeks for 2 QAs) +| Risk ID | Category | Description | Score | QA Test Coverage | +|---------|----------|-------------|-------|------------------| +| {R-ID} | {CAT} | {Brief description} | {Score} | {How QA validates this risk} | --- ## Test Coverage Plan -**Repository Note:** {Where tests live - backend repo, admin panel repo, etc. - and how CI pipelines are organized} +**IMPORTANT:** P0/P1/P2/P3 = **priority and risk level** (what to focus on if time-constrained), NOT execution timing. See "Execution Strategy" for when tests run. -### P0 (Critical) - Run on every commit (~{X} min) +### P0 (Critical) -**Execution:** CI/CD on every commit, parallel workers, smoke tests first (<5 min) +**Criteria:** Blocks core functionality + High risk (≥6) + No workaround + Affects majority of users -**Purpose:** Critical path validation - catch build-breaking changes and security violations immediately +| Test ID | Requirement | Test Level | Risk Link | Notes | +|---------|-------------|------------|-----------|-------| +| **P0-001** | {Requirement} | {Level} | {R-ID} | {Notes} | +| **P0-002** | {Requirement} | {Level} | {R-ID} | {Notes} | -**Criteria:** Blocks core functionality OR High risk (≥6) OR No workaround - -**Key Smoke Tests** (subset of P0, run first for fast feedback): -- {Smoke test 1} - {Duration} -- {Smoke test 2} - {Duration} -- {Smoke test 3} - {Duration} - -| Requirement | Test Level | Risk Link | Test Count | Owner | Notes | -|-------------|------------|-----------|------------|-------|-------| -| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} | -| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} | - -**Total P0:** ~{N} tests (~{X} weeks) - -#### P0 Test Scenarios (Detailed) - -**1. {Test Category} ({N} tests) - {CRITICALITY if applicable}** - -- [ ] {Scenario 1 with checkbox} -- [ ] {Scenario 2} -- [ ] {Scenario 3} - -**2. 
{Test Category 2} ({N} tests)** - -- [ ] {Scenario 1} -- [ ] {Scenario 2} - -{Continue for all P0 categories} +**Total P0:** ~{N} tests --- -### P1 (High) - Run on PR to main (~{X} min additional) +### P1 (High) -**Execution:** CI/CD on pull requests to main branch, runs after P0 passes, parallel workers +**Criteria:** Important features + Medium risk (3-4) + Common workflows + Workaround exists but difficult -**Purpose:** Important feature coverage - algorithm accuracy, complex workflows, Admin Panel interactions +| Test ID | Requirement | Test Level | Risk Link | Notes | +|---------|-------------|------------|-----------|-------| +| **P1-001** | {Requirement} | {Level} | {R-ID} | {Notes} | +| **P1-002** | {Requirement} | {Level} | {R-ID} | {Notes} | -**Criteria:** Important features OR Medium risk (3-4) OR Common workflows - -| Requirement | Test Level | Risk Link | Test Count | Owner | Notes | -|-------------|------------|-----------|------------|-------|-------| -| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} | -| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} | - -**Total P1:** ~{N} tests (~{X} weeks) - -#### P1 Test Scenarios (Detailed) - -**1. {Test Category} ({N} tests)** - -- [ ] {Scenario 1} -- [ ] {Scenario 2} - -{Continue for all P1 categories} +**Total P1:** ~{N} tests --- -### P2 (Medium) - Run nightly/weekly (~{X} min) +### P2 (Medium) -**Execution:** Scheduled nightly run (or weekly for P3), full infrastructure, sequential execution acceptable +**Criteria:** Secondary features + Low risk (1-2) + Edge cases + Regression prevention -**Purpose:** Edge case coverage, error handling, data integrity validation - slow feedback acceptable +| Test ID | Requirement | Test Level | Risk Link | Notes | +|---------|-------------|------------|-----------|-------| +| **P2-001** | {Requirement} | {Level} | {R-ID} | {Notes} | -**Criteria:** Secondary features OR Low risk (1-2) OR Edge cases - -| Requirement | Test Level | Risk Link | Test Count | Owner | Notes | -|-------------|------------|-----------|------------|-------|-------| -| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} | -| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} | - -**Total P2:** ~{N} tests (~{X} weeks) +**Total P2:** ~{N} tests --- -### P3 (Low) - Run on-demand (exploratory) +### P3 (Low) -**Execution:** Manual trigger or weekly scheduled run, performance testing +**Criteria:** Nice-to-have + Exploratory + Performance benchmarks + Documentation validation -**Purpose:** Full regression, performance benchmarks, accessibility validation - no time pressure +| Test ID | Requirement | Test Level | Notes | +|---------|-------------|------------|-------| +| **P3-001** | {Requirement} | {Level} | {Notes} | -**Criteria:** Nice-to-have OR Exploratory OR Performance benchmarks - -| Requirement | Test Level | Test Count | Owner | Notes | -|-------------|------------|------------|-------|-------| -| {Requirement 1} | {Level} | {N} | QA | {Notes} | -| {Requirement 2} | {Level} | {N} | QA | {Notes} | - -**Total P3:** ~{N} tests (~{X} days) +**Total P3:** ~{N} tests --- -### Coverage Matrix (Requirements → Tests) +## Execution Strategy -| Requirement | Test Level | Priority | Risk Link | Test Count | Owner | -|-------------|------------|----------|-----------|------------|-------| -| {Requirement 1} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} | -| {Requirement 2} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} | +**Philosophy:** Run everything in PRs unless there's significant infrastructure overhead. 
Playwright with parallelization is extremely fast (100s of tests in ~10-15 min). + +**Organized by TOOL TYPE:** + +### Every PR: Playwright Tests (~10-15 min) + +**All functional tests** (from any priority level): +- All E2E, API, integration, unit tests using Playwright +- Parallelized across {N} shards +- Total: ~{N} Playwright tests (includes P0, P1, P2, P3) + +**Why run in PRs:** Fast feedback, no expensive infrastructure + +### Nightly: k6 Performance Tests (~30-60 min) + +**All performance tests** (from any priority level): +- Load, stress, spike, endurance tests +- Total: ~{N} k6 tests (may include P0, P1, P2) + +**Why defer to nightly:** Expensive infrastructure (k6 Cloud), long-running (10-40 min per test) + +### Weekly: Chaos & Long-Running (~hours) + +**Special infrastructure tests** (from any priority level): +- Multi-region failover (requires AWS Fault Injection Simulator) +- Disaster recovery (backup restore, 4+ hours) +- Endurance tests (4+ hours runtime) + +**Why defer to weekly:** Very expensive infrastructure, very long-running, infrequent validation sufficient + +**Manual tests** (excluded from automation): +- DevOps validation (deployment, monitoring) +- Finance validation (cost alerts) +- Documentation validation --- -## Sprint 0 Setup Requirements +## QA Effort Estimate -**IMPORTANT:** These items **BLOCK test development**. Complete in Sprint 0 before QA can write tests. +**QA test development effort only** (excludes DevOps, Backend, Data Eng, Finance work): -### Architecture/Backend Blockers (from Architecture doc) +| Priority | Count | Effort Range | Notes | +|----------|-------|--------------|-------| +| P0 | ~{N} | ~{X}-{Y} weeks | Complex setup (security, performance, multi-step) | +| P1 | ~{N} | ~{X}-{Y} weeks | Standard coverage (integration, API tests) | +| P2 | ~{N} | ~{X}-{Y} days | Edge cases, simple validation | +| P3 | ~{N} | ~{X}-{Y} days | Exploratory, benchmarks | +| **Total** | ~{N} | **~{X}-{Y} weeks** | **1 QA engineer, full-time** | -**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans +**Assumptions:** +- Includes test design, implementation, debugging, CI integration +- Excludes ongoing maintenance (~10% effort) +- Assumes test infrastructure (factories, fixtures) ready -1. **{Blocker 1}** 🚨 **BLOCKER** - {Owner} - - {What needs to be provided} - - **Details:** Architecture doc {Risk-ID} mitigation plan - -2. **{Blocker 2}** 🚨 **BLOCKER** - {Owner} - - {What needs to be provided} - - **Details:** Architecture doc {Risk-ID} mitigation plan - -### QA Test Infrastructure - -1. **{Factory/Fixture Name}** - QA - - Faker-based generator: `{function_signature}` - - Auto-cleanup after tests - -2. 
**{Entity} Fixtures** - QA - - Seed scripts for {states/scenarios} - - Isolated {id_pattern} per test - -### Test Environments - -**Local:** {Setup details - Docker, LocalStack, etc.} - -**CI/CD:** {Setup details - shared infrastructure, parallel workers, artifacts} - -**Staging:** {Setup details - shared multi-tenant, nightly E2E} - -**Production:** {Setup details - feature flags, canary transactions} - -**Sprint 0 NFR Gates** (MUST complete before integration testing): -- [ ] {Gate 1}: {Description} (Owner) 🚨 -- [ ] {Gate 2}: {Description} (Owner) 🚨 -- [ ] {Gate 3}: {Description} (Owner) 🚨 - -### Sprint 1 Items (Not Sprint 0) - -- **{Item 1}** ({Owner}): {Description} -- **{Item 2}** ({Owner}): {Description} - -**Sprint 1 NFR Gates** (MUST complete before GA): -- [ ] {Gate 1}: {Description} (Owner) -- [ ] {Gate 2}: {Description} (Owner) +**Dependencies from other teams:** +- See "Dependencies & Test Blockers" section for what QA needs from Backend, DevOps, Data Eng --- -## NFR Readiness Summary +## Appendix A: Code Examples & Tagging -**Based on Architecture Doc Risk Assessment** +**Playwright Tags for Selective Execution:** -| NFR Category | Status | Evidence Status | Blocker | Next Action | -|--------------|--------|-----------------|---------|-------------| -| **Testability & Automation** | {Status} | {Evidence} | {Sprint} | {Action} | -| **Test Data Strategy** | {Status} | {Evidence} | {Sprint} | {Action} | -| **Scalability & Availability** | {Status} | {Evidence} | {Sprint} | {Action} | -| **Disaster Recovery** | {Status} | {Evidence} | {Sprint} | {Action} | -| **Security** | {Status} | {Evidence} | {Sprint} | {Action} | -| **Monitorability, Debuggability & Manageability** | {Status} | {Evidence} | {Sprint} | {Action} | -| **QoS & QoE** | {Status} | {Evidence} | {Sprint} | {Action} | -| **Deployability** | {Status} | {Evidence} | {Sprint} | {Action} | +```typescript +import { test } from '@seontechnologies/playwright-utils/api-request/fixtures'; +import { expect } from '@playwright/test'; -**Total:** {N} PASS, {N} CONCERNS across {N} categories +// P0 critical test +test('@P0 @API @Security unauthenticated request returns 401', async ({ apiRequest }) => { + const { status, body } = await apiRequest({ + method: 'POST', + path: '/api/endpoint', + body: { data: 'test' }, + skipAuth: true, + }); + + expect(status).toBe(401); + expect(body.error).toContain('unauthorized'); +}); + +// P1 integration test +test('@P1 @Integration data syncs correctly', async ({ apiRequest }) => { + // Seed data + await apiRequest({ + method: 'POST', + path: '/api/seed', + body: { /* test data */ }, + }); + + // Validate + const { status, body } = await apiRequest({ + method: 'GET', + path: '/api/resource', + }); + + expect(status).toBe(200); + expect(body).toHaveProperty('data'); +}); +``` + +**Run specific tags:** + +```bash +# Run only P0 tests +npx playwright test --grep @P0 + +# Run P0 + P1 tests +npx playwright test --grep "@P0|@P1" + +# Run only security tests +npx playwright test --grep @Security + +# Run all Playwright tests in PR (default) +npx playwright test +``` --- -**End of QA Document** +## Appendix B: Knowledge Base References -**Next Steps for QA Team:** -1. Verify Sprint 0 blockers resolved (coordinate with Architecture team if not) -2. Set up test infrastructure (factories, fixtures, environments) -3. Begin test implementation following priority order (P0 → P1 → P2 → P3) -4. Run smoke tests first for fast feedback -5. 
Track progress using test scenario checklists above +- **Risk Governance**: `risk-governance.md` - Risk scoring methodology +- **Test Priorities Matrix**: `test-priorities-matrix.md` - P0-P3 criteria +- **Test Levels Framework**: `test-levels-framework.md` - E2E vs API vs Unit selection +- **Test Quality**: `test-quality.md` - Definition of Done (no hard waits, <300 lines, <1.5 min) -**Next Steps for Architecture Team:** -1. Monitor Sprint 0 blocker resolution -2. Provide support for QA infrastructure setup if needed -3. Review test results and address any newly discovered testability gaps +--- + +**Generated by:** BMad TEA Agent +**Workflow:** `_bmad/bmm/testarch/test-design` +**Version:** 4.0 (BMad v6)
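
---

## Appendix C: k6 Performance Test Sketch

The Execution Strategy defers k6 load/stress/spike tests to nightly runs and expects SLO thresholds to be encoded in the scripts. The sketch below is a minimal, hedged example of that pattern, not a definitive implementation: the endpoint URL, virtual-user counts, and threshold values (p95 < 500 ms, error rate < 1%) are placeholders to be replaced with the feature's actual SLOs from the Architecture doc.

```javascript
// Minimal k6 load-test sketch (nightly tier). All values below are
// assumptions for illustration -- replace with the feature's real SLOs.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Ramp to a modest load, hold, then ramp down.
  stages: [
    { duration: '2m', target: 50 }, // ramp up to 50 virtual users
    { duration: '5m', target: 50 }, // hold steady load
    { duration: '1m', target: 0 },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // SLO placeholder: 95th percentile under 500 ms
    http_req_failed: ['rate<0.01'],   // SLO placeholder: error rate under 1%
  },
};

export default function () {
  // Hypothetical staging endpoint -- substitute the system under test.
  const res = http.get('https://staging.example.com/api/resource');

  check(res, {
    'status is 200': (r) => r.status === 200,
  });

  sleep(1); // pacing between iterations per virtual user
}
```

Run it from the nightly pipeline (or locally) with:

```bash
k6 run perf/load-test.js
```

Thresholds make the run fail automatically when an SLO is breached, so the nightly job can gate on the k6 exit code without extra scripting.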