diff --git a/docs/explanation/features/tea-overview.md b/docs/explanation/features/tea-overview.md
index 1289af6e..7ff65ab2 100644
--- a/docs/explanation/features/tea-overview.md
+++ b/docs/explanation/features/tea-overview.md
@@ -160,7 +160,7 @@ graph TB
**TEA workflows:** `*framework` and `*ci` run once in Phase 3 after architecture. `*test-design` is **dual-mode**:
-- **System-level (Phase 3):** Run immediately after architecture/ADR drafting to produce `test-design-system.md` (testability review, ADR → test mapping, Architecturally Significant Requirements (ASRs), environment needs). Feeds the implementation-readiness gate.
+- **System-level (Phase 3):** Run immediately after architecture/ADR drafting to produce TWO documents: `test-design-architecture.md` (for Architecture/Dev teams: testability gaps, ASRs, NFR requirements) + `test-design-qa.md` (for QA team: test execution recipe, coverage plan, Sprint 0 setup). Feeds the implementation-readiness gate.
- **Epic-level (Phase 4):** Run per-epic to produce `test-design-epic-N.md` (risk, priorities, coverage plan).
The Quick Flow track skips Phases 1 and 3.
diff --git a/docs/how-to/brownfield/use-tea-for-enterprise.md b/docs/how-to/brownfield/use-tea-for-enterprise.md
index 5285153c..b8397715 100644
--- a/docs/how-to/brownfield/use-tea-for-enterprise.md
+++ b/docs/how-to/brownfield/use-tea-for-enterprise.md
@@ -114,10 +114,9 @@ Focus areas:
- Performance requirements (SLA: P99 <200ms)
- Compliance (HIPAA PHI handling, audit logging)
-Output: test-design-system.md with:
-- Security testing strategy
-- Compliance requirement → test mapping
-- Performance testing plan
+Output: TWO documents (system-level):
+- `test-design-architecture.md`: Security gaps, compliance requirements, performance SLOs for Architecture team
+- `test-design-qa.md`: Security testing strategy, compliance test mapping, performance testing plan, audit logging validation for QA team
-- Audit logging validation
```
diff --git a/docs/how-to/workflows/run-test-design.md b/docs/how-to/workflows/run-test-design.md
index 2b44fdac..64424b65 100644
--- a/docs/how-to/workflows/run-test-design.md
+++ b/docs/how-to/workflows/run-test-design.md
@@ -55,20 +55,44 @@ For epic-level:
### 5. Review the Output
-TEA generates a comprehensive test design document.
+TEA generates one or two test design documents, depending on the mode.
## What You Get
-**System-Level Output (`test-design-system.md`):**
-- Testability review of architecture
-- ADR → test mapping
-- Architecturally Significant Requirements (ASRs)
-- Environment needs
-- Test infrastructure recommendations
+**System-Level Output (TWO Documents):**
-**Epic-Level Output (`test-design-epic-N.md`):**
+TEA produces two focused documents for system-level mode:
+
+1. **`test-design-architecture.md`** (for Architecture/Dev teams)
+ - Purpose: Architectural concerns, testability gaps, NFR requirements
+ - Quick Guide with 🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY
+ - Risk assessment (high/medium/low-priority with scoring)
+ - Testability concerns and architectural gaps
+ - Risk mitigation plans for high-priority risks (≥6)
+ - Assumptions and dependencies
+
+2. **`test-design-qa.md`** (for QA team)
+ - Purpose: Test execution recipe, coverage plan, Sprint 0 setup
+ - Quick Reference for QA (Before You Start, Execution Order, Need Help)
+ - System architecture summary
+  - Test environment requirements (early in the document)
+ - Testability assessment (prerequisites checklist)
+ - Test levels strategy (unit/integration/E2E split)
+ - Test coverage plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
+ - Sprint 0 setup requirements (blockers, infrastructure, environments)
+ - NFR readiness summary
+
+**Why Two Documents?**
+- **Architecture teams** can scan blockers in <5 min (Quick Guide format)
+- **QA teams** have actionable test recipes (step-by-step with checklists)
+- **No redundancy** between documents (cross-references instead of duplication)
+- **Clear separation** of concerns (what to deliver vs how to test)
+
+**Epic-Level Output (ONE Document):**
+
+**`test-design-epic-N.md`** (combined risk assessment + test plan)
- Risk assessment for the epic
-- Test priorities
+- Test priorities (P0-P3)
- Coverage plan
- Regression hotspots (for brownfield)
- Integration risks
@@ -82,12 +106,25 @@ TEA generates a comprehensive test design document.
| **Brownfield** | System-level + existing test baseline | Regression hotspots, integration risks |
| **Enterprise** | Compliance-aware testability | Security/performance/compliance focus |
+## Examples
+
+**System-Level (Two Documents):**
+- `cluster-search/cluster-search-test-design-architecture.md` - Architecture doc with Quick Guide
+- `cluster-search/cluster-search-test-design-qa.md` - QA doc with test scenarios
+
+**Key Pattern:**
+- Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
+- QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
+- No duplication, just cross-references
+
## Tips
- **Run system-level right after architecture** — Early testability review
- **Run epic-level at the start of each epic** — Targeted test planning
- **Update if ADRs change** — Keep test design aligned
- **Use output to guide other workflows** — Feeds into `*atdd` and `*automate`
+- **Architecture teams review Architecture doc** — Focus on blockers and mitigation plans
+- **QA teams use QA doc as implementation guide** — Follow test scenarios and Sprint 0 checklist
## Next Steps
diff --git a/docs/reference/tea/commands.md b/docs/reference/tea/commands.md
index ed1ad8c2..6180bf13 100644
--- a/docs/reference/tea/commands.md
+++ b/docs/reference/tea/commands.md
@@ -72,17 +72,39 @@ Quick reference for all 8 TEA (Test Architect) workflows. For detailed step-by-s
**Frequency:** Once (system), per epic (epic-level)
**Modes:**
-- **System-level:** Architecture testability review
-- **Epic-level:** Per-epic risk assessment
+- **System-level:** Architecture testability review (TWO documents)
+- **Epic-level:** Per-epic risk assessment (ONE document)
**Key Inputs:**
-- Architecture/epic, requirements, ADRs
+- System-level: Architecture, PRD, ADRs
+- Epic-level: Epic, stories, acceptance criteria
**Key Outputs:**
-- `test-design-system.md` or `test-design-epic-N.md`
-- Risk assessment (probability × impact scores)
-- Test priorities (P0-P3)
-- Coverage strategy
+
+**System-Level (TWO Documents):**
+- `test-design-architecture.md` - For Architecture/Dev teams
+ - Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
+ - Risk assessment with scoring
+ - Testability concerns and gaps
+ - Mitigation plans
+- `test-design-qa.md` - For QA team
+ - Test execution recipe
+ - Coverage plan (P0/P1/P2/P3 with checkboxes)
+ - Sprint 0 setup requirements
+ - NFR readiness summary
+
+**Epic-Level (ONE Document):**
+- `test-design-epic-N.md`
+ - Risk assessment (probability × impact scores)
+ - Test priorities (P0-P3)
+ - Coverage strategy
+ - Mitigation plans
+
+**Why Two Documents for System-Level?**
+- Architecture teams scan blockers in <5 min
+- QA teams have actionable test recipes
+- No redundancy (cross-references instead)
+- Clear separation (what to deliver vs how to test)
**MCP Enhancement:** Exploratory mode (live browser UI discovery)
diff --git a/docs/reference/tea/configuration.md b/docs/reference/tea/configuration.md
index 69913103..ae24d324 100644
--- a/docs/reference/tea/configuration.md
+++ b/docs/reference/tea/configuration.md
@@ -197,7 +197,7 @@ output_folder: _bmad-output
```
**TEA Output Files:**
-- `test-design-system.md` (from *test-design system-level)
+- `test-design-architecture.md` + `test-design-qa.md` (from *test-design system-level; TWO documents)
- `test-design-epic-N.md` (from *test-design epic-level)
- `test-review.md` (from *test-review)
- `traceability-matrix.md` (from *trace Phase 1)
diff --git a/docs/tutorials/getting-started/tea-lite-quickstart.md b/docs/tutorials/getting-started/tea-lite-quickstart.md
index 839c06ac..34c9dda4 100644
--- a/docs/tutorials/getting-started/tea-lite-quickstart.md
+++ b/docs/tutorials/getting-started/tea-lite-quickstart.md
@@ -15,7 +15,7 @@ By the end of this 30-minute tutorial, you'll have:
:::note[Prerequisites]
- Node.js installed (v20 or later)
- 30 minutes of focused time
-- We'll use TodoMVC () as our demo app
+- We'll use TodoMVC () as our demo app
:::
:::tip[Quick Path]
diff --git a/src/bmm/testarch/knowledge/adr-quality-readiness-checklist.md b/src/bmm/testarch/knowledge/adr-quality-readiness-checklist.md
new file mode 100644
index 00000000..0e8c1899
--- /dev/null
+++ b/src/bmm/testarch/knowledge/adr-quality-readiness-checklist.md
@@ -0,0 +1,350 @@
+# ADR Quality Readiness Checklist
+
+**Purpose:** Standardized 8-category, 29-criteria framework for evaluating system testability and NFR compliance during architecture review (Phase 3) and NFR assessment.
+
+**When to Use:**
+- System-level test design (Phase 3): Identify testability gaps in architecture
+- NFR assessment workflow: Structured evaluation with evidence
+- Gate decisions: Quantifiable criteria (X/29 met = PASS/CONCERNS/FAIL)
+
+**How to Use:**
+1. For each criterion, assess status: ✅ Covered / ⚠️ Gap / ⬜ Not Assessed
+2. Document gap description if ⚠️
+3. Describe risk if criterion unmet
+4. Map to test scenarios (what tests validate this criterion)
+
+---
+
+## 1. Testability & Automation
+
+**Question:** Can we verify this effectively without manual toil?
+
+| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
+| --- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
+| 1.1 | **Isolation:** Can the service be tested with all downstream dependencies (DBs, APIs, Queues) mocked or stubbed? | Flaky tests; inability to test in isolation | P1: Service runs with mocked DB, P1: Service runs with mocked API, P2: Integration tests with real deps |
+| 1.2 | **Headless Interaction:** Is 100% of the business logic accessible via API (REST/gRPC) to bypass the UI for testing? | Slow, brittle UI-based automation | P0: All core logic callable via API, P1: No UI dependency for critical paths |
+| 1.3 | **State Control:** Do we have "Seeding APIs" or scripts to inject specific data states (e.g., "User with expired subscription") instantly? | Long setup times; inability to test edge cases | P0: Seed baseline data, P0: Inject edge case data states, P1: Cleanup after tests |
+| 1.4 | **Sample Requests:** Are there valid and invalid cURL/JSON sample requests provided in the design doc for QA to build upon? | Ambiguity on how to consume the service | P1: Valid request succeeds, P1: Invalid request fails with clear error |
+
+**Common Gaps:**
+- No mock endpoints for external services (Athena, Milvus, third-party APIs)
+- Business logic tightly coupled to UI (requires E2E tests for everything)
+- No seeding APIs (manual database setup required)
+- ADR has architecture diagrams but no sample API requests
+
+**Mitigation Examples:**
+- 1.1 (Isolation): Provide mock endpoints, dependency injection, interface abstractions
+- 1.2 (Headless): Expose all business logic via REST/GraphQL APIs
+- 1.3 (State Control): Implement `/api/test-data` seeding endpoints (dev/staging only; see the sketch below)
+- 1.4 (Sample Requests): Add "Example API Calls" section to ADR with cURL commands
+
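+A minimal sketch of 1.3 and 1.4 combined: a seeding call with explicit failure handling. The `/api/test-data` route, payload, and response shape are assumptions carried over from the mitigation list above, not a real API; Node 18+ (global `fetch`) is assumed.
+
+```typescript
+// Hypothetical seeding call (1.3): inject a specific data state instantly.
+// Endpoint, payload, and response shape are assumptions, not a real API.
+async function seedExpiredSubscriptionUser(baseUrl: string): Promise<string> {
+  const res = await fetch(`${baseUrl}/api/test-data`, {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({
+      entity: "user",
+      state: "expired-subscription", // the edge case under test
+    }),
+  });
+  if (!res.ok) throw new Error(`Seeding failed: ${res.status}`); // invalid requests fail with a clear error (1.4)
+  const { id } = (await res.json()) as { id: string };
+  return id; // use the ID in the test, delete it in teardown
+}
+```
+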
+---
+
+## 2. Test Data Strategy
+
+**Question:** How do we fuel our tests safely?
+
+| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
+| --- | ------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------- | ---------------------------------------------------------------------------------------------- |
+| 2.1 | **Segregation:** Does the design support multi-tenancy or specific headers (e.g., x-test-user) to keep test data out of prod metrics? | Skewed business analytics; data pollution | P0: Multi-tenant isolation (customer A ≠ customer B), P1: Test data excluded from prod metrics |
+| 2.2 | **Generation:** Can we use synthetic data, or do we rely on scrubbing production data (GDPR/PII risk)? | Privacy violations; dependency on stale data | P0: Faker-based synthetic data, P1: No production data in tests |
+| 2.3 | **Teardown:** Is there a mechanism to "reset" the environment or clean up data after destructive tests? | Environment rot; subsequent test failures | P0: Automated cleanup after tests, P2: Environment reset script |
+
+**Common Gaps:**
+- No `customer_id` scoping in queries (cross-tenant data leakage risk)
+- Reliance on production data dumps (GDPR/PII violations)
+- No cleanup mechanism (tests leave data behind, polluting environment)
+
+**Mitigation Examples:**
+- 2.1 (Segregation): Enforce `customer_id` in all queries, add test-specific headers
+- 2.2 (Generation): Use Faker library, create synthetic data generators, prohibit prod dumps
+- 2.3 (Teardown): Auto-cleanup hooks in test framework, isolated test customer IDs (see the sketch below)
+
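+A sketch of 2.1–2.3 in a single test, assuming Playwright's `request` fixture and `@faker-js/faker`; the seeding/cleanup routes and the tenancy header are illustrative, not a real API.
+
+```typescript
+import { test, expect } from "@playwright/test";
+import { faker } from "@faker-js/faker";
+
+// Synthetic, tenant-scoped data (2.1, 2.2) with guaranteed teardown (2.3).
+test("order lookup is tenant-scoped @p0", async ({ request }) => {
+  const customerId = `test-customer-${faker.string.alphanumeric(8)}`;
+  const orderId = `test-${faker.string.uuid()}`;
+
+  await request.post("/api/test-data/orders", {
+    data: { id: orderId, customerId }, // hypothetical seeding route
+  });
+  try {
+    const res = await request.get(`/api/orders/${orderId}`, {
+      headers: { "x-customer-id": customerId }, // assumed tenancy header
+    });
+    expect(res.ok()).toBeTruthy();
+  } finally {
+    await request.delete(`/api/test-data/orders/${orderId}`); // self-cleaning (2.3)
+  }
+});
+```
+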
+---
+
+## 3. Scalability & Availability
+
+**Question:** Can it grow, and will it stay up?
+
+| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
+| --- | --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
+| 3.1 | **Statelessness:** Is the service stateless? If not, how is session state replicated across instances? | Inability to auto-scale horizontally | P1: Service restart mid-request → no data loss, P2: Horizontal scaling under load |
+| 3.2 | **Bottlenecks:** Have we identified the weakest link (e.g., database connections, API rate limits) under load? | System crash during peak traffic | P2: Load test identifies bottleneck, P2: Connection pool exhaustion handled |
+| 3.3 | **SLA Definitions:** What is the target Availability (e.g., 99.9%) and does the architecture support redundancy to meet it? | Breach of contract; customer churn | P1: Availability target defined, P2: Redundancy validated (multi-region/zone) |
+| 3.4 | **Circuit Breakers:** If a dependency fails, does this service fail fast or hang? | Cascading failures taking down the whole platform | P1: Circuit breaker opens on 5 failures, P1: Auto-reset after recovery, P2: Timeout prevents hanging |
+
+**Common Gaps:**
+- Stateful session management (can't scale horizontally)
+- No load testing, bottlenecks unknown
+- SLA undefined or unrealistic (99.99% without redundancy)
+- No circuit breakers (cascading failures)
+
+**Mitigation Examples:**
+- 3.1 (Statelessness): Externalize session to Redis/JWT, design for horizontal scaling
+- 3.2 (Bottlenecks): Load test with k6, monitor connection pools, identify weak links
+- 3.3 (SLA): Define realistic SLA (99.9% = 43 min/month downtime), add redundancy
+- 3.4 (Circuit Breakers): Implement circuit breakers (Hystrix pattern), fail fast on errors (see the sketch below)
+
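+A compact circuit-breaker sketch for 3.4; the thresholds and reset delay are illustrative, and production code would more likely use a library such as opossum.
+
+```typescript
+// Minimal circuit breaker: open after N consecutive failures, half-open after a delay.
+class CircuitBreaker {
+  private failures = 0;
+  private openedAt = 0;
+
+  constructor(
+    private readonly maxFailures = 5, // matches the "opens on 5 failures" scenario above
+    private readonly resetAfterMs = 30_000,
+  ) {}
+
+  async call<T>(fn: () => Promise<T>): Promise<T> {
+    if (this.failures >= this.maxFailures) {
+      if (Date.now() - this.openedAt < this.resetAfterMs) {
+        throw new Error("circuit open: failing fast"); // fail fast instead of hanging
+      }
+      this.failures = this.maxFailures - 1; // half-open: allow one probe call
+    }
+    try {
+      const result = await fn();
+      this.failures = 0; // auto-reset after recovery
+      return result;
+    } catch (err) {
+      this.failures += 1;
+      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
+      throw err;
+    }
+  }
+}
+```
+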
+---
+
+## 4. Disaster Recovery (DR)
+
+**Question:** What happens when the worst-case scenario occurs?
+
+| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
+| --- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- | ----------------------------------------------------------------------- |
+| 4.1 | **RTO/RPO:** What is the Recovery Time Objective (how long to restore) and Recovery Point Objective (max data loss)? | Extended outages; data loss liability | P2: RTO defined and tested, P2: RPO validated (backup frequency) |
+| 4.2 | **Failover:** Is region/zone failover automated or manual? Has it been practiced? | "Heroics" required during outages; human error | P2: Automated failover works, P2: Manual failover documented and tested |
+| 4.3 | **Backups:** Are backups immutable and tested for restoration integrity? | Ransomware vulnerability; corrupted backups | P2: Backup restore succeeds, P2: Backup immutability validated |
+
+**Common Gaps:**
+- RTO/RPO undefined (no recovery plan)
+- Failover never tested (manual process, prone to errors)
+- Backups exist but restoration never validated (untested backups = no backups)
+
+**Mitigation Examples:**
+- 4.1 (RTO/RPO): Define RTO (e.g., 4 hours) and RPO (e.g., 1 hour), document recovery procedures
+- 4.2 (Failover): Automate multi-region failover, practice failover drills quarterly
+- 4.3 (Backups): Implement immutable backups (S3 versioning), test restore monthly (see the sketch below)
+
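+For 4.3, a hedged restore-verification sketch using AWS SDK v3; the bucket, key, and JSON sentinel are hypothetical stand-ins for whatever your restore drill actually asserts.
+
+```typescript
+import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
+
+const s3 = new S3Client({}); // region/credentials come from the environment
+
+// Restore-integrity check (4.3): fetch the latest backup and assert a known sentinel.
+async function verifyBackup(bucket: string, key: string): Promise<void> {
+  const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
+  const body = await obj.Body?.transformToString();
+  if (!body?.includes('"sentinel":"backup-ok"')) {
+    throw new Error(`backup ${key} failed integrity check: sentinel missing`);
+  }
+}
+```
+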
+---
+
+## 5. Security
+
+**Question:** Is the design safe by default?
+
+| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
+| --- | ---------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
+| 5.1 | **AuthN/AuthZ:** Does it implement standard protocols (OAuth2/OIDC)? Are permissions granular (Least Privilege)? | Unauthorized access; data leaks | P0: OAuth flow works, P0: Expired token rejected, P0: Insufficient permissions return 403, P1: Scope enforcement |
+| 5.2 | **Encryption:** Is data encrypted at rest (DB) and in transit (TLS)? | Compliance violations; data theft | P1: Milvus data-at-rest encrypted, P1: TLS 1.2+ enforced, P2: Certificate rotation works |
+| 5.3 | **Secrets:** Are API keys/passwords stored in a Vault (not in code or config files)? | Credentials leaked in git history | P1: No hardcoded secrets in code, P1: Secrets loaded from AWS Secrets Manager |
+| 5.4 | **Input Validation:** Are inputs sanitized against Injection attacks (SQLi, XSS)? | System compromise via malicious payloads | P1: SQL injection sanitized, P1: XSS escaped, P2: Command injection prevented |
+
+**Common Gaps:**
+- Weak authentication (no OAuth, hardcoded API keys)
+- No encryption at rest (plaintext in database)
+- Secrets in git (API keys, passwords in config files)
+- No input validation (vulnerable to SQLi, XSS, command injection)
+
+**Mitigation Examples:**
+- 5.1 (AuthN/AuthZ): Implement OAuth 2.1/OIDC, enforce least privilege, validate scopes
+- 5.2 (Encryption): Enable TDE (Transparent Data Encryption), enforce TLS 1.2+
+- 5.3 (Secrets): Migrate to AWS Secrets Manager/Vault, scan git history for leaks
+- 5.4 (Input Validation): Sanitize all inputs, use parameterized queries, escape outputs (see the sketch below)
+
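+For 5.4, a parameterized-query sketch assuming the `pg` client; the table and column names are placeholders.
+
+```typescript
+import { Pool } from "pg";
+
+const pool = new Pool(); // connection settings come from PG* environment variables
+
+// Parameterized query (5.4): user input is bound, never concatenated into SQL.
+async function findUserByEmail(email: string) {
+  const result = await pool.query(
+    "SELECT id, email FROM users WHERE email = $1", // the $1 binding defeats SQL injection
+    [email],
+  );
+  return result.rows[0] ?? null;
+}
+```
+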
+---
+
+## 6. Monitorability, Debuggability & Manageability
+
+**Question:** Can we operate and fix this in production?
+
+| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
+| --- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
+| 6.1 | **Tracing:** Does the service propagate W3C Trace Context / Correlation IDs for distributed tracing? | Impossible to debug errors across microservices | P2: W3C Trace Context propagated (EventBridge → Lambda → Service), P2: Correlation ID in all logs |
+| 6.2 | **Logs:** Can log levels (INFO vs DEBUG) be toggled dynamically without a redeploy? | Inability to diagnose issues in real-time | P2: Log level toggle works without redeploy, P2: Logs structured (JSON format) |
+| 6.3 | **Metrics:** Does it expose RED metrics (Rate, Errors, Duration) for Prometheus/Datadog? | Flying blind regarding system health | P2: /metrics endpoint exposes RED metrics, P2: Prometheus/Datadog scrapes successfully |
+| 6.4 | **Config:** Is configuration externalized? Can we change behavior without a code build? | Rigid system; full deploys needed for minor tweaks | P2: Config change without code build, P2: Feature flags toggle behavior |
+
+**Common Gaps:**
+- No distributed tracing (can't debug across microservices)
+- Static log levels (requires redeploy to enable DEBUG)
+- No metrics endpoint (blind to system health)
+- Configuration hardcoded (requires full deploy for minor changes)
+
+**Mitigation Examples:**
+- 6.1 (Tracing): Implement W3C Trace Context, add correlation IDs to all logs
+- 6.2 (Logs): Use dynamic log levels (environment variable), structured logging (JSON); see the sketch below
+- 6.3 (Metrics): Expose /metrics endpoint, track RED metrics (Rate, Errors, Duration)
+- 6.4 (Config): Externalize config (AWS SSM/AppConfig), use feature flags (LaunchDarkly)
+
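+A sketch for 6.1 and 6.2 together: structured JSON logs carrying a correlation ID, with the level read from an environment variable at call time so it changes without a redeploy. Field names are assumptions.
+
+```typescript
+// Structured JSON logs (6.2) with a correlation ID (6.1); the level is read from
+// an environment variable on every call, so it can change without a redeploy.
+type Level = "debug" | "info" | "error";
+const LEVELS: Record<Level, number> = { debug: 0, info: 1, error: 2 };
+
+function log(level: Level, message: string, correlationId: string, extra: object = {}): void {
+  const threshold = (process.env.LOG_LEVEL as Level) ?? "info";
+  if (LEVELS[level] < LEVELS[threshold]) return; // dynamic level, no redeploy needed
+  console.log(JSON.stringify({
+    ts: new Date().toISOString(),
+    level,
+    message,
+    correlationId, // propagated from the inbound W3C traceparent where available
+    ...extra,
+  }));
+}
+
+log("info", "search request accepted", "req-4f2a9c"); // correlation ID is illustrative
+```
+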
+---
+
+## 7. QoS (Quality of Service) & QoE (Quality of Experience)
+
+**Question:** How does it perform, and how does it feel?
+
+| # | Criterion | Risk if Unmet | Typical Test Scenarios (P0-P2) |
+| --- | ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------------------------- |
+| 7.1 | **Latency (QoS):** What are the P95 and P99 latency targets? | Slow API responses affecting throughput | P3: P95/P99 latency within defined targets |
diff --git a/src/bmm/workflows/testarch/test-design/instructions.md b/src/bmm/workflows/testarch/test-design/instructions.md
+> You don't need to follow full BMad methodology to use TEA test-design.
+> Just provide PRD + ADR for system-level, or Epic for epic-level.
+> TEA will auto-detect and produce appropriate documents.
+
+**Halt Condition:** If mode cannot be determined AND user intent unclear AND required files missing, HALT and notify user:
+- "Please provide either: (A) PRD + ADR for system-level test design, OR (B) Epic + Stories for epic-level test design"
---
@@ -70,7 +103,7 @@ The workflow auto-detects which mode to use based on project phase.
3. **Load Knowledge Base Fragments (System-Level)**
**Critical:** Consult `{project-root}/_bmad/bmm/testarch/tea-index.csv` to load:
- - `nfr-criteria.md` - NFR validation approach (security, performance, reliability, maintainability)
+   - `adr-quality-readiness-checklist.md` - 8-category, 29-criteria NFR framework (testability, security, scalability, DR, QoS, deployability, etc.)
- `test-levels-framework.md` - Test levels strategy guidance
- `risk-governance.md` - Testability risk identification
- `test-quality.md` - Quality standards and Definition of Done
@@ -91,7 +124,7 @@ The workflow auto-detects which mode to use based on project phase.
2. **Load Architecture Context**
- Read architecture.md for system design
- Read tech-spec for implementation details
- - Read test-design-system.md (if exists from Phase 3)
+ - Read test-design-architecture.md and test-design-qa.md (if exist from Phase 3 system-level test design)
- Identify technical constraints and dependencies
- Note integration points and external systems
@@ -173,50 +206,128 @@ The workflow auto-detects which mode to use based on project phase.
**Critical:** If testability concerns are blockers (e.g., "Architecture makes performance testing impossible"), document as CONCERNS or FAIL recommendation for gate check.
-6. **Output System-Level Test Design**
+6. **Output System-Level Test Design (TWO Documents)**
- Write to `{output_folder}/test-design-system.md` containing:
+ **IMPORTANT:** System-level mode produces TWO documents instead of one:
+
+ **Document 1: test-design-architecture.md** (for Architecture/Dev teams)
+ - Purpose: Architectural concerns, testability gaps, NFR requirements
+ - Audience: Architects, Backend Devs, Frontend Devs, DevOps, Security Engineers
+ - Focus: What architecture must deliver for testability
+ - Template: `test-design-architecture-template.md`
+
+ **Document 2: test-design-qa.md** (for QA team)
+ - Purpose: Test execution recipe, coverage plan, Sprint 0 setup
+ - Audience: QA Engineers, Test Automation Engineers, QA Leads
+ - Focus: How QA will execute tests
+ - Template: `test-design-qa-template.md`
+
+ **Standard Structures (REQUIRED):**
+
+ **test-design-architecture.md sections (in this order):**
+ 1. Executive Summary (scope, business context, architecture, risk summary)
+ 2. Quick Guide (🚨 BLOCKERS / ⚠️ HIGH PRIORITY / 📋 INFO ONLY)
+ 3. Risk Assessment (high/medium/low-priority risks with scoring)
+ 4. Testability Concerns and Architectural Gaps (if system has constraints)
+ 5. Risk Mitigation Plans (detailed for high-priority risks ≥6)
+ 6. Assumptions and Dependencies
+
+ **test-design-qa.md sections (in this order):**
+ 1. Quick Reference for QA (Before You Start, Execution Order, Need Help)
+ 2. System Architecture Summary (brief overview)
+  3. Test Environment Requirements (early in the document, NOT buried at the end)
+ 4. Testability Assessment (lightweight prerequisites checklist)
+ 5. Test Levels Strategy (unit/integration/E2E split with rationale)
+ 6. Test Coverage Plan (P0/P1/P2/P3 with detailed scenarios + checkboxes)
+ 7. Sprint 0 Setup Requirements (blockers, infrastructure, environments)
+ 8. NFR Readiness Summary (reference to Architecture doc)
+
+ **Content Guidelines:**
+
+ **Architecture doc (DO):**
+ - ✅ Risk scoring visible (Probability × Impact = Score)
+ - ✅ Clear ownership (each blocker/ASR has owner + timeline)
+ - ✅ Testability requirements (what architecture must support)
+ - ✅ Mitigation plans (for each high-risk item ≥6)
+ - ✅ Short code examples (5-10 lines max showing what to support)
+
+ **Architecture doc (DON'T):**
+ - ❌ NO long test code examples (belongs in QA doc)
+ - ❌ NO test scenario checklists (belongs in QA doc)
+ - ❌ NO implementation details (how QA will test)
+
+ **QA doc (DO):**
+ - ✅ Test scenario recipes (clear P0/P1/P2/P3 with checkboxes)
+ - ✅ Environment setup (Sprint 0 checklist with blockers)
+ - ✅ Tool setup (factories, fixtures, frameworks)
+ - ✅ Cross-references to Architecture doc (not duplication)
+
+ **QA doc (DON'T):**
+ - ❌ NO architectural theory (just reference Architecture doc)
+ - ❌ NO ASR explanations (link to Architecture doc instead)
+ - ❌ NO duplicate risk assessments (reference Architecture doc)
+
+ **Anti-Patterns to Avoid (Cross-Document Redundancy):**
+
+ ❌ **DON'T duplicate OAuth requirements:**
+ - Architecture doc: Explain OAuth 2.1 flow in detail
+ - QA doc: Re-explain why OAuth 2.1 is required
+
+ ✅ **DO cross-reference instead:**
+ - Architecture doc: "ASR-1: OAuth 2.1 required (see QA doc for 12 test scenarios)"
+ - QA doc: "OAuth tests: 12 P0 scenarios (see Architecture doc R-001 for risk details)"
+
+ **Markdown Cross-Reference Syntax Examples:**
```markdown
- # System-Level Test Design
+ # In test-design-architecture.md
+
+ ### 🚨 R-001: Multi-Tenant Isolation (Score: 9)
+
+ **Test Coverage:** 8 P0 tests (see [QA doc - Multi-Tenant Isolation](test-design-qa.md#multi-tenant-isolation-8-tests---security-critical) for detailed scenarios)
+
+ ---
+
+ # In test-design-qa.md
## Testability Assessment
- - Controllability: [PASS/CONCERNS/FAIL with details]
- - Observability: [PASS/CONCERNS/FAIL with details]
- - Reliability: [PASS/CONCERNS/FAIL with details]
+ **Prerequisites from Architecture Doc:**
+ - [ ] R-001: Multi-tenant isolation validated (see [Architecture doc R-001](test-design-architecture.md#-r-001-multi-tenant-isolation-score-9) for mitigation plan)
+ - [ ] R-002: Test customer provisioned (see [Architecture doc 🚨 BLOCKERS](test-design-architecture.md#-blockers---team-must-decide-cant-proceed-without))
- ## Architecturally Significant Requirements (ASRs)
+ ## Sprint 0 Setup Requirements
- [Risk-scored quality requirements]
-
- ## Test Levels Strategy
-
- - Unit: [X%] - [Rationale]
- - Integration: [Y%] - [Rationale]
- - E2E: [Z%] - [Rationale]
-
- ## NFR Testing Approach
-
- - Security: [Approach with tools]
- - Performance: [Approach with tools]
- - Reliability: [Approach with tools]
- - Maintainability: [Approach with tools]
-
- ## Test Environment Requirements
-
- [Infrastructure needs based on deployment architecture]
-
- ## Testability Concerns (if any)
-
- [Blockers or concerns that should inform solutioning gate check]
-
- ## Recommendations for Sprint 0
-
- [Specific actions for *framework and *ci workflows]
+ **Source:** See [Architecture doc "Quick Guide"](test-design-architecture.md#quick-guide) for detailed mitigation plans
```
-**After System-Level Mode:** Skip to Step 4 (Generate Deliverables) - Steps 2-3 are epic-level only.
+ **Key Points:**
+ - Use relative links: `[Link Text](test-design-qa.md#section-anchor)`
+ - Anchor format: lowercase, hyphens for spaces, remove emojis/special chars
+ - Example anchor: `### 🚨 R-001: Title` → `#-r-001-title`
+
+ ❌ **DON'T put long code examples in Architecture doc:**
+ - Example: 50+ lines of test implementation
+
+ ✅ **DO keep examples SHORT in Architecture doc:**
+ - Example: 5-10 lines max showing what architecture must support
+ - Full implementation goes in QA doc
+
+ ❌ **DON'T repeat same note 10+ times:**
+ - Example: "Pessimistic timing until R-005 fixed" on every P0/P1/P2 section
+
+ ✅ **DO consolidate repeated notes:**
+ - Single timing note at top
+ - Reference briefly throughout: "(pessimistic)"
+
+ **Write Both Documents:**
+ - Use `test-design-architecture-template.md` for Architecture doc
+ - Use `test-design-qa-template.md` for QA doc
+ - Follow standard structures defined above
+ - Cross-reference between docs (no duplication)
+ - Validate against checklist.md (System-Level Mode section)
+
+**After System-Level Mode:** Workflow COMPLETE. System-level outputs (test-design-architecture.md + test-design-qa.md) are written in this step. Steps 2-4 are epic-level only - do NOT execute them in system-level mode.
---
diff --git a/src/bmm/workflows/testarch/test-design/test-design-architecture-template.md b/src/bmm/workflows/testarch/test-design/test-design-architecture-template.md
new file mode 100644
index 00000000..e3fa4917
--- /dev/null
+++ b/src/bmm/workflows/testarch/test-design/test-design-architecture-template.md
@@ -0,0 +1,216 @@
+# Test Design for Architecture: {Feature Name}
+
+**Purpose:** Architectural concerns, testability gaps, and NFR requirements for review by Architecture/Dev teams. Serves as a contract between QA and Engineering on what must be addressed before test development begins.
+
+**Date:** {date}
+**Author:** {author}
+**Status:** Architecture Review Pending
+**Project:** {project_name}
+**PRD Reference:** {prd_link}
+**ADR Reference:** {adr_link}
+
+---
+
+## Executive Summary
+
+**Scope:** {Brief description of feature scope}
+
+**Business Context** (from PRD):
+- **Revenue/Impact:** {Business metrics if applicable}
+- **Problem:** {Problem being solved}
+- **GA Launch:** {Target date or timeline}
+
+**Architecture** (from ADR {adr_number}):
+- **Key Decision 1:** {e.g., OAuth 2.1 authentication}
+- **Key Decision 2:** {e.g., Centralized MCP Server pattern}
+- **Key Decision 3:** {e.g., Stack: TypeScript, SDK v1.x}
+
+**Expected Scale** (from ADR):
+- {RPS, volume, users, etc.}
+
+**Risk Summary:**
+- **Total risks**: {N}
+- **High-priority (≥6)**: {N} risks requiring immediate mitigation
+- **Test effort**: ~{N} tests (~{X} weeks for 1 QA, ~{Y} weeks for 2 QAs)
+
+---
+
+## Quick Guide
+
+### 🚨 BLOCKERS - Team Must Decide (Can't Proceed Without)
+
+**Sprint 0 Critical Path** - These MUST be completed before QA can write integration tests:
+
+1. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
+2. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
+3. **{Blocker ID}: {Blocker Title}** - {What architecture must provide} (recommended owner: {Team/Role})
+
+**What we need from team:** Complete these {N} items in Sprint 0 or test development is blocked.
+
+---
+
+### ⚠️ HIGH PRIORITY - Team Should Validate (We Provide Recommendation, You Approve)
+
+1. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
+2. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
+3. **{Risk ID}: {Title}** - {Recommendation + who should approve} (Sprint {N})
+
+**What we need from team:** Review recommendations and approve (or suggest changes).
+
+---
+
+### 📋 INFO ONLY - Solutions Provided (Review, No Decisions Needed)
+
+1. **Test strategy**: {Test level split} ({Rationale})
+2. **Tooling**: {Test frameworks and utilities}
+3. **Tiered CI/CD**: {Execution tiers with timing}
+4. **Coverage**: ~{N} test scenarios prioritized P0-P3 with risk-based classification
+5. **Quality gates**: {Pass criteria}
+
+**What we need from team:** Just review and acknowledge (we already have the solution).
+
+---
+
+## For Architects and Devs - Open Topics 👷
+
+### Risk Assessment
+
+**Total risks identified**: {N} ({X} high-priority score ≥6, {Y} medium, {Z} low)
+
+#### High-Priority Risks (Score ≥6) - IMMEDIATE ATTENTION
+
+| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner | Timeline |
+|---------|----------|-------------|-------------|--------|-------|------------|-------|----------|
+| **{R-ID}** | **{CAT}** | {Description} | {1-3} | {1-3} | **{Score}** | {Mitigation strategy} | {Owner} | {Date} |
+
+#### Medium-Priority Risks (Score 3-4)
+
+| Risk ID | Category | Description | Probability | Impact | Score | Mitigation | Owner |
+|---------|----------|-------------|-------------|--------|-------|------------|-------|
+| {R-ID} | {CAT} | {Description} | {1-3} | {1-3} | {Score} | {Mitigation} | {Owner} |
+
+#### Low-Priority Risks (Score 1-2)
+
+| Risk ID | Category | Description | Probability | Impact | Score | Action |
+|---------|----------|-------------|-------------|--------|-------|--------|
+| {R-ID} | {CAT} | {Description} | {1-3} | {1-3} | {Score} | Monitor |
+
+#### Risk Category Legend
+
+- **TECH**: Technical/Architecture (flaws, integration, scalability)
+- **SEC**: Security (access controls, auth, data exposure)
+- **PERF**: Performance (SLA violations, degradation, resource limits)
+- **DATA**: Data Integrity (loss, corruption, inconsistency)
+- **BUS**: Business Impact (UX harm, logic errors, revenue)
+- **OPS**: Operations (deployment, config, monitoring)
+
+---
+
+### Testability Concerns and Architectural Gaps
+
+**IMPORTANT**: {If the system has constraints, explain them. If standard CI/CD is achievable, state that.}
+
+#### Blockers to Fast Feedback
+
+| Blocker | Impact | Current Mitigation | Ideal Solution |
+|---------|--------|-------------------|----------------|
+| **{Blocker name}** | {Impact description} | {How we're working around it} | {What architecture should provide} |
+
+#### Why This Matters
+
+**Standard CI/CD expectations:**
+- Full test suite on every commit (~5-15 min feedback)
+- Parallel test execution (isolated test data per worker)
+- Ephemeral test environments (spin up → test → tear down)
+- Fast feedback loop (devs stay in flow state)
+
+**Current reality for {Feature}:**
+- {Actual situation - what's different from standard}
+
+#### Tiered Testing Strategy
+
+{If forced by architecture, explain. If standard approach works, state that.}
+
+| Tier | When | Duration | Coverage | Why Not Full Suite? |
+|------|------|----------|----------|---------------------|
+| **Smoke** | Every commit | <5 min | {N} tests | Fast feedback, catch build-breaking changes |
+| **P0** | Every commit | ~{X} min | ~{N} tests | Critical paths, security-critical flows |
+| **P1** | PR to main | ~{X} min | ~{N} tests | Important features, algorithm accuracy |
+| **P2/P3** | Nightly | ~{X} min | ~{N} tests | Edge cases, performance, NFR |
+
+**Note**: {Any timing assumptions or constraints}
+
+#### Architectural Improvements Needed
+
+{If system has technical debt affecting testing, list improvements. If architecture supports testing well, acknowledge that.}
+
+1. **{Improvement name}**
+ - {What to change}
+ - **Impact**: {How it improves testing}
+
+#### Acceptance of Trade-offs
+
+For {Feature} Phase 1, the team accepts:
+- **{Trade-off 1}** ({Reasoning})
+- **{Trade-off 2}** ({Reasoning})
+- ⚠️ **{Known limitation}** ({Why acceptable for now})
+
+This is {**technical debt** OR **acceptable for Phase 1**} that should be {revisited post-GA OR maintained as-is}.
+
+---
+
+### Risk Mitigation Plans (High-Priority Risks ≥6)
+
+**Purpose**: Detailed mitigation strategies for all {N} high-priority risks (score ≥6). These risks MUST be addressed before {GA launch date or milestone}.
+
+#### {R-ID}: {Risk Description} (Score: {Score}) - {CRITICALITY LEVEL}
+
+**Mitigation Strategy:**
+1. {Step 1}
+2. {Step 2}
+3. {Step 3}
+
+**Owner:** {Owner}
+**Timeline:** {Sprint or date}
+**Status:** Planned / In Progress / Complete
+**Verification:** {How to verify mitigation is effective}
+
+---
+
+{Repeat for all high-priority risks}
+
+---
+
+### Assumptions and Dependencies
+
+#### Assumptions
+
+1. {Assumption about architecture or requirements}
+2. {Assumption about team or timeline}
+3. {Assumption about scope or constraints}
+
+#### Dependencies
+
+1. {Dependency} - Required by {date/sprint}
+2. {Dependency} - Required by {date/sprint}
+
+#### Risks to Plan
+
+- **Risk**: {Risk to the test plan itself}
+ - **Impact**: {How it affects testing}
+ - **Contingency**: {Backup plan}
+
+---
+
+**End of Architecture Document**
+
+**Next Steps for Architecture Team:**
+1. Review Quick Guide (🚨/⚠️/📋) and prioritize blockers
+2. Assign owners and timelines for high-priority risks (≥6)
+3. Validate assumptions and dependencies
+4. Provide feedback to QA on testability gaps
+
+**Next Steps for QA Team:**
+1. Wait for Sprint 0 blockers to be resolved
+2. Refer to companion QA doc (test-design-qa.md) for test scenarios
+3. Begin test infrastructure setup (factories, fixtures, environments)
diff --git a/src/bmm/workflows/testarch/test-design/test-design-qa-template.md b/src/bmm/workflows/testarch/test-design/test-design-qa-template.md
new file mode 100644
index 00000000..f148dfc1
--- /dev/null
+++ b/src/bmm/workflows/testarch/test-design/test-design-qa-template.md
@@ -0,0 +1,315 @@
+# Test Design for QA: {Feature Name}
+
+**Purpose:** Test execution recipe for QA team. Defines test scenarios, coverage plan, tooling, and Sprint 0 setup requirements. Use this as your implementation guide after architectural blockers are resolved.
+
+**Date:** {date}
+**Author:** {author}
+**Status:** Draft / Ready for Implementation
+**Project:** {project_name}
+**PRD Reference:** {prd_link}
+**ADR Reference:** {adr_link}
+
+---
+
+## Quick Reference for QA
+
+**Before You Start:**
+- [ ] Review Architecture doc (test-design-architecture.md) - understand blockers and risks
+- [ ] Verify Sprint 0 blockers resolved (see Sprint 0 section below)
+- [ ] Confirm test infrastructure ready (factories, fixtures, environments)
+
+**Test Execution Order:**
+1. **Smoke tests** (<5 min) - Fast feedback on critical paths
+2. **P0 tests** (~{X} min) - Critical paths, security-critical flows
+3. **P1 tests** (~{X} min) - Important features, algorithm accuracy
+4. **P2/P3 tests** (~{X} min) - Edge cases, performance, NFR
+
+**Need Help?**
+- Blockers: See Architecture doc "Quick Guide" for mitigation plans
+- Test scenarios: See "Test Coverage Plan" section below
+- Sprint 0 setup: See "Sprint 0 Setup Requirements" section
+
+---
+
+## System Architecture Summary
+
+**Data Pipeline:**
+{Brief description of system flow}
+
+**Key Services:**
+- **{Service 1}**: {Purpose and key responsibilities}
+- **{Service 2}**: {Purpose and key responsibilities}
+- **{Service 3}**: {Purpose and key responsibilities}
+
+**Data Stores:**
+- **{Database 1}**: {What it stores}
+- **{Database 2}**: {What it stores}
+
+**Expected Scale** (from ADR):
+- {Key metrics: RPS, volume, users, etc.}
+
+---
+
+## Test Environment Requirements
+
+**{Company} Standard:** Shared DB per Environment with Randomization (Shift-Left)
+
+| Environment | Database | Test Data Strategy | Purpose |
+|-------------|----------|-------------------|---------|
+| **Local** | {DB} (shared) | Randomized (faker), auto-cleanup | Local development |
+| **Dev (CI)** | {DB} (shared) | Randomized (faker), auto-cleanup | PR validation |
+| **Staging** | {DB} (shared) | Randomized (faker), auto-cleanup | Pre-production, E2E |
+
+**Key Principles:**
+- **Shared database per environment** (no ephemeral)
+- **Randomization for isolation** (faker-based unique IDs)
+- **Parallel-safe** (concurrent test runs don't conflict)
+- **Self-cleaning** (tests delete their own data)
+- **Shift-left** (test against real DBs early)
+
+**Example:**
+
+```typescript
+import { test } from "./fixtures"; // hypothetical project fixture module exposing `apiRequest`
+import { faker } from "@faker-js/faker";
+
+test("example with randomized test data @p0", async ({ apiRequest }) => {
+ const testData = {
+ id: `test-${faker.string.uuid()}`,
+ customerId: `test-customer-${faker.string.alphanumeric(8)}`,
+ // ... unique test data
+ };
+
+ // Seed, test, cleanup
+});
+```
+
+---
+
+## Testability Assessment
+
+**Prerequisites from Architecture Doc:**
+
+Verify these blockers are resolved before test development:
+- [ ] {Blocker 1} (see Architecture doc Quick Guide → 🚨 BLOCKERS)
+- [ ] {Blocker 2}
+- [ ] {Blocker 3}
+
+**If Prerequisites Not Met:** Coordinate with Architecture team (see Architecture doc for mitigation plans and owner assignments)
+
+---
+
+## Test Levels Strategy
+
+**System Type:** {API-heavy / UI-heavy / Mixed backend system}
+
+**Recommended Split:**
+- **Unit Tests: {X}%** - {What to unit test}
+- **Integration/API Tests: {X}%** - ⭐ **PRIMARY FOCUS** - {What to integration test}
+- **E2E Tests: {X}%** - {What to E2E test}
+
+**Rationale:** {Why this split makes sense for this system}
+
+**Test Count Summary:**
+- P0: ~{N} tests - Critical paths, run on every commit
+- P1: ~{N} tests - Important features, run on PR to main
+- P2: ~{N} tests - Edge cases, run nightly/weekly
+- P3: ~{N} tests - Exploratory, run on-demand
+- **Total: ~{N} tests** (~{X} weeks for 1 QA, ~{Y} weeks for 2 QAs)
+
+---
+
+## Test Coverage Plan
+
+**Repository Note:** {Where tests live - backend repo, admin panel repo, etc. - and how CI pipelines are organized}
+
+### P0 (Critical) - Run on every commit (~{X} min)
+
+**Execution:** CI/CD on every commit, parallel workers, smoke tests first (<5 min)
+
+**Purpose:** Critical path validation - catch build-breaking changes and security violations immediately
+
+**Criteria:** Blocks core functionality OR High risk (≥6) OR No workaround
+
+**Key Smoke Tests** (subset of P0, run first for fast feedback):
+- {Smoke test 1} - {Duration}
+- {Smoke test 2} - {Duration}
+- {Smoke test 3} - {Duration}
+
+| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
+|-------------|------------|-----------|------------|-------|-------|
+| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
+| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
+
+**Total P0:** ~{N} tests (~{X} weeks)
+
+#### P0 Test Scenarios (Detailed)
+
+**1. {Test Category} ({N} tests) - {CRITICALITY if applicable}**
+
+- [ ] {Scenario 1 with checkbox}
+- [ ] {Scenario 2}
+- [ ] {Scenario 3}
+
+**2. {Test Category 2} ({N} tests)**
+
+- [ ] {Scenario 1}
+- [ ] {Scenario 2}
+
+{Continue for all P0 categories}
+
+---
+
+### P1 (High) - Run on PR to main (~{X} min additional)
+
+**Execution:** CI/CD on pull requests to main branch, runs after P0 passes, parallel workers
+
+**Purpose:** Important feature coverage - algorithm accuracy, complex workflows, Admin Panel interactions
+
+**Criteria:** Important features OR Medium risk (3-4) OR Common workflows
+
+| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
+|-------------|------------|-----------|------------|-------|-------|
+| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
+| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
+
+**Total P1:** ~{N} tests (~{X} weeks)
+
+#### P1 Test Scenarios (Detailed)
+
+**1. {Test Category} ({N} tests)**
+
+- [ ] {Scenario 1}
+- [ ] {Scenario 2}
+
+{Continue for all P1 categories}
+
+---
+
+### P2 (Medium) - Run nightly/weekly (~{X} min)
+
+**Execution:** Scheduled nightly run (or weekly for P3), full infrastructure, sequential execution acceptable
+
+**Purpose:** Edge case coverage, error handling, data integrity validation - slow feedback acceptable
+
+**Criteria:** Secondary features OR Low risk (1-2) OR Edge cases
+
+| Requirement | Test Level | Risk Link | Test Count | Owner | Notes |
+|-------------|------------|-----------|------------|-------|-------|
+| {Requirement 1} | {Level} | {R-ID} | {N} | QA | {Notes} |
+| {Requirement 2} | {Level} | {R-ID} | {N} | QA | {Notes} |
+
+**Total P2:** ~{N} tests (~{X} weeks)
+
+---
+
+### P3 (Low) - Run on-demand (exploratory)
+
+**Execution:** Manual trigger or weekly scheduled run, performance testing
+
+**Purpose:** Full regression, performance benchmarks, accessibility validation - no time pressure
+
+**Criteria:** Nice-to-have OR Exploratory OR Performance benchmarks
+
+| Requirement | Test Level | Test Count | Owner | Notes |
+|-------------|------------|------------|-------|-------|
+| {Requirement 1} | {Level} | {N} | QA | {Notes} |
+| {Requirement 2} | {Level} | {N} | QA | {Notes} |
+
+**Total P3:** ~{N} tests (~{X} days)
+
+---
+
+### Coverage Matrix (Requirements → Tests)
+
+| Requirement | Test Level | Priority | Risk Link | Test Count | Owner |
+|-------------|------------|----------|-----------|------------|-------|
+| {Requirement 1} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |
+| {Requirement 2} | {Level} | {P0-P3} | {R-ID} | {N} | {Owner} |
+
+---
+
+## Sprint 0 Setup Requirements
+
+**IMPORTANT:** These items **BLOCK test development**. Complete in Sprint 0 before QA can write tests.
+
+### Architecture/Backend Blockers (from Architecture doc)
+
+**Source:** See Architecture doc "Quick Guide" for detailed mitigation plans
+
+1. **{Blocker 1}** 🚨 **BLOCKER** - {Owner}
+ - {What needs to be provided}
+ - **Details:** Architecture doc {Risk-ID} mitigation plan
+
+2. **{Blocker 2}** 🚨 **BLOCKER** - {Owner}
+ - {What needs to be provided}
+ - **Details:** Architecture doc {Risk-ID} mitigation plan
+
+### QA Test Infrastructure
+
+1. **{Factory/Fixture Name}** - QA
+ - Faker-based generator: `{function_signature}`
+ - Auto-cleanup after tests
+
+2. **{Entity} Fixtures** - QA
+ - Seed scripts for {states/scenarios}
+ - Isolated {id_pattern} per test
+
+### Test Environments
+
+**Local:** {Setup details - Docker, LocalStack, etc.}
+
+**CI/CD:** {Setup details - shared infrastructure, parallel workers, artifacts}
+
+**Staging:** {Setup details - shared multi-tenant, nightly E2E}
+
+**Production:** {Setup details - feature flags, canary transactions}
+
+**Sprint 0 NFR Gates** (MUST complete before integration testing):
+- [ ] {Gate 1}: {Description} (Owner) 🚨
+- [ ] {Gate 2}: {Description} (Owner) 🚨
+- [ ] {Gate 3}: {Description} (Owner) 🚨
+
+### Sprint 1 Items (Not Sprint 0)
+
+- **{Item 1}** ({Owner}): {Description}
+- **{Item 2}** ({Owner}): {Description}
+
+**Sprint 1 NFR Gates** (MUST complete before GA):
+- [ ] {Gate 1}: {Description} (Owner)
+- [ ] {Gate 2}: {Description} (Owner)
+
+---
+
+## NFR Readiness Summary
+
+**Based on Architecture Doc Risk Assessment**
+
+| NFR Category | Status | Evidence Status | Blocker | Next Action |
+|--------------|--------|-----------------|---------|-------------|
+| **Security** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Performance** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Reliability** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Data Integrity** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Scalability** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Disaster Recovery** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Monitorability** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Deployability** | {Status} | {Evidence} | {Sprint} | {Action} |
+| **Maintainability** | PASS | Test design complete (~{N} scenarios) | None | Proceed with implementation |
+
+**Total:** {N} PASS, {N} CONCERNS across {N} categories
+
+---
+
+**End of QA Document**
+
+**Next Steps for QA Team:**
+1. Verify Sprint 0 blockers resolved (coordinate with Architecture team if not)
+2. Set up test infrastructure (factories, fixtures, environments)
+3. Begin test implementation following priority order (P0 → P1 → P2 → P3)
+4. Run smoke tests first for fast feedback
+5. Track progress using test scenario checklists above
+
+**Next Steps for Architecture Team:**
+1. Monitor Sprint 0 blocker resolution
+2. Provide support for QA infrastructure setup if needed
+3. Review test results and address any newly discovered testability gaps
diff --git a/src/bmm/workflows/testarch/test-design/workflow.yaml b/src/bmm/workflows/testarch/test-design/workflow.yaml
index b5fbd661..961eff34 100644
--- a/src/bmm/workflows/testarch/test-design/workflow.yaml
+++ b/src/bmm/workflows/testarch/test-design/workflow.yaml
@@ -15,6 +15,9 @@ date: system-generated
installed_path: "{project-root}/_bmad/bmm/workflows/testarch/test-design"
instructions: "{installed_path}/instructions.md"
validation: "{installed_path}/checklist.md"
+# Note: Template selection is mode-based (see instructions.md Step 1.5):
+# - System-level: test-design-architecture-template.md + test-design-qa-template.md
+# - Epic-level: test-design-template.md (unchanged)
template: "{installed_path}/test-design-template.md"
# Variables and inputs
@@ -26,13 +29,25 @@ variables:
# Note: Actual output file determined dynamically based on mode detection
# Declared outputs for new workflow format
outputs:
- - id: system-level
- description: "System-level testability review (Phase 3)"
- path: "{output_folder}/test-design-system.md"
+ # System-Level Mode (Phase 3) - TWO documents
+ - id: test-design-architecture
+ description: "System-level test architecture: Architectural concerns, testability gaps, NFR requirements for Architecture/Dev teams"
+ path: "{output_folder}/test-design-architecture.md"
+ mode: system-level
+ audience: architecture
+
+ - id: test-design-qa
+ description: "System-level test design: Test execution recipe, coverage plan, Sprint 0 setup for QA team"
+ path: "{output_folder}/test-design-qa.md"
+ mode: system-level
+ audience: qa
+
+ # Epic-Level Mode (Phase 4) - ONE document (unchanged)
- id: epic-level
description: "Epic-level test plan (Phase 4)"
path: "{output_folder}/test-design-epic-{epic_num}.md"
-default_output_file: "{output_folder}/test-design-epic-{epic_num}.md"
+ mode: epic-level
+# Note: No default_output_file - mode detection determines which outputs to write
# Required tools
required_tools: