feat(skills): add discovery rigor and workflow contract support

- add discovery rigor and memory manager skill flows plus supporting resources

- add workflow contract skill scaffolding and update related module metadata

- tighten validation, installation coverage, and local ignore rules for contribution hygiene
edamlmmv 2026-04-09 13:44:40 -04:00
parent 7f7690dbfd
commit a6e89e2557
53 changed files with 2412 additions and 156 deletions

.gitignore vendored

@@ -26,6 +26,9 @@ design-artifacts/
__pycache__/
.pytest_cache/
# Local history
.history/
# System files
.DS_Store
Thumbs.db


@@ -12,6 +12,7 @@ ignores:
- .roo/**
- .codex/**
- .kiro/**
- .history/**
- sample-project/**
- test-project-install/**
- z*/**


@@ -39,6 +39,7 @@ export default [
// are dictated by Augment and can't be changed, so exclude
// the entire directory from linting
'.augment/**',
'.history/**',
],
},


@@ -144,6 +144,8 @@ For each story in the epic:
1. **Story Title**: Clear, action-oriented
2. **User Story**: Complete the As a/I want/So that format
3. **Acceptance Criteria**: Write specific, testable criteria
4. **Exit Criteria**: What must be true for this story to be considered done — stated as verifiable assertions, not aspirations
5. **Rollback Boundary**: If this story fails after merge, what is the safe rollback point? (e.g., "revert to pre-story state — no schema migrations" or "feature flag off — no data loss")
**AC Writing Guidelines:**


@@ -0,0 +1,38 @@
---
name: bmad-create-workflow-contract
description: "Defines workflow and operator contracts. Use when the user says 'create a workflow contract', 'define the operator contract', or 'formalize the workflow interface'."
---
# Workflow Contract
## Overview
This skill helps you formalize boundaries between systems, repos, or workflow stages through a canonical contract document. It works by surfacing contract boundaries, defining each approved surface with inline compliance questions, and checking the finished contract for cross-surface consistency. Your output is a frozen workflow contract that downstream teams and skills can treat as the authoritative interface.
Follow the instructions in [workflow.md](workflow.md).
## Your Approach
- Make every boundary explicit enough that a new operator or repo owner can follow it without tribal knowledge.
- Keep producer, consumer, and owner responsibilities visible at each surface.
- Write compliance questions that can be answered with evidence, not opinion.
- Freeze approved contract sections unless the user explicitly reopens them.
## Deliverable
The output document (`{planning_artifacts}/workflow-contract.md`) should leave downstream work with:
- **Systems in Scope** — systems, repos, or stages with roles and owners
- **Per-Surface Contracts** — sections driven by `./resources/contract-surface-types.csv`
- **Compliance Questions** — inline, evidence-ready checks for each confirmed contract
- **Boundaries and Constraints** — Always / Ask First / Never rules for operators and maintainers
## Recovery
If conversation context is compressed, re-read this file and [workflow.md](workflow.md). The output document frontmatter (`stepsCompleted`, `lastStep`, `mode`) is the recovery source.
## On Activation
- Load config and the contract surface taxonomy.
- Discover candidate input documents, then confirm scope before loading them in full.
- Begin with [workflow.md](workflow.md), then route into `./steps/step-01-init.md`.


@@ -0,0 +1,7 @@
surface_type,what_to_define,table_columns,when_applicable,definition_order
identity,"How entities are named across boundaries — IDs, aliases, canonical names, paths, manifests","Entity | Canonical Name | Aliases | Resolution Rule | Collision Rule",always,1
ownership,"Who produces, who consumes, and who owns each concern — including dispute resolution","Concern | Producer | Consumer(s) | Owner | Handoff Point",always,2
operator,"Commands, config paths, state transitions, proof mode, and rollback procedures","Action | Command | Config Path | Inputs | Outputs | Idempotent?",always,3
evidence,"What proves correctness, where evidence lands, and how review consumes it","Checkpoint | Evidence Artifact | Location | Format | Reviewer",always,4
compatibility,"Backward compatibility rules, versioning, drift detection, and breaking change process","N/A — use prose fields: Versioning scheme, Backward compat rule, Drift detection, Breaking change process",when_multiple_versions,5
migration,"Transitional period, aliases, removal signals, cleanup procedures","N/A — use prose fields: Transitional period, Transitional aliases, Removal signal, Cleanup",when_migrating,6
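As a sketch of how a consuming step might drive this taxonomy, the `when_applicable` filter and `definition_order` sort could look like the following. The `applicable_surfaces` helper is hypothetical (not part of the skill), and the CSV columns are abbreviated here for brevity:

```python
import csv
import io

# Abbreviated copy of contract-surface-types.csv (descriptions shortened).
SURFACE_CSV = """\
surface_type,what_to_define,when_applicable,definition_order
identity,How entities are named across boundaries,always,1
ownership,Who produces and who owns each concern,always,2
operator,Commands and rollback procedures,always,3
evidence,What proves correctness,always,4
compatibility,Versioning and drift detection,when_multiple_versions,5
migration,Transitional aliases and cleanup,when_migrating,6
"""

def applicable_surfaces(has_multiple_versions: bool, is_migrating: bool) -> list[str]:
    """Return the surface types to probe, sorted by definition_order."""
    rows = list(csv.DictReader(io.StringIO(SURFACE_CSV)))
    rows.sort(key=lambda r: int(r["definition_order"]))
    keep = []
    for row in rows:
        cond = row["when_applicable"]
        if (cond == "always"
                or (cond == "when_multiple_versions" and has_multiple_versions)
                or (cond == "when_migrating" and is_migrating)):
            keep.append(row["surface_type"])
    return keep

print(applicable_surfaces(has_multiple_versions=False, is_migrating=True))
# → ['identity', 'ownership', 'operator', 'evidence', 'migration']
```

The four `always` surfaces are probed unconditionally; `compatibility` and `migration` only enter the definition pass when their conditions hold.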


@@ -0,0 +1,75 @@
# Step 1: Initialization
Establish whether this contract is grounded in enough upstream context, identify the systems and boundaries in scope, and choose the approval cadence that fits the number of surfaces to define.
## Recovery
If `{outputFile}` exists, recover per workflow.md §RECOVERY PROTOCOL.
## Gather Inputs
Search in `{planning_artifacts}/**`, `{output_folder}/**`, and `{project_knowledge}/**` when that path is configured:
| Document Type | Glob Pattern | Priority |
|--------------|-------------|----------|
| Discovery Context | `*discovery-context*.md` | Primary — may already contain Contract Candidates |
| Architecture | `*architecture*.md` | High — system boundaries and technical decisions |
| PRD | `*prd*.md` | Medium — requirements and constraints |
| Technical Design | `*tech-spec*.md`, `*technical-design*.md` | Medium — implementation details and tradeoffs |
| Epics / Stories | `*epic*.md`, `*stor*.md` | Low — implementation slices |
Before loading document contents, present the discovered candidates and use `vscode_askQuestions` to confirm which documents are in scope. In autonomous mode, self-serve the scope selection from workspace evidence and log it.
For sharded folders, load `index.md` first and then the relevant shards from the selected documents.
Use these loading rules:
- Load every discovered file the user confirms is in scope.
- Treat Discovery Context as the strongest seed source; check it for a **Contract Candidates** section.
- Track loaded files in frontmatter `inputDocuments`.
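A hypothetical discovery pass over the glob table above could be sketched as follows. The `discover_candidates` helper and the hard-coded pattern list are illustrative only; in the skill itself the roots come from `{planning_artifacts}`, `{output_folder}`, and `{project_knowledge}`:

```python
from pathlib import Path

# Glob patterns from the input table, highest priority first.
PATTERNS = [
    ("Discovery Context", "*discovery-context*.md"),
    ("Architecture", "*architecture*.md"),
    ("PRD", "*prd*.md"),
    ("Technical Design", "*tech-spec*.md"),
    ("Technical Design", "*technical-design*.md"),
    ("Epics / Stories", "*epic*.md"),
    ("Epics / Stories", "*stor*.md"),
]

def discover_candidates(roots: list[Path]) -> list[tuple[str, Path]]:
    """Recursively collect candidate docs, preserving priority order."""
    found = []
    seen = set()
    for doc_type, pattern in PATTERNS:
        for root in roots:
            for path in sorted(root.rglob(pattern)):
                if path not in seen:  # a file may match more than one pattern
                    seen.add(path)
                    found.append((doc_type, path))
    return found
```

The result is what gets presented for scope confirmation before any document contents are loaded.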
Minimum input check:
| Condition | Action |
|-----------|--------|
| At least one of Discovery Context, Architecture, or Technical Design loaded and confirmed in scope | Proceed |
| None found | Halt: `A Workflow Contract needs upstream context. Please run bmad-discovery-rigor, bmad-create-architecture, or provide a technical design first.` |
From the loaded documents, extract:
- **Contract Candidates** from Discovery Context when present
- **Systems, repos, or services** named across the source material
- **Boundaries** between components or stages
- **Identity schemes** such as IDs, aliases, paths, or manifest names
Build a preliminary systems map:
| System / Repo | Role (Producer / Consumer / Both) | Repo / Location | Owner |
|--------------|-----------------------------------|-----------------|-------|
## Select Mode and Initialize Output
Load `../resources/contract-surface-types.csv` and count how many contract surfaces need definition based on the seeded candidates, systems, and boundaries.
| Surface count | Mode | Behavior |
|--------------|------|----------|
| ≤ 3 | Lightweight | Single halt gate per step; compact presentation |
| > 3 | Full | Per-surface halt gates; batch presentation |
Copy `../workflow-contract-template.md` to `{outputFile}` and update frontmatter:
- Replace `{{project_name}}`, `{{user_name}}`, and `{{date}}` placeholders in the copied template before writing it.
- Render the document title with the resolved project name so the initialized file contains no leftover template placeholders.
- Write the preliminary systems map into the `## Systems in Scope` table immediately so recovery and later verification use saved state rather than reconstructed notes.
```yaml
stepsCompleted: [1]
inputDocuments: [list of loaded files]
mode: '[lightweight or full]'
surfaceCount: [count]
lastStep: 'step-01-init'
```
Present the documents loaded, contract candidates seeded, systems identified, and selected mode with surface count.
**🛑 HALT — Use `vscode_askQuestions` to confirm the selected scope and mode before proceeding. In autonomous mode, self-serve and log the decision.**


@@ -0,0 +1,112 @@
# Step 2: Contract Surface Discovery and Definition
Turn the systems map into explicit contract surfaces, let the user triage which ones belong in this document, then define each included surface with evidence-ready compliance questions. Mark each confirmed section frozen with `<!-- frozen-after-approval -->`.
## Recovery
Recover per workflow.md §RECOVERY PROTOCOL. If context was compressed, announce the recovered state before proceeding.
## Enumerate and Triage Surfaces
Load `../resources/contract-surface-types.csv`. For each system boundary from step 1, probe each surface type with this taxonomy:
| Surface Type | Probe Question |
|-------------|---------------|
| identity | How are entities named across this boundary? Stable IDs or context-dependent? |
| ownership | Who produces, consumes, and owns each artifact or config? |
| operator | What commands does an operator run? Which configs and state transitions matter? |
| evidence | What proves correctness, and where does that evidence land? |
| compatibility | What backward-compatibility rules apply? Versioning? Drift detection? |
| migration | Is there a transition period, and do old and new systems coexist? |
Apply `when_applicable` from the CSV with this filter:
| when_applicable value | Include? |
|----------------------|----------|
| always | Yes — probe for every boundary |
| when_multiple_versions | Only if versioning or parallel versions exist |
| when_migrating | Only if a migration is in progress or planned |
Zero-surface guard:
| Surfaces found | Action |
|----------------|--------|
| 0 | Halt: `No contract surfaces were discovered from the input documents. This usually means either the systems in scope do not have cross-boundary contracts to define, or the source material does not describe boundaries clearly enough. Consider invoking bmad-discovery-rigor before returning here.` |
| ≥ 1 | Proceed to presentation |
Present discovered surfaces in a table:
| # | Boundary | Surface Type | Source | Status |
|---|----------|-------------|--------|--------|
| 1 | [system → system] | [type] | Seeded / New | [pending] |
Ask the user to triage each surface as:
- **Include** — define it formally in this contract
- **Defer** — acknowledge it but leave it out of this document
- **N/A** — the boundary or surface does not actually apply
**🛑 HALT — Use `vscode_askQuestions` to collect Include / Defer / N/A decisions. In autonomous mode, self-serve from workspace evidence and log the triage.**
## Draft and Confirm Contracts
Work through included surfaces in the CSV `definition_order` (identity → ownership → operator → evidence → compatibility → migration).
For each included surface:
- Draft the contract using the CSV `table_columns`
- Add 2-3 inline compliance questions that can be answered with evidence and cover the happy path plus at least one edge case
- Make those compliance questions specific to the actual contract content, not generic templates
- Highlight any places where the input documents were silent or contradictory
- Present the draft for confirmation
Mode-dependent presentation:
| Mode | Presentation |
|------|-------------|
| Lightweight (≤3 surfaces) | Present all drafted contracts at once, then single halt |
| Full (>3 surfaces) | Present one contract at a time, halt after each |
**🛑 HALT — Use `vscode_askQuestions` for contract confirmation per the mode rules above. In autonomous mode, self-serve from workspace evidence and log the result.**
On confirmation, mark the section frozen with `<!-- frozen-after-approval -->`.
## Verify Consistency and Update Document
After all contracts are defined, verify consistency across them:
| Check | What to verify |
|-------|---------------|
| Identity ↔ Operator | Do operator commands use the canonical names from Identity? |
| Ownership ↔ Operator | Does the owning entity also run or authorize the operator commands it owns? |
| Operator ↔ Evidence | Do operator outputs land where Evidence expects to find them? |
| Evidence ↔ Compatibility | Does drift detection inspect the locations where evidence is produced? |
If inconsistencies appear, present them and ask the user to resolve them with `vscode_askQuestions` before continuing. In autonomous mode, self-serve the most defensible resolution from workspace evidence and log it.
Write all confirmed contract sections and their inline compliance questions to `{outputFile}`.
Populate the **Boundaries and Constraints** table:
| Category | Rule |
|----------|------|
| Always | [rules that must always hold — from confirmed contracts] |
| Ask First | [rules that need human judgment case by case] |
| Never | [hard prohibitions — from contracts and constraints] |
Update frontmatter:
```yaml
stepsCompleted: [1, 2]
lastStep: 'step-02-define'
```
Present:
```markdown
**All contracts defined.** {N} contract sections frozen with {M} compliance questions.
[C] Continue to finalization
```
**🛑 HALT — Use `vscode_askQuestions` to confirm continuation into finalization. In autonomous mode, self-serve and log the decision.**


@@ -0,0 +1,57 @@
# Step 3: Finalization and Handoff
Verify that the contract is complete enough to trust, run any final challenge pass that adds value, then save the contract and recommend the next workflow.
## Recovery
Check `{outputFile}` frontmatter per workflow.md step sequence. If context was compressed, recover and announce the recovered state.
## Verify the Contract
Verify each item against `{outputFile}`:
| # | Check | How to verify |
|---|-------|--------------|
| 1 | Systems in Scope populated | At least one system row with role and owner |
| 2 | At least one contract section defined | Non-empty contract content exists |
| 3 | Each contract section has frozen marker | `<!-- frozen-after-approval -->` present |
| 4 | Inline compliance questions present | Each contract has at least one compliance question |
| 5 | Boundaries and Constraints populated | At least one row in each category |
| 6 | No placeholder text remaining | No `[TODO]`, `[TBD]`, or `{{...}}` anywhere in the document, including frontmatter and title |
| 7 | Cross-contract consistency verified | Step 2 consistency check was completed |
If any check fails, state which one failed, explain why, and ask how to proceed.
**🛑 HALT if any check fails — use `vscode_askQuestions` to resolve or explicitly defer the failure before finalizing. In autonomous mode, self-serve and log the decision.**
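The mechanical parts of this checklist (the placeholder scan and frozen-marker count) could be automated. The sketch below is illustrative: `verify_contract` is a hypothetical helper, and the failure messages are simplified:

```python
import re

# Matches [TODO], [TBD], or any unresolved {{...}} template placeholder.
PLACEHOLDER_RE = re.compile(r"\[TODO\]|\[TBD\]|\{\{[^}]+\}\}")
FROZEN_MARKER = "<!-- frozen-after-approval -->"

def verify_contract(text: str, expected_sections: int) -> list[str]:
    """Return failed checks; an empty list means the document passes."""
    failures = []
    if PLACEHOLDER_RE.search(text):
        failures.append("placeholder text remaining ([TODO], [TBD], or {{...}})")
    count = text.count(FROZEN_MARKER)
    if count < expected_sections:
        failures.append(f"frozen markers: {count} found, {expected_sections} expected")
    return failures

doc = "# Workflow Contract — Acme\n<!-- frozen-after-approval -->\n..."
print(verify_contract(doc, expected_sections=2))
# one failure: a frozen marker is missing for the second contract section
```

The semantic checks (row contents, consistency verification) still need the agent's judgment; only the pattern checks reduce to code like this.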
## Optional Adversarial Review
If `bmad-review-adversarial-general` is available, invoke it against the compiled contract to look for unstated assumptions, missing boundaries, ambiguous ownership, or unverifiable claims.
If findings are returned, present them to the user, use `vscode_askQuestions` to resolve or explicitly defer them, and do not proceed to save until that gate is complete. In autonomous mode, self-serve the most defensible resolution from workspace evidence and log it.
Skip this pass if the skill is not installed or the contract is straightforward.
## Save and Recommend Next Steps
Update `{outputFile}` frontmatter:
```yaml
stepsCompleted: [1, 2, 3]
status: 'complete'
completedDate: '{date}'
lastStep: 'step-03-finalize'
```
Recommend the next workflow based on contract content:
| Contract content signals | Recommended skill | Reason |
|-------------------------|-------------------|--------|
| Implementation slices in input docs | `bmad-sprint-planning` | Plan implementation of the contract |
| Architecture decisions still needed | `bmad-create-architecture` | Design the system that fulfills the contract |
| Story creation still needed | `bmad-create-epics-and-stories` | Break the contract into implementation stories |
| Came directly from discovery | Note the discovery lineage | Contract is now ready for the next phase |
Present a completion summary with systems in scope, contracts defined, compliance question count, boundary rules, saved location, and recommended next step. Note that downstream skills should load this contract as reference.
**🛑 Workflow contract workflow complete.**


@@ -0,0 +1,32 @@
---
stepsCompleted: []
inputDocuments: []
mode: ''
surfaceCount: 0
workflowType: 'workflow-contract'
project_name: '{{project_name}}'
user_name: '{{user_name}}'
date: '{{date}}'
lastStep: ''
---
# Workflow Contract — {{project_name}}
_Canonical contract surfaces for cross-system or cross-repo workflow integration. Built collaboratively through step-by-step discovery. Contract types driven by `contract-surface-types.csv`._
## Systems in Scope
| System | Role (Producer / Consumer / Both) | Repo / Location | Owner |
|--------|-----------------------------------|-----------------|-------|
## Contracts
_Each contract section includes its compliance questions inline._
## Boundaries and Constraints
| Category | Rule |
|----------|------|
| Always | |
| Ask First | |
| Never | |


@@ -0,0 +1,52 @@
---
outputFile: '{planning_artifacts}/workflow-contract.md'
---
# Workflow Contract Workflow
This workflow turns discovery, architecture, or design context into an explicit workflow or operator contract. Use it when interface drift, cross-repo coordination, or migration risk make implicit knowledge too expensive.
Keep each step self-contained so the workflow can recover cleanly from context compression. Mode selection changes approval cadence, not contract quality: lightweight mode batches decisions; full mode slows down to protect boundary accuracy.
## Step Sequence
| Step | File | Purpose | Conditional |
|------|------|---------|-------------|
| 1 | `step-01-init.md` | Gather grounded inputs, detect existing work, and select mode | No |
| 2 | `step-02-define.md` | Discover surfaces, define contracts, and add inline compliance questions | No |
| 3 | `step-03-finalize.md` | Verify completeness, save the contract, and recommend next steps | No |
## Recovery Protocol
| Condition | Action |
|-----------|--------|
| `{outputFile}` missing | Fresh run — start at step 1 |
| `{outputFile}` exists with `status: 'complete'` | Ask whether to reuse or rerun |
| `stepsCompleted: [1]` | Reconstruct `inputDocuments`, `mode`, `surfaceCount`, and the current systems map, then resume at step 2 |
| `stepsCompleted: [1, 2]` | Reconstruct confirmed contract sections, frozen markers, and Boundaries and Constraints, then resume at step 3 |
| Frontmatter and document body conflict | Present the mismatch, then ask whether to repair or restart |
### Mode Selection (determined in Step 1)
| Condition | Mode | Behavior |
|-----------|------|----------|
| ≤ 3 contract surfaces to define | Lightweight | Single halt gate per step; surfaces + contracts in one pass |
| > 3 contract surfaces to define | Full | Per-surface halt gates; batch presentation |
## INITIALIZATION
### Configuration Loading
Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
- `project_name`, `output_folder`, `planning_artifacts`, `project_knowledge`, `user_name`
- `communication_language`, `document_output_language`
- `date` as system-generated current datetime
### Resource Loading
Load `./resources/contract-surface-types.csv` — this drives which contract types are available and the definition order.
## EXECUTION
Read fully and follow: `./steps/step-01-init.md`


@@ -46,6 +46,19 @@ If `decision_needed` findings exist, present each one with its detail and the op
If the user chooses to defer, ask: "Quick one-line reason for deferring this item? (helps future reviews)" — then append that reason to both the story file bullet and the `{deferred_work_file}` entry.
**Deferred-item triage guidance** — a finding should be deferred (not patched) when:
- It is pre-existing (not introduced by this change)
- It requires a design decision that is out of scope for this story
- Fixing it would expand scope beyond the story's acceptance criteria
- It needs input from a different owner or team
A finding should NOT be deferred when:
- It was introduced by this change (patch or reject instead)
- It directly blocks an acceptance criterion
- It is a security vulnerability in the changed code
**HALT** — I am waiting for your numbered choice. Reply with only the number (or "0" for batch). Do not proceed until you select an option.
### 5. Handle `patch` findings


@@ -2,7 +2,9 @@
Status: ready-for-dev
<!-- Note: Validation is optional. Run validate-create-story for quality check before dev-story. -->
<!-- Note: Validation is optional. Review the story for quality before running dev-story. -->
<frozen-after-approval reason="human-owned intent — do not modify unless human renegotiates">
## Story
@@ -14,6 +16,16 @@ so that {{benefit}}.
1. [Add acceptance criteria from epics/PRD]
## Exit Criteria
- [Verifiable assertion that proves this story is done — e.g., "API returns 200 for authenticated user"]
## Rollback Boundary
- [Safe rollback point if this story fails after merge — e.g., "revert commit; no schema migrations"]
</frozen-after-approval>
## Tasks / Subtasks
- [ ] Task 1 (AC: #)
@@ -42,8 +54,12 @@ so that {{benefit}}.
{{agent_model_name_version}}
### Debug Log References
### Implementation Plan
### Completion Notes List
### Debug Log
### Completion Notes
### File List
### Change Log


@@ -52,7 +52,7 @@ validation-rules:
- [ ] **File List Complete:** File List includes EVERY new, modified, or deleted file (paths relative to repo root)
- [ ] **Dev Agent Record Updated:** Contains relevant Implementation Notes and/or Debug Log for this work
- [ ] **Change Log Updated:** Change Log includes clear summary of what changed and why
- [ ] **Review Follow-ups:** All review follow-up tasks (marked [AI-Review]) completed and corresponding review items marked resolved (if applicable)
- [ ] **Review Follow-ups:** All unresolved review findings or review follow-up tasks are completed and corresponding review items are marked resolved (if applicable)
- [ ] **Story Structure Compliance:** Only permitted sections of story file were modified
## 🔚 Final Status Verification


@@ -5,10 +5,11 @@
**Your Role:** Developer implementing the story.
- Communicate all responses in {communication_language} and language MUST be tailored to {user_skill_level}
- Generate all documents in {document_output_language}
- Only modify the story file in these areas: Tasks/Subtasks checkboxes, Dev Agent Record (Debug Log, Completion Notes), File List, Change Log, and Status
- Only modify the story file in these areas: Tasks/Subtasks checkboxes, Dev Agent Record (Implementation Plan, Debug Log, Completion Notes), File List, Change Log, Status, and existing Senior Developer Review (AI) action-item checkboxes when resolving review continuation work
- Content inside `<frozen-after-approval>` blocks (Story statement, Acceptance Criteria, Exit Criteria, Rollback Boundary) is immutable — if implementation cannot satisfy a frozen requirement, HALT and escalate to the user rather than modifying it
- Execute ALL steps in exact order; do NOT skip steps
- Absolutely DO NOT stop because of "milestones", "significant progress", or "session boundaries". Continue in a single execution until the story is COMPLETE (all ACs satisfied and all tasks/subtasks checked) UNLESS a HALT condition is triggered or the USER gives other instruction.
- Do NOT schedule a "next session" or request review pauses unless a HALT condition applies. Only Step 6 decides completion.
- Do NOT schedule a "next session" or request review pauses unless a HALT condition applies. Only Step 9 decides story completion.
- User skill level ({user_skill_level}) affects conversation style ONLY, not code updates.
---
@@ -27,7 +28,7 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
### Paths
- `story_file` = `` (explicit story path; auto-discovered if empty)
- `story_path` = `` (explicit story path; auto-discovered if empty)
- `sprint_status` = `{implementation_artifacts}/sprint-status.yaml`
### Context
@@ -41,13 +42,15 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<workflow>
<critical>Communicate all responses in {communication_language} and language MUST be tailored to {user_skill_level}</critical>
<critical>Generate all documents in {document_output_language}</critical>
<critical>Only modify the story file in these areas: Tasks/Subtasks checkboxes, Dev Agent Record (Debug Log, Completion Notes), File List,
Change Log, and Status</critical>
<critical>Only modify the story file in these areas: Tasks/Subtasks checkboxes, Dev Agent Record (Implementation Plan, Debug Log,
Completion Notes), File List, Change Log, Status, and existing Senior Developer Review (AI) action-item checkboxes when resolving
review continuation work</critical>
<critical>Content inside frozen-after-approval blocks is immutable — HALT and escalate to user if implementation cannot satisfy a frozen Acceptance Criterion, Exit Criterion, or Rollback Boundary</critical>
<critical>Execute ALL steps in exact order; do NOT skip steps</critical>
<critical>Absolutely DO NOT stop because of "milestones", "significant progress", or "session boundaries". Continue in a single execution
until the story is COMPLETE (all ACs satisfied and all tasks/subtasks checked) UNLESS a HALT condition is triggered or the USER gives
other instruction.</critical>
<critical>Do NOT schedule a "next session" or request review pauses unless a HALT condition applies. Only Step 6 decides completion.</critical>
<critical>Do NOT schedule a "next session" or request review pauses unless a HALT condition applies. Only Step 9 decides story completion.</critical>
<critical>User skill level ({user_skill_level}) affects conversation style ONLY, not code updates.</critical>
<step n="1" goal="Find next ready story and load it" tag="sprint-status">
@@ -66,6 +69,12 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<action>Parse the development_status section completely to understand story order</action>
<action>Find the FIRST story (by reading in order from top to bottom) where:
- Key matches pattern: number-number-name (e.g., "1-2-user-auth")
- NOT an epic key (epic-X) or retrospective (epic-X-retrospective)
- Status value equals "in-progress"
</action>
<action if="no in-progress story found">Find the FIRST story (by reading in order from top to bottom) where:
- Key matches pattern: number-number-name (e.g., "1-2-user-auth")
- NOT an epic key (epic-X) or retrospective (epic-X-retrospective)
- Status value equals "ready-for-dev"
@@ -78,12 +87,11 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
**What would you like to do?**
1. Run `create-story` to create next story from epics with comprehensive context
2. Run `*validate-create-story` to improve existing stories before development (recommended quality check)
2. Review and improve an existing story before development (recommended quality check)
3. Specify a particular story file to develop (provide full path)
4. Check {{sprint_status}} file to see current sprint status
💡 **Tip:** Stories in `ready-for-dev` may not have been validated. Consider running `validate-create-story` first for a quality
check.
💡 **Tip:** Stories in `ready-for-dev` may still benefit from a quick story quality review before development.
</output>
<ask>Choose option [1], [2], [3], or [4], or specify story file path:</ask>
@@ -92,7 +100,7 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
</check>
<check if="user chooses '2'">
<action>HALT - Run validate-create-story to improve existing stories</action>
<action>HALT - Review and improve the selected story before rerunning dev-story</action>
</check>
<check if="user chooses '3'">
@@ -117,16 +125,16 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<!-- Non-sprint story discovery -->
<check if="{{sprint_status}} file does NOT exist">
<action>Search {implementation_artifacts} for stories directly</action>
<action>Find stories with "ready-for-dev" status in files</action>
<action>Find stories with "in-progress" or "ready-for-dev" status in files</action>
<action>Look for story files matching pattern: *-*-*.md</action>
<action>Read each candidate story file to check Status section</action>
<check if="no ready-for-dev stories found in story files">
<output>📋 No ready-for-dev stories found
<check if="no in-progress or ready-for-dev stories found in story files">
<output>📋 No in-progress or ready-for-dev stories found
**Available Options:**
1. Run `create-story` to create next story from epics with comprehensive context
2. Run `*validate-create-story` to improve existing stories
2. Review and improve an existing story before development
3. Specify which story to develop
</output>
<ask>What would you like to do? Choose option [1], [2], or [3]:</ask>
@@ -136,7 +144,7 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
</check>
<check if="user chooses '2'">
<action>HALT - Run validate-create-story to improve existing stories</action>
<action>HALT - Review and improve the selected story before rerunning dev-story</action>
</check>
<check if="user chooses '3'">
@@ -146,7 +154,7 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
</check>
</check>
<check if="ready-for-dev story found in files">
<check if="in-progress or ready-for-dev story found in files">
<action>Use discovered story file and extract story_key</action>
</check>
</check>
@@ -157,7 +165,7 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<anchor id="task_check" />
<action>Parse sections: Story, Acceptance Criteria, Tasks/Subtasks, Dev Notes, Dev Agent Record, File List, Change Log, Status</action>
<action>Parse sections: Story, Acceptance Criteria, Exit Criteria, Rollback Boundary, Tasks/Subtasks, Dev Notes, Dev Agent Record (including Implementation Plan), File List, Change Log, Status, Senior Developer Review (AI)</action>
<action>Load comprehensive context from story file's Dev Notes section</action>
<action>Extract developer guidance from Dev Notes: architecture requirements, previous learnings, technical specifications</action>
@@ -165,9 +173,8 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<action>Identify first incomplete task (unchecked [ ]) in Tasks/Subtasks</action>
<action if="no incomplete tasks">
<goto step="6">Completion sequence</goto>
</action>
<action if="no incomplete tasks">Set {{all_tasks_complete}} = true</action>
<action if="incomplete tasks remain">Set {{all_tasks_complete}} = false</action>
<action if="story file inaccessible">HALT: "Cannot develop story without access to story file"</action>
<action if="incomplete task or subtask requirements ambiguous">ASK user to clarify or HALT</action>
</step>
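The task-scan logic in the step above can be sketched as a small helper. This is an illustrative sketch, not part of the workflow itself: the checklist syntax (`- [ ]` / `- [x]` lines under a Tasks/Subtasks heading) and the name `firstIncompleteTask` are assumptions inferred from the workflow's references to unchecked `[ ]` items.

```typescript
// Sketch: find the first unchecked task in a story's Tasks/Subtasks section.
// The heading and checkbox shapes are assumptions based on the workflow's
// wording, not a documented story schema.
function firstIncompleteTask(storyMarkdown: string): string | null {
  const lines = storyMarkdown.split("\n");
  let inTasks = false;
  for (const line of lines) {
    if (/^#{2,3}\s+Tasks\/Subtasks/.test(line)) { inTasks = true; continue; }
    if (inTasks && /^#{2,3}\s/.test(line)) break; // next section ends the scan
    const m = inTasks ? line.match(/^\s*-\s*\[ \]\s*(.+)$/) : null;
    if (m) return m[1].trim();
  }
  return null; // no unchecked tasks: set {{all_tasks_complete}} = true
}
```

A `null` result corresponds to the `{{all_tasks_complete}} = true` branch; any string is the "first incomplete task" the workflow hands to implementation.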
@@ -176,7 +183,7 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<critical>Load all available context to inform implementation</critical>
<action>Load {project_context} for coding standards and project-wide patterns (if exists)</action>
<action>Parse sections: Story, Acceptance Criteria, Tasks/Subtasks, Dev Notes, Dev Agent Record, File List, Change Log, Status</action>
<action>Parse sections: Story, Acceptance Criteria, Exit Criteria, Rollback Boundary, Tasks/Subtasks, Dev Notes, Dev Agent Record (including Implementation Plan), File List, Change Log, Status, Senior Developer Review (AI)</action>
<action>Load comprehensive context from story file's Dev Notes section</action>
<action>Extract developer guidance from Dev Notes: architecture requirements, previous learnings, technical specifications</action>
<action>Use enhanced story context to inform implementation decisions and approaches</action>
@@ -188,40 +195,50 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<step n="3" goal="Detect review continuation and extract review context">
<critical>Determine if this is a fresh start or continuation after code review</critical>
<action>Check if "Senior Developer Review (AI)" section exists in the story file</action>
<action>Check if "Review Follow-ups (AI)" subsection exists under Tasks/Subtasks</action>
<action>Check the Tasks/Subtasks section for a `### Review Findings` subsection and any unchecked review items (`[Review][Decision]` or `[Review][Patch]`)</action>
<action>Check for legacy review artifacts: `Senior Developer Review (AI)` section, `Review Follow-ups (AI)` subsection, or `[AI-Review]` task prefixes</action>
<action>Count unresolved review items across supported formats:
- Current format: unchecked `[Review][Decision]` and `[Review][Patch]` items in `### Review Findings`
- Legacy format: unchecked [ ] review follow-up items and unresolved `Senior Developer Review (AI)` action items
</action>
<action>Count deferred review items across supported formats (`[Review][Defer]` or legacy deferred items)</action>
<action>Store list of unresolved review items as {{pending_review_items}}</action>
<check if="Senior Developer Review section exists">
<check if="{{pending_review_items}} is not empty">
<action>Set review_continuation = true</action>
<action>Extract from "Senior Developer Review (AI)" section:
- Review outcome (Approve/Changes Requested/Blocked)
- Review date
- Total action items with checkboxes (count checked vs unchecked)
- Severity breakdown (High/Med/Low counts)
</action>
<action>Count unchecked [ ] review follow-up tasks in "Review Follow-ups (AI)" subsection</action>
<action>Store list of unchecked review items as {{pending_review_items}}</action>
<output>⏯️ **Resuming Story After Code Review** ({{review_date}})
<output>⏯️ **Resuming Story After Code Review**
**Review Outcome:** {{review_outcome}}
**Action Items:** {{unchecked_review_count}} remaining to address
**Priorities:** {{high_count}} High, {{med_count}} Medium, {{low_count}} Low
**Outstanding Review Items:** {{unchecked_review_count}} remaining
**Decision Items:** {{review_decision_count}}
**Patch Items:** {{review_patch_count}}
**Deferred Items:** {{review_defer_count}}
**Strategy:** Will prioritize review follow-up tasks (marked [AI-Review]) before continuing with regular tasks.
**Strategy:** Prioritize unresolved review findings before continuing with regular tasks.
</output>
</check>
<check if="Senior Developer Review section does NOT exist">
<check if="{{pending_review_items}} is empty">
<action>Set review_continuation = false</action>
<action>Set {{pending_review_items}} = empty</action>
<output>🚀 **Starting Fresh Implementation**
<check if="{{all_tasks_complete}} == true">
<output>✅ **All Story Tasks Already Complete**
Story: {{story_key}}
Story Status: {{current_status}}
First incomplete task: {{first_task_description}}
</output>
Story: {{story_key}}
Story Status: {{current_status}}
Proceeding to final completion validation and review transition.
</output>
</check>
<check if="{{all_tasks_complete}} != true">
<output>🚀 **Starting Fresh Implementation**
Story: {{story_key}}
Story Status: {{current_status}}
First incomplete task: {{first_task_description}}
</output>
</check>
</check>
</step>
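The cross-format counting in step 3 can be sketched as follows. The marker strings (`[Review][Decision]`, `[Review][Patch]`, `[Review][Defer]`, legacy `[AI-Review]`) are quoted from the step; the exact line shapes and the `countReviewItems` helper are illustrative assumptions.

```typescript
// Sketch: count unresolved review items across the current and legacy
// formats named in step 3. Only unchecked items count.
interface ReviewCounts { decision: number; patch: number; defer: number; legacy: number; }

function countReviewItems(storyMarkdown: string): ReviewCounts {
  const counts: ReviewCounts = { decision: 0, patch: 0, defer: 0, legacy: 0 };
  for (const line of storyMarkdown.split("\n")) {
    if (!/^\s*-\s*\[ \]/.test(line)) continue; // checked items are resolved
    if (line.includes("[Review][Decision]")) counts.decision++;
    else if (line.includes("[Review][Patch]")) counts.patch++;
    else if (line.includes("[Review][Defer]")) counts.defer++;
    else if (line.includes("[AI-Review]")) counts.legacy++;
  }
  return counts;
}

// review_continuation is true when any non-deferred item remains open.
const hasPending = (c: ReviewCounts) => c.decision + c.patch + c.legacy > 0;
```

Deferred items are counted but excluded from `hasPending`, matching the step's separate tally of `[Review][Defer]` items.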
@@ -258,6 +275,11 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<output> No sprint status file exists - story progress will be tracked in story file only</output>
<action>Set {{current_sprint_status}} = "no-sprint-tracking"</action>
</check>
<check if="{{all_tasks_complete}} == true">
<output>✅ All tasks are already checked; proceeding to final completion gates.</output>
<goto step="9">Completion</goto>
</check>
</step>
<step n="5" goal="Implement task following red-green-refactor cycle">
@@ -265,6 +287,8 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<action>Review the current task/subtask from the story file - this is your authoritative implementation guide</action>
<action>Plan implementation following red-green-refactor cycle</action>
<action>Review the frozen Exit Criteria and Rollback Boundary before making changes</action>
<action if="current task is an unresolved `[Review][Decision]` item">HALT: "Review decision item requires user resolution before implementation can continue"</action>
<!-- RED PHASE -->
<action>Write FAILING tests first for the task/subtask functionality</action>
@@ -282,6 +306,8 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<action>Document technical approach and decisions in Dev Agent Record → Implementation Plan</action>
<action if="new dependencies required beyond story specifications">HALT: "Additional dependencies need user approval"</action>
<action if="planned implementation would violate the frozen Rollback Boundary">HALT: "Implementation would cross the rollback boundary and needs user approval"</action>
<action if="it becomes clear the frozen Exit Criteria cannot be satisfied within the approved story scope">HALT: "Frozen exit criteria cannot be satisfied without renegotiation"</action>
<action if="3 consecutive implementation failures occur">HALT and request guidance</action>
<action if="required configuration is missing">HALT: "Cannot proceed without necessary configuration files"</action>
@@ -303,7 +329,7 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<action>Run all existing tests to ensure no regressions</action>
<action>Run the new tests to verify implementation correctness</action>
<action>Run linting and code quality checks if configured in project</action>
<action>Validate implementation meets ALL story acceptance criteria; enforce quantitative thresholds explicitly</action>
<action>Validate implementation meets ALL story acceptance criteria and exit criteria; enforce quantitative thresholds explicitly</action>
<action if="regression tests fail">STOP and fix before continuing - identify breaking changes immediately</action>
<action if="new tests fail">STOP and fix before continuing - ensure implementation correctness</action>
</step>
@@ -315,21 +341,28 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<action>Verify ALL tests for this task/subtask ACTUALLY EXIST and PASS 100%</action>
<action>Confirm implementation matches EXACTLY what the task/subtask specifies - no extra features</action>
<action>Validate that ALL acceptance criteria related to this task are satisfied</action>
<action>Validate that the implementation still supports the frozen Exit Criteria and has not crossed the frozen Rollback Boundary</action>
<action>Run full test suite to ensure NO regressions introduced</action>
<!-- REVIEW FOLLOW-UP HANDLING -->
<check if="task is review follow-up (has [AI-Review] prefix)">
<action>Extract review item details (severity, description, related AC/file)</action>
<check if="task is a persisted review item (has [Review][Patch] prefix or [AI-Review] prefix)">
<action>Extract review item details (category, description, related AC/file)</action>
<action>Add to resolution tracking list: {{resolved_review_items}}</action>
<!-- Mark task in Review Follow-ups section -->
<action>Mark task checkbox [x] in "Tasks/Subtasks → Review Follow-ups (AI)" section</action>
<check if="task comes from `### Review Findings` subsection">
<action>Mark the matching `[Review][Patch]` item checkbox [x] in "Tasks/Subtasks → Review Findings"</action>
</check>
<!-- CRITICAL: Also mark corresponding action item in review section -->
<action>Find matching action item in "Senior Developer Review (AI) → Action Items" section by matching description</action>
<action>Mark that action item checkbox [x] as resolved</action>
<check if="task comes from legacy `Review Follow-ups (AI)` subsection">
<action>Mark task checkbox [x] in "Tasks/Subtasks → Review Follow-ups (AI)" section</action>
</check>
<action>Add to Dev Agent Record → Completion Notes: "✅ Resolved review finding [{{severity}}]: {{description}}"</action>
<check if="legacy `Senior Developer Review (AI) → Action Items` section exists">
<action>Find matching action item in "Senior Developer Review (AI) → Action Items" section by matching description</action>
<action>Mark that action item checkbox [x] as resolved</action>
</check>
<action>Add to Dev Agent Record → Completion Notes: "✅ Resolved review finding: {{description}}"</action>
</check>
<!-- ONLY MARK COMPLETE IF ALL VALIDATION PASS -->
@@ -364,23 +397,32 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
<action>Run the full regression suite (do not skip)</action>
<action>Confirm File List includes every changed file</action>
<action>Execute enhanced definition-of-done validation</action>
<action>Update the story Status to: "review"</action>
<!-- Enhanced Definition of Done Validation -->
<action>Validate definition-of-done checklist with essential requirements:
- All tasks/subtasks marked complete with [x]
- Implementation satisfies every Acceptance Criterion
- Implementation satisfies every Exit Criterion
- Unit tests for core functionality added/updated
- Integration tests for component interactions added when required
- End-to-end tests for critical flows added when story demands them
- All tests pass (no regressions, new tests successful)
- Code quality checks pass (linting, static analysis if configured)
- Frozen Rollback Boundary remains valid; no unauthorized irreversible scope expansion
- File List includes every new/modified/deleted file (relative paths)
- Dev Agent Record contains implementation notes
- Change Log includes summary of changes
- Only permitted story sections were modified
</action>
<!-- Final validation gates -->
<action if="any task is incomplete">HALT - Complete remaining tasks before marking ready for review</action>
<action if="regression failures exist">HALT - Fix regression issues before completing</action>
<action if="File List is incomplete">HALT - Update File List with all changed files</action>
<action if="definition-of-done validation fails">HALT - Address DoD failures before completing</action>
<action>Update the story Status to: "review"</action>
<!-- Mark story ready for review - sprint status conditional -->
<check if="{sprint_status} file exists AND {{current_sprint_status}} != 'no-sprint-tracking'">
<action>Load the FULL file: {sprint_status}</action>
@@ -402,12 +444,6 @@ Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
Story status is set to "review" in file, but sprint-status.yaml may be out of sync.
</output>
</check>
<!-- Final validation gates -->
<action if="any task is incomplete">HALT - Complete remaining tasks before marking ready for review</action>
<action if="regression failures exist">HALT - Fix regression issues before completing</action>
<action if="File List is incomplete">HALT - Update File List with all changed files</action>
<action if="definition-of-done validation fails">HALT - Address DoD failures before completing</action>
</step>
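The final validation gates above can be sketched as an ordered check. The field names are hypothetical; the gate order and HALT messages mirror the workflow, with the status flip to "review" permitted only after every gate passes.

```typescript
// Sketch: the completion gates from the step above as one ordered check.
// Gate order matters: earlier failures should be reported first.
interface CompletionState {
  allTasksComplete: boolean;
  regressionsPass: boolean;
  fileListComplete: boolean;
  dodPasses: boolean;
}

function completionGate(s: CompletionState): { ok: boolean; halt?: string } {
  if (!s.allTasksComplete) return { ok: false, halt: "Complete remaining tasks before marking ready for review" };
  if (!s.regressionsPass) return { ok: false, halt: "Fix regression issues before completing" };
  if (!s.fileListComplete) return { ok: false, halt: "Update File List with all changed files" };
  if (!s.dodPasses) return { ok: false, halt: "Address DoD failures before completing" };
  return { ok: true }; // only now may Status move to "review"
}
```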
<step n="10" goal="Completion communication and user support">

View File

@@ -20,7 +20,8 @@ BMad Method,bmad-validate-prd,Validate PRD,VP,,,[path],2-planning,bmad-create-pr
BMad Method,bmad-edit-prd,Edit PRD,EP,,,[path],2-planning,bmad-validate-prd,,false,planning_artifacts,updated prd
BMad Method,bmad-create-ux-design,Create UX,CU,"Guidance through realizing the plan for your UX, strongly recommended if a UI is a primary piece of the proposed project.",,2-planning,bmad-create-prd,,false,planning_artifacts,ux design
BMad Method,bmad-create-architecture,Create Architecture,CA,Guided workflow to document technical decisions.,,3-solutioning,,,true,planning_artifacts,architecture
BMad Method,bmad-create-epics-and-stories,Create Epics and Stories,CE,,,3-solutioning,bmad-create-architecture,,true,planning_artifacts,epics and stories
BMad Method,bmad-create-workflow-contract,Create Workflow Contract,CW,"Define canonical workflow/operator contract for cross-repo or cross-system work: identity, ownership, operator evidence, compatibility, and migration rules.",,3-solutioning,bmad-create-architecture,,false,planning_artifacts,workflow contract
BMad Method,bmad-create-epics-and-stories,Create Epics and Stories,CE,,,3-solutioning,bmad-create-workflow-contract,,true,planning_artifacts,epics and stories
BMad Method,bmad-check-implementation-readiness,Check Implementation Readiness,IR,Ensure PRD UX Architecture and Epics Stories are aligned.,,3-solutioning,bmad-create-epics-and-stories,,true,planning_artifacts,readiness report
BMad Method,bmad-sprint-planning,Sprint Planning,SP,Kicks off implementation by producing a plan the implementation agents will follow in sequence for every story.,,4-implementation,,,true,implementation_artifacts,sprint status
BMad Method,bmad-sprint-status,Sprint Status,SS,Anytime: Summarize sprint status and route to next workflow.,,4-implementation,bmad-sprint-planning,,false,,

Can't render this file because it has a wrong number of fields in line 3.

View File

@@ -0,0 +1,44 @@
---
name: bmad-discovery-rigor
description: "Runs structured discovery before execution. Use when the user says 'run discovery', 'classify this', 'think before acting', or 'use discovery rigor'."
---
# Discovery Rigor
## Overview
This skill helps you understand ambiguous, high-stakes, or convergence-heavy work before execution begins. It classifies the task, closes the most important information gaps, sweeps for blind spots, and triggers research only when unresolved unknowns justify it. Your output is a Discovery Context document that downstream skills can trust for routing, constraints, and verification.
Follow the instructions in [workflow.md](workflow.md).
## Core Outcomes
- **Classify accurately** so the task gets the right depth of discovery and the right downstream route.
- **Replace assumptions with evidence** by surfacing missing information before execution work begins.
- **Sweep blind spots systematically** so entire categories of risk do not stay invisible.
- **Leave a verified handoff** that downstream skills can use without re-running discovery from scratch.
## Deliverable
The output document (`{outputFile}`) should leave downstream skills with:
- **Classification** — activity, tier, convergence flag, and reasoning
- **Interview Findings** — the most important answers and remaining unknowns
- **Blind Spots** — resolved, deferred, and still-open gaps by category
- **Research Summary** — only when discovery escalates into research
- **Evidence Manifest** — workspace surfaces consulted and self-served findings
- **Contract Candidates** — when convergence work is detected
- **Verification Strategy** — how downstream work should prove correctness
- **Open Items** — unresolved issues with owners and next actions
- **Constraints and Non-Goals** — explicit scope boundaries
- **Recommendation** — the next skill or workflow to run, and the downstream handoff recorded in the State Ledger
## Recovery
If conversation context is compressed, re-read this file and [workflow.md](workflow.md). The `{outputFile}` frontmatter (`stepsCompleted`, `discoveryCounter`, `lastStep`) plus the State Ledger are the canonical recovery surfaces.
## On Activation
- Load available config and resolve `{outputFile}`.
- Check whether discovery should start fresh or recover from an existing artifact.
- Begin with [workflow.md](workflow.md), then route into `./steps/step-01-classify.md`.
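The fresh-vs-recover decision above can be sketched by reading the `{outputFile}` frontmatter. The field names (`stepsCompleted`, `discoveryCounter`, `lastStep`) come from the Recovery section; the minimal line-based parsing and the `recoveryPoint` name are illustrative assumptions, not the skill's actual loader.

```typescript
// Sketch: decide fresh start vs recovery from the artifact's frontmatter.
// Assumes simple "key: value" frontmatter lines, which is an illustration
// rather than a documented schema.
function recoveryPoint(frontmatter: string): { resume: boolean; lastStep?: string } {
  const fields = new Map<string, string>();
  for (const line of frontmatter.split("\n")) {
    const m = line.match(/^(\w+):\s*(.+)$/);
    if (m) fields.set(m[1], m[2].trim());
  }
  const steps = Number(fields.get("stepsCompleted") ?? "0");
  if (steps > 0 && fields.has("lastStep")) {
    return { resume: true, lastStep: fields.get("lastStep") };
  }
  return { resume: false }; // no prior progress: begin at step-01-classify
}
```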

View File

@@ -0,0 +1,64 @@
---
name: thinking-protocol
description: "Protocol-driven problem-solving agent. Use when the user asks to use thinking-protocol, run the protocol, or wants structured discovery before acting."
---
# Thinking-Protocol Agent
This skill provides a protocol-driven workflow agent for complex, ambiguous, or high-stakes work. Act as the methodology owner: run structured discovery first, preserve a visible State Ledger, then hand off to the right downstream BMAD skill without losing verification discipline. The outcome is a traceable path from problem statement to evidence-based execution.
## Protocol Anchor
These rules survive context pressure and take precedence over convenience:
- **Mandatory sequence:** `/CLASSIFY``/INTERVIEW``/BLIND-SPOTS``[conditional: /RESEARCH]` → work → `/CHECK-COMPLETE``/SAVE`
- **Classify before acting** — do not answer substantial tasks without first choosing the right discovery path.
- **Keep the State Ledger visible** — update it at each stop-gate so recovery and handoff stay explicit.
- **Challenge unsupported premises** — when evidence conflicts with the request, surface the conflict before proceeding.
- **Recover, don't improvise** — if context pressure causes drift, re-read this file and resume from the last reliable State Ledger state.
## Execution Model
- **Default path:** invoke `bmad-discovery-rigor` to perform classification, interview, blind-spot sweep, conditional research, and handoff.
- **Fallback path:** if the skill is unavailable, reproduce the same method inline rather than skipping discovery.
- **Handoff path:** once discovery is complete, route execution to the most relevant BMAD skill, record that choice in the State Ledger `Skill:` line, and pass the Discovery Context forward.
- **Verification ownership:** the downstream skill may execute the work, but `/CHECK-COMPLETE` and `/SAVE` remain your responsibility.
## Operating Modes
### Cross-Repository Work
- Prefer project-specific skills when they exist; keep this agent focused on methodology.
- Self-serve context from the target repository using workspace evidence rather than assumptions.
- Anchor claims to files, commands, and observed behavior whenever possible.
### Autonomous Mode
- Run every discovery step in sequence.
- Self-serve gate answers from workspace evidence instead of asking the user when possible.
- Mark self-served answers with a `🔍` prefix in the State Ledger.
- Halt only for genuinely unresolvable inputs.
### Copilot Integration
- Treat `/STATUS` as a State Ledger surface and `/HELP` as a command-family summary.
- Use structured question tools in batches of 2-3 when a stop-gate needs user input.
- If session weight becomes a blocker, say so and continue from the State Ledger in a fresh chat.
## Persistence and Support
- Read `/memories/lessons.md` at session start when it exists.
- Use `/memories/session/` for task state and `/memories/` for durable lessons after `/SAVE`.
- Use the built-in `Explore` subagent when read-only codebase discovery is the fastest way to close a gap.
## Capability
| Code | Description | Skill |
| ---- | ----------- | ----- |
| DR | Discovery Rigor: full structured discovery workflow with verified handoff | bmad-discovery-rigor |
## On Activation
1. Start with `/CLASSIFY` unless resuming from a State Ledger.
2. If resuming, recover from the last completed stop-gate before taking new actions.
3. After discovery, choose the downstream skill that best matches the classified work.

View File

@@ -0,0 +1,14 @@
type: agent
name: thinking-protocol
displayName: Thinking Protocol
title: Protocol-Driven Problem-Solving Agent
icon: "🧠"
capabilities: "structured discovery, problem classification, gap analysis, blind-spot detection, verification, cross-domain reasoning"
role: Discovery & Verification Agent
identity: "Protocol-driven problem-solving agent that applies structured discovery rigor — classify, interview, surface blind spots, and verify before delivering. Works across any domain: software engineering, policy, financial analysis, system design, or operations."
communicationStyle: "Methodical and evidence-anchored. Follows a mandatory discovery sequence before delivering. Maintains a State Ledger for transparency and auditability."
principles: "Never answer without classifying first. Never assume — interview for gaps. Always maintain a State Ledger. Optimize for the best defensible output, not the shortest response. Anchor claims to workspace evidence."
module: core
canonicalId: thinking-protocol
webskip: false
hasSidecar: false

View File

@@ -0,0 +1 @@
type: skill

View File

@@ -0,0 +1,136 @@
---
type: bmad-distillate
sources:
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-android-e2e-avd.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-android-e2e-ci-failure.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-android-e2e-detox-startup.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-android-e2e-disk-space.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-ci-full-audit.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-conference-solution-id-audit.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-desktop-audio-share.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-discovery-rigor-improvements.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-electron-migration.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-playwright-migration.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-playwright-shared-helpers-2026-03-24.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-qa-screenshots-republish.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-tabs-host-module-resolution.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-trigger-title-delay.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-virtual-background-ab-toggles.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-virtual-background-broader-audit.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-virtual-background-improvements.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context-virtual-background.md'
- '../../../../../../thinking-protocol/_bmad-output/discovery-context.md'
- '../../../../../../thinking-protocol/_bmad-output/planning-artifacts/discovery-context.md'
- '../../../../../../thinking-protocol/_bmad-output/planning-artifacts/epic-playwright-desktop-migration.md'
- '../../../../../../thinking-protocol/_bmad-output/planning-artifacts/technical-design-docpipe-playwright-convergence.md'
- '../../../../../../thinking-protocol/_bmad-output/planning-artifacts/research/technical-gas-closed-sidebar-freshness-research-2026-03-23.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/1-1-define-desktop-launcher-contract.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/1-2-create-desktop-playwright-project-and-fixture.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/1-3-consolidate-shared-helper-surface.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/1-4-port-desktop-data-and-verification-utilities.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/1-5-migrate-desktop-spec-files.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/1-6-remove-legacy-test-ownership-from-desktop-app-repo.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/deferred-work.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/sprint-status.yaml'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/tech-spec-docpipe-v1.1-editorial-layer.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/tech-spec-fix-macos-desktop-audio-screen-share-silent-success.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/tech-spec-generic-docpipe-workflow-contract.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/tech-spec-generic-docpipe.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/tech-spec-migrate-electron-tests-toda-e2e.md'
- '../../../../../../thinking-protocol/_bmad-output/implementation-artifacts/tech-spec-virtual-background-temporary-instrumentation.md'
downstream_consumer: 'BMAD-METHOD repository improvement'
created: '2026-03-24'
token_estimate: 3887
source_total_tokens: 88547
compression_ratio: '22.8:1'
parts: 1
---
## Corpus Shape
- 37 artifacts: 20 discovery contexts, 4 planning artifacts, 13 implementation artifacts; span Solve/Quick through Full-Formal; Execute and Build classifications; all completed 2026-03-22 through 2026-03-24
- Artifact lifecycle chain: Discovery Context → Epic → Technical Design → Tech Spec → Story → Dev Agent Record → Sprint Status → Deferred Work; strongest BMAD value came from layered freeze points and explicit contract sections, not any single artifact type
- Discovery contexts consistently carried: classification, interview findings, blind spots (resolved/deferred), open items, constraints/non-goals, verification strategy, recommendation, and often a State Ledger; this is the most reusable BMAD output shape
- Implementation artifacts consistently carried: frozen intent, acceptance criteria, task checklists, dev notes (intent, technical requirements, architecture compliance, file structure, testing, constraints/non-goals, risks to avoid, references), validation notes, and dev agent record; these sections enabled bounded autonomous execution and review-ready evidence
- Planning/design artifacts separated: research facts, design decisions (numbered with rationale), workflow/operator contracts, data contracts (producer/consumer/ownership), implementation slices (with exit criteria), and verification strategy; BMAD should treat these as distinct artifact classes
## Discovery-Rigor Classification Effectiveness
- Solve/Quick (1 of 20): single-domain codebase trace; resolved via workspace evidence alone; correct classification prevented over-engineering discovery
- Solve/Structured (14 of 20): cross-boundary issues needing formal evidence gathering; discoveryCounter 0-3; predominant tier; worked well when autonomous self-serve was available
- Solve/Full-Formal (1 of 20): system-wide CI audit; high unknown count and consequence of partial fixes; appropriate depth
- Execute (3 of 20): migration, test porting, framework convergence; tier=null; worked but classification lacks signal for convergence/contract-standardization work
- Build (1 of 20): self-referential framework improvement; domain fragments identification useful here
- Gap: no classification signal for convergence work (unifying parallel systems under shared contracts); these were classified as Execute but needed contract-first downstream handoff rather than standard sprint decomposition
## Interview and Self-Serve Patterns
- 🔍 self-serve marker used 10+ times across corpus; question answered from workspace evidence without user interaction; discipline consistently applied
- Self-serve rate high: 15 of 20 contexts completed steps 1-3 autonomously; only 2 reached research step; zero returned mid-discovery asking "what should I do next"
- Interview table format (# | Question | Answer | Status) enables structured triage; Unknown status (+1 Counter) works as designed
- Common resolved topics: cross-repo dependencies, config surface vs implementation, platform scope, auth/credentials, file paths, naming schemes, prior changes
- When self-serve correctly gave up: android-e2e-detox-startup hit Counter 3 on "does this repro locally?" and flagged it Unknown without guessing
## Blind-Spot Sweep Effectiveness
- Seven categories recurred with real findings across corpus: operational resilience, evolution and change, partial observability, cognitive maintainability, emergent behavior, economic constraints, cross-boundary commitment
- Most productive categories: operational resilience (missing preflight guards, recovery paths), evolution and change (version drift, implicit assumptions, breaking schema changes), partial observability (missing instrumentation, silent failures, unvalidated contracts)
- Blind-spot resolution triage: Resolved (address downstream) → Deferred (later phase/different owner) → Unknown (increments Counter); triage worked well
- Gap: blind spots do not probe contract surfaces (ownership boundaries, operator model, evidence provenance, compatibility rules, migration posture) when multi-system convergence is the core problem; these were surfaced ad-hoc but not guided by the standard category sweep
- Gap: no category for cross-repo contract formalization; "team and organization" is closest but doesn't probe interface definitions, versioning, or validation across repo boundaries
## Handoff and Transition Gaps
- Handoff recommendation strongly bound to classification: Counter=0+Structured → bmad-quick-dev-new-preview; Counter=0+Execute → bmad-sprint-planning; Counter≥2 → bmad-technical-research
- Non-goals/constraints repeated at handoff prevent downstream scope creep; verification strategy bundled with handoff; both patterns effective
- Gap: discovery stops at context + recommendation; successful execution still needed manual bridging to convert findings into execution slices, ownership matrices, validation matrices, review evidence expectations
- Gap: no post-handoff State Ledger inheritance; downstream skill creates independent progress tracking; discovery → implementation audit trail lost
- Gap: no standardized Implementation Checklist or Verification Artifact output from discovery; verification strategy stated in words, not as testable assertion list or acceptance criteria formalism
- Gap: no Evidence Manifest section standardizing which files, logs, commands, and self-served surfaces were consulted
- Gap: handoff verification does not reject cross-run contamination of the canonical discovery file; one mutable file can collide with isolated runs
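The classification-to-handoff binding noted in the first bullet of this section can be sketched as a routing function. The three bindings are quoted from the corpus observation; the default branch is an assumption for combinations the corpus did not exercise.

```typescript
// Sketch: the handoff routing observed in the corpus. Skill names are
// quoted from the observation above; the fallthrough is an assumption.
type Tier = "Quick" | "Structured" | "Full-Formal" | null;

function recommendHandoff(counter: number, activity: string, tier: Tier): string {
  if (counter >= 2) return "bmad-technical-research";
  if (counter === 0 && tier === "Structured") return "bmad-quick-dev-new-preview";
  if (counter === 0 && activity === "Execute") return "bmad-sprint-planning";
  return "unresolved: escalate to user"; // no rule observed for this combination
}
```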
## Artifact Structure Patterns That Worked
- Frozen-intent blocks (`<frozen-after-approval>`) lock human-owned narrative before implementation; prevents agent scope renegotiation; used across all tech specs and stories
- Boundaries & Constraints matrix (Always / Ask First / Never): clear constraint formalism clarifying fixed boundaries, human decision points, and architectural non-starters
- I/O & Edge-Case Matrix (scenario | input/state | expected output | error handling): defines contract surface before implementation; enables exhaustive edge-case coverage
- Story Dev Notes structure (Intent, Technical Requirements, Architecture Compliance, File Structure, Testing, Constraints/Non-Goals, Risks to Avoid, References): enables Copilot to parse intent and constraints predictably
- Dev Agent Record (Agent Model, Debug Log References, Completion Notes, File List, Change Log): machine-readable completion audit trail for retrospective
- Validation Notes recording how ACs were validated during implementation: audit trail proving compliance without re-running
- Deferred-work artifact: adversarial review surfaced issues captured separately so they don't block current work but aren't forgotten
- Sprint status YAML with explicit state definitions and transition rules: single source of truth for status queries
- Config schema evolution: YAML + JSON Schema model with adapter pattern, optional fields with defaults, validation as separate concern
## Contract and Ownership Patterns
- Producer/Consumer/Ownership contract triple: who produces what, who consumes what, who owns each concern; prevents ambiguous cross-repo failures
- Launcher contract pattern: typed contract covering repoPath, startCommand, readiness signal, envOverrides; preflight validation fails clearly when prerequisites missing
- Manifest extension pattern: additive optional fields (sourceId, sourceHash, generatedFrom); collision rule (fail hard on identity conflict, not silent overwrite)
- Proof vs publish isolation in single manifest: safe proof runs that don't affect live publish state
- Compliance checklist pattern: explicit verifiable questions (where do docs live, which command proves, which publishes, where is evidence written)
- Cross-platform configuration: macOS/Windows command differences resolved through config rather than hard-coded test logic
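The manifest collision rule above (fail hard on identity conflict, never silently overwrite) can be sketched as a small guard. This is illustrative only; the field name `sourceHash` follows the optional fields listed, and the helper is not part of any actual manifest implementation:

```python
def add_manifest_entry(manifest: dict, key: str, entry: dict) -> dict:
    """Additive manifest extension: fail hard on identity conflict.

    Re-adding an identical entry is allowed; an entry whose sourceHash
    differs from the existing one is treated as an identity conflict.
    """
    existing = manifest.get(key)
    if existing is not None and existing.get("sourceHash") != entry.get("sourceHash"):
        raise ValueError(f"identity conflict for {key!r}: refusing silent overwrite")
    manifest[key] = entry
    return manifest

m = {}
add_manifest_entry(m, "doc-1", {"sourceId": "a", "sourceHash": "h1"})
add_manifest_entry(m, "doc-1", {"sourceId": "a", "sourceHash": "h1"})  # idempotent, allowed
```

The design choice mirrored here is that collisions surface as loud failures at write time, not as silent state drift discovered later.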
## Missing BMAD Skills Identified
- Primary: bmad-create-workflow-contract — turn discovery/design outputs into canonical workflow/operator contract for cross-repo or cross-system work; define config paths, command surface, proof vs publish, evidence locations, ownership matrix, identity rules, compatibility strategy, migration posture; the corpus repeatedly solved this manually
- Cross-repo contract formalization — no skill for defining, validating, or versioning cross-repo interfaces; needed by all migration and convergence work
- Config schema & migration — no skill for designing forward-compatible frontmatter or config schemas with migration paths; docpipe and discovery-rigor both need this
- Compliance-checklist validation — docpipe workflow contract has explicit checklist but no "validate this repo is compliant" skill
- Deferred-item capture workflow — deferred-work.md is hand-written; no formal skill to standardize when to defer vs implement now
- Frozen-intent directive — tech specs use frozen-after-approval manually; no formal skill to mark intent immutable and prevent renegotiation
- Instrumentation & benchmarking — no skill for "add metrics → run matrix → interpret results" temporary measurement harness
- Platform variance validation — no skill for designing cross-platform verification matrices or platform-specific regression detection
- Implementation slice planning — no formal skill for decomposing features into minimal viable slices with exit criteria and rollback boundaries
- Design verification operationalization — no skill to convert "design is valid only if [N conditions] pass" into test fixture inventory and condition matrix
## Discovery-Rigor Specific Improvements Needed
- Classification should recognize convergence/contract-standardization as Build/Execute signal requiring contract-first downstream handoff
- Blind-spot sweep should probe: stable identity, ownership boundaries, canonical intermediate artifacts, operator surface, evidence contracts, compatibility rules, migration posture when multi-system convergence detected
- Handoff should emit Evidence Manifest (files, logs, commands, self-served surfaces consulted) and Contract Candidates (identity, ownership, operator, evidence, compatibility) when applicable
- Research routing should allow evidence-depth override; comparative/convergence work sometimes needs research even when Counter stays below threshold
- Handoff verification should reject cross-run contamination of canonical discovery file; should enforce missing evidence provenance; should reject non-actionable open items
- Canonical artifact identity is weak; runId, artifactRole, canonicalAlias should be first-class frontmatter metadata to prevent collision between isolated discovery runs
- Artifact schema discipline is inconsistent; frontmatter field names vary across tech specs (status vs no status, different field sets); opportunity for validation
- Session state not persisted across discovery invocations; if long conversations cause context pressure and restart, prior State Ledger not re-loaded; recovery is manual
- Party-mode exploration (discovery-rigor-improvements context) identified P0=Recovery Check protocol, P1=preamble extraction, P2=session memory layer, P3=two-tier memory documentation, P4=token measurement; P0 and P3 now implemented, P1/P2/P4 still open

activity,tier,description,indicators,convergence_signal
Solve,Quick,"Clear single-domain problem with known or easily discoverable answer","Direct question; single technology or domain; no stated unknowns; user confident in scope",false
Solve,Structured,"Problem with some unknowns that needs formal reasoning","Cross-domain elements; multiple valid approaches; 1-2 unknowns; user unsure about scope or constraints",false
Solve,Full-Formal,"High-stakes or cross-domain problem with many unknowns","Regulatory or security-critical; novel domain; 3+ unknowns; significant consequences of getting it wrong",false
Build,,"Design a reusable framework, template, tool, system, or contract surface","Creating something new; template or tool design; API or platform design; system architecture; workflow or operator contract standardization; cross-repo convergence",when_indicators_match
Execute,,"Run a cycle within an existing framework or process","Sprint execution; deployment; migration; code review; running an established workflow; implementing an approved cross-system migration or convergence slice",when_indicators_match
fragment,tier,domain,description
classification-guide.csv,core,all,"Activity types, tiers, and classification indicators"
system-reality-categories.csv,core,all,"13 categories for blind-spot sweep with probing questions"
structured-reasoning.md,domain,structured+,"Formal reasoning scaffold: axioms, state model, constraints, risks, verification"
software-engineering.md,domain,software,"Software-specific probes, patterns, verification approaches"
llm-systems.md,domain,llm-agent,"LLM/agent-specific probes, contracts, probabilistic reasoning"
formal-readiness.md,domain,full-formal,"Formal readiness probe questions for Full-Formal tier classifications"
# Formal Readiness Probe
Loaded for: Full-Formal tier classifications
## Questions to Integrate into Interview (Step 2)
When the classification is Full-Formal, add these probes to the interview batch:
1. **Provability boundaries:** Which subsystems MUST be provably correct vs. best-effort?
2. **Safety invariants:** What conditions must NEVER be violated, regardless of input?
3. **Regulatory proofs:** Are there compliance requirements that demand formal evidence?
4. **Failure probability:** What's the acceptable failure rate for critical paths?
5. **Dynamic behavior:** Does the system self-correct? What feedback loops exist?

# LLM Systems Domain Knowledge
_Loaded when classification indicates a problem involving LLMs or AI agents as deployed components. Provides domain-specific probes, contract patterns, and probabilistic reasoning frameworks for the discovery workflow._
## When to Use
- System contains an LLM or AI agent as a component (not just using an LLM to help solve the problem)
- Problem involves agent behavioral contracts, tool governance, or autonomous decision-making
- Classification is Solve or Build in an LLM/agent context
## Domain-Specific Interview Probes
Use these to deepen Step 2 (Interview) questioning for LLM-based systems:
### Agent Architecture
- What is the agent's role? What actions can it take autonomously?
- What tools and APIs does the agent have access to? What are the governance limits?
- Is this a single agent or a multi-agent system? How do agents coordinate?
- What is the verification paradigm? (Pure formal, pure LLM, hybrid neuro-symbolic)
### Behavioral Contracts
- What preconditions must hold before the agent executes? (Context window state, required inputs)
- What invariants must hold across the entire session? (Never reveal system prompt, output language matches input, etc.)
- What governance boundaries exist? (Max tool calls per turn, filesystem access restrictions, etc.)
- What recovery procedures handle soft-constraint violations without terminating the session?
### Probabilistic Reasoning
Help the user reason about acceptable failure rates:
- "Out of 100 agent actions, how many failures would you tolerate before considering the system broken?"
- What is the acceptable violation tolerance (epsilon)? Use domain defaults if the user cannot specify:
| Domain | Epsilon | Confidence (delta) | Evaluation Window (n) |
| ------------------------ | ------- | ------------------ | --------------------- |
| Chat agents / assistants | 0.05 | 0.01 | 100 |
| Code generation | 0.01 | 0.001 | 500 |
| Safety-critical systems | 0.001 | 0.0001 | 1000 |
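One way to operationalize these defaults is an exact one-sided binomial test: did the evaluation window show few enough failures to conclude, with confidence 1 - delta, that the true violation rate is below epsilon? A minimal stdlib sketch, with an illustrative function name that is not part of the skill:

```python
from math import comb

def certifies(n: int, failures: int, epsilon: float, delta: float) -> bool:
    """One-sided exact binomial test of H0: true failure rate >= epsilon.

    Returns True when the observed failure count is small enough to
    conclude, with confidence 1 - delta, that the rate is below epsilon.
    """
    # p-value: probability of seeing this few failures if the rate were epsilon
    p_value = sum(
        comb(n, i) * epsilon**i * (1 - epsilon) ** (n - i)
        for i in range(failures + 1)
    )
    return p_value <= delta

# Chat-agent defaults from the table: epsilon=0.05, delta=0.01, n=100
print(certifies(100, 0, 0.05, 0.01))  # True: zero failures certify the bound
print(certifies(100, 1, 0.05, 0.01))  # False: one failure is already inconclusive
```

Note what this implies about the table: with n=100 and delta=0.01, only a zero-failure window certifies epsilon=0.05; a single failure forces either more trials or a weaker claim.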
### Context and Session Management
- What is the maximum session length? How does performance degrade as context fills?
- Are there context summarization checkpoints?
- How are upstream model provider changes detected? Is there behavioral fingerprinting?
## Domain-Specific Blind-Spot Probes
Additional probes for Step 3 (Blind Spots) when the system involves LLMs:
| System Reality Category | LLM-Specific Probe |
| ------------------------- | ----------------------------------------------------------------------------------- |
| Operational resilience | What happens when the LLM provider has an outage? Is there a fallback model? |
| Observability | How do you detect behavioral drift vs. normal stochastic variation? |
| Economic constraints | What is the token cost per session? At projected scale, is this sustainable? |
| Evolution and change | How do upstream model updates affect behavior? Is there regression testing? |
| Team and organization | Who monitors agent behavior post-deployment? Who investigates anomalies? |
| Emergent behavior | What happens when chained tool calls produce unexpected compound effects? |
| Adversarial environment | What is the prompt injection threat model? Can tool outputs be weaponized? |
| Cognitive maintainability | Can a new team member understand the agent's behavioral contracts? |
| Partial observability | What can't you see about the agent's internal reasoning? What decisions are opaque? |
| Cross-boundary contracts | Are skill/agent/tool interface contracts explicit? What happens when a skill is updated but its consumers are not? |
## LLM-Specific Risk Patterns
### Standard Interrogation Patterns
Use these during Step 2 or Step 3 to probe for LLM-domain failure modes:
1. **LLM-as-a-Judge risk:** If the system uses an LLM to evaluate another LLM's output, the oversight is inherently subjective. Consider deterministic verification instead.
2. **Context window degradation:** Agent performance degrades as context fills, causing early instructions to be effectively forgotten. Define maximum session length in governance.
3. **Upstream model regression:** Silent model changes by the provider alter agent behavior without code changes. Implement behavioral fingerprinting with pinned model snapshots.
4. **Hallucination in generated artifacts:** LLM output may be syntactically valid but semantically wrong. Combine static analysis with dynamic testing against formal properties.
5. **Supply chain risk:** External tools and APIs invoked by agents can be vectors for data exfiltration or prompt injection via tool output.
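The behavioral fingerprinting mitigation in pattern 3 can be sketched as hashing responses to a fixed probe set and comparing across releases. The probes and responses below are hypothetical, and a real harness would also pin model snapshots as the pattern describes:

```python
import hashlib
import json

def behavioral_fingerprint(probe_responses: dict[str, str]) -> str:
    """Stable hash of responses to a fixed probe set.

    Comparing fingerprints across releases flags silent upstream
    model changes without diffing raw transcripts.
    """
    canonical = json.dumps(probe_responses, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

baseline = behavioral_fingerprint({"probe-1": "Paris", "probe-2": "4"})
current = behavioral_fingerprint({"probe-1": "Paris", "probe-2": "four"})
print(baseline != current)  # True: a drifted answer changes the fingerprint
```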
## Agent Contract Template (ABC Framework)
When the problem involves specifying agent behavior, capture these four components:
| Component | Definition |
| ----------------- | ----------------------------------------------------------------------- |
| **Preconditions** | Required state before agent executes |
| **Invariants** | Properties that must hold across the entire session |
| **Governance** | Action-space constraints, tool limits, hard security boundaries |
| **Recovery** | Procedures for handling soft violations without terminating the session |
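The four components can be carried as a typed structure so invariants are checked mechanically rather than by convention. This is a minimal sketch under assumptions: the language-matching invariant, the session-state keys, and the recovery handler are all hypothetical examples, not a prescribed schema:

```python
from dataclasses import dataclass
from typing import Callable

Check = Callable[[dict], bool]  # session state -> does the property hold?

@dataclass(frozen=True)
class AgentContract:
    preconditions: tuple[Check, ...]       # must pass before the agent runs
    invariants: tuple[Check, ...]          # must hold across the whole session
    governance: dict[str, int]             # hard action-space limits
    recovery: Callable[[str, dict], dict]  # soft-violation handler

def soft_violations(contract: AgentContract, state: dict) -> list[int]:
    """Indices of invariants that no longer hold for this session state."""
    return [i for i, inv in enumerate(contract.invariants) if not inv(state)]

# Hypothetical contract: output language must match input language
contract = AgentContract(
    preconditions=(lambda s: "user_input" in s,),
    invariants=(lambda s: s.get("output_lang") == s.get("input_lang"),),
    governance={"max_tool_calls_per_turn": 5},
    recovery=lambda name, s: {**s, "needs_retry": True},
)
state = {"user_input": "hola", "input_lang": "es", "output_lang": "en"}
print(soft_violations(contract, state))  # prints [0]
```

A violation here routes to `recovery` rather than terminating the session, matching the Recovery component's intent.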
## Integration with Discovery Steps
- **Step 2 (Interview):** Use agent architecture and behavioral contract probes
- **Step 3 (Blind Spots):** Use LLM-specific probes alongside System Reality Categories; apply standard interrogation patterns
- **Step 4 (Research):** Use probabilistic reasoning framework to structure research on acceptable bounds
- **Step 5 (Handoff):** Include contract template, risk patterns, and epsilon/delta bounds in Discovery Context

# Software Engineering Domain Knowledge
_Loaded when classification indicates a software engineering problem. Provides domain-specific probes, verification patterns, and design considerations for the discovery workflow._
## When to Use
- Problem involves software design, architecture, implementation, or debugging
- Classification is Solve (any tier) or Build in a software context
- System under discussion is conventional software (no LLM in the deployed system)
## Domain-Specific Interview Probes
Use these to deepen Step 2 (Interview) questioning when the problem is software-related:
### Architecture and Design
- What is the current architecture? (Monolith, microservices, serverless, etc.)
- What patterns are in use? What was rejected and why?
- What are the system boundaries? Where does this code interact with external systems?
- Is there a formal specification or does the system rely on implicit contracts?
### State and Data Integrity
- What is the source of truth for each data entity?
- What invariants must the data maintain? ("What would break if this changed?")
- Are there race conditions, concurrent access, or distributed state issues?
- What consistency model is required? (Strong, eventual, causal)
### Type Safety and Contracts
- Does the codebase use escape hatches that bypass the type system? (e.g., `any`, `as` in TypeScript)
- Are function contracts (preconditions, postconditions) explicit or implicit?
- Where does the code trust external input without validation?
### Testing and Verification
- What is the testing strategy? (Unit, integration, e2e, property-based)
- Which critical paths lack test coverage?
- Is there a traceability matrix from requirements to test cases?
## Domain-Specific Blind-Spot Probes
Additional probes for Step 3 (Blind Spots) when category intersects with software:
| System Reality Category | Software-Specific Probe |
| ------------------------- | ---------------------------------------------------------------------- |
| Operational resilience | What happens at 10x/100x traffic? Is there a circuit breaker? |
| Observability | Are distributed traces in place? Can you reconstruct a failed request? |
| Economic constraints | What is the cloud cost at projected scale? Is there a cost ceiling? |
| Evolution and change | How do database migrations work? What's the deprecation strategy? |
| Team and organization | Who owns each service? Can they deploy independently? |
| Emergent behavior | What cascading failure modes exist across service boundaries? |
| Adversarial environment | Is input sanitized at every boundary? Is the threat model documented? |
| Cognitive maintainability | Could a new team member debug a production issue in this code? |
| Partial observability | What happens in the gap between log emission and dashboard rendering? |
| Cross-boundary contracts | Are producer/consumer/ownership contracts explicit or implicit? Is the interface versioned? How do you detect drift between repos? |
## Design Pattern Decision Framework
When the problem involves architectural or design decisions:
1. **State the selected pattern** and its category (creational, structural, behavioral, architectural)
2. **Justify the selection** — what problem does it solve?
3. **Name alternatives rejected** — what was considered and why not?
4. **Map to invariants** — how does the pattern preserve system invariants?
## Verification Approaches for Software
| Problem Shape | Primary Method | Tool Targets |
| --------------------------------------- | ------------------------ | ----------------------------- |
| State integrity, guarded transitions | Event-B / state machines | ProB, Atelier B |
| Concurrency, message ordering, deadlock | CSP | FDR4, PAT |
| Temporal safety or liveness | LTL | SPIN, NuSMV |
| Distributed protocols, state-space | TLA+ | TLC, Apalache |
| Bounded structural consistency | Alloy | Alloy Analyzer |
| Program contracts, proof-carrying code | Dafny or Lean | Dafny, Z3, Lean 4 |
| Property-based testing | PBT frameworks | fast-check, Hypothesis, jqwik |
Choose the lightest method that still matches the real failure mode.
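As a sketch of the property-based row: real projects would reach for fast-check, Hypothesis, or jqwik, but the core idea fits in stdlib Python. Generate random inputs and assert properties rather than example outputs; the `normalize` function here is a hypothetical system under test:

```python
import random

def normalize(xs: list[int]) -> list[int]:
    """Hypothetical system under test: dedupe, preserving first-seen order."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def check_property(trials: int = 1000) -> None:
    """Properties: no duplicates, no invented elements, no dropped elements."""
    rng = random.Random(42)  # seeded for reproducible counterexamples
    for _ in range(trials):
        xs = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        out = normalize(xs)
        assert len(out) == len(set(out)), f"duplicates survived: {xs}"
        assert all(x in xs for x in out), f"invented elements: {xs}"
        assert set(out) == set(xs), f"dropped elements: {xs}"

check_property()
print("properties held over 1000 random inputs")
```

The value over example-based tests is that the properties, not hand-picked cases, define correctness, which is the same shift the heavier formal methods in the table make.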
## Integration with Discovery Steps
- **Step 2 (Interview):** Use architecture and design probes to formulate questions
- **Step 3 (Blind Spots):** Use domain-specific blind-spot probes alongside System Reality Categories
- **Step 4 (Research):** Use verification approaches to structure technical research
- **Step 5 (Handoff):** Include architecture decisions, patterns, and verification strategy in Discovery Context

# Structured Reasoning Framework
_Loaded when classification is Solve/Structured or higher. Provides a formal reasoning scaffold for problems with unknowns, multiple valid approaches, or cross-domain constraints._
## When to Use
- Tier is **Structured** or **Full-Formal**
- User cannot immediately state constraints or invariants
- Multiple valid approaches exist and must be reasoned through
## Reasoning Sequence
Use this sequence to deepen analysis during the interview (Step 2) and the blind-spot sweep (Step 3). Each element feeds the next.
### 1. Axioms
Fundamental truths about the problem domain. Help the user identify these — do not assume they already know them. If the user struggles, offer candidates and ask: "Does this seem true in your domain?"
- **What is always true in this domain?**
- **What would an expert take as given?**
Check for pairwise contradictions. If found, present the contradiction and resolve before proceeding.
### 2. State Model
What exists, what changes, what must remain true.
- **Entities:** What objects, actors, or concepts exist?
- **Variables:** What properties change?
- **Invariants:** What must ALWAYS hold? ("What would break if it changed? That's your invariant.")
- **Operations:** What actions change the state? What are their preconditions and postconditions?
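A state model's operations become concrete once preconditions, postconditions, and invariants are written as executable checks. A toy account example under stated assumptions (the entity and its rules are illustrative, not from any corpus):

```python
from dataclasses import dataclass

@dataclass
class Account:       # entity
    balance: int     # variable

def withdraw(acct: Account, amount: int) -> None:
    """Operation with explicit precondition, postcondition, and invariant."""
    assert 0 < amount <= acct.balance, "precondition: amount within balance"
    before = acct.balance
    acct.balance -= amount
    assert acct.balance == before - amount, "postcondition: exact debit"
    assert acct.balance >= 0, "invariant: balance never negative"

a = Account(balance=100)
withdraw(a, 30)
print(a.balance)  # prints 70
```

Asking "what would break if this changed?" against such a model turns vague worries into named assertions.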
### 3. Behavioral Constraints
- **Safety (always):** Conditions that must hold at every point — "It is never the case that [bad thing]"
- **Liveness (eventually):** Conditions that must eventually be reached — "If [trigger], then eventually [response]"
- **Temporal order:** Things that must happen in sequence — "[A] before [B]"
### 4. Risk Categories
When populating risks, use these categories beyond logical correctness:
| Category | What to look for |
| --------------- | -------------------------------------------------------------------- |
| Logical | Incorrect reasoning, violated invariants, flawed assumptions |
| Operational | Failure modes, monitoring gaps, rollback limitations, capacity |
| Economic | Cost overruns, over-engineering, insufficient investment |
| Socio-technical | Team capability gaps, ownership problems, Conway's Law |
| Evolution | Brittleness to change, migration risk, dependency risk |
| Adversarial | Security threats, abuse scenarios, supply chain risks |
| Cognitive | Complexity exceeding team capacity, opaque design |
| Emergent | Cascading failures, feedback loops, unintended system-level behavior |
### 5. Verification Strategy
How to prove the solution is correct:
| Approach | When to use | What it establishes |
| ------------------------------------- | -------------------------------------------------- | ------------------------------------- |
| Formal proof | Properties can be stated precisely, system bounded | Correctness within the model |
| Statistical / property-based testing | Large state space or stochastic behavior | Confidence bounds, not guarantees |
| Simulation / prototyping | Real-world behavior hard to predict from spec | Empirical evidence of behavior |
| Monitoring and alerting | Production may diverge from test behavior | Ongoing operational correctness |
| Chaos / resilience testing | System must tolerate component failures | Resilience under real conditions |
| Threat modeling | System faces adversarial input | Security posture |
| Code review and cognitive walkthrough | System maintained by humans over time | Maintainability and comprehensibility |
| A/B testing and canary deployment | Behavioral impact hard to predict statically | Empirical validation at scale |
## Integration with Discovery Steps
- **Step 2 (Interview):** Use axioms, state model, and constraints to formulate deeper questions
- **Step 3 (Blind Spots):** Use risk categories to enhance System Reality Category sweep
- **Step 4 (Research):** Use verification strategy to structure research goals
- **Step 5 (Handoff):** Include axioms, invariants, and constraints in Discovery Context

category,what_to_probe,example_blind_spot,applies_when
operational-resilience,"Failure modes, redundancy, rollback, circuit breakers, graceful degradation","What happens when the external API is down for 2 hours?",always
observability,"Logging, metrics, tracing, alerting, debugging in production","How will you diagnose a slow request in production?",software
economic-constraints,"Cost vs correctness trade-offs, infrastructure cost, development time, ROI","Is the formally correct solution 10x more expensive to build?",always
evolution-and-change,"Requirements drift, dependency changes, migration, versioning, backward compatibility","What happens when the upstream API changes its schema?",always
team-and-organization,"Ownership boundaries, Conway's Law effects, communication paths, bus factor","Who owns this after you ship it? Can they maintain it?",always
emergent-behavior,"Cascading failures, feedback loops, unintended interactions at scale","What happens when 10000 users trigger this concurrently?",software
adversarial-environment,"Threat modeling, abuse patterns, security layers, injection attacks, malicious input","What if someone deliberately sends malformed input?",software
cognitive-maintainability,"Readability, mental models, onboarding complexity, documentation clarity","Will a new engineer understand this in 6 months?",always
partial-observability,"Incomplete logs, sampled metrics, unpredictable user behavior, inference under uncertainty","Can you distinguish a bug from expected behavior with available telemetry?",software
cross-boundary-contracts,"Identity rules, ownership boundaries, operator surface, evidence contract, compatibility rules, migration posture across systems or repos","Is the contract between producer and consumer explicit or tribal knowledge?",convergence
formal-correctness,"Provability boundaries, safety invariants, regulatory proof requirements, which components need mathematical guarantees vs. best-effort","Is the payment state machine formally verified, or does it just pass tests?",software
uncertainty-probabilistic,"Non-deterministic components, acceptable failure rates, probabilistic guarantees, ML model confidence thresholds, stochastic user behavior","What happens when the recommendation engine returns low-confidence results?",always
control-feedback-loops,"Self-correcting mechanisms, automatic scaling, circuit breakers, drift detection, stabilization after perturbation","If the auto-scaling overshoots by 3x, how does the system stabilize?",software
# Step 1: Classify
Decide what kind of work this is, how much discovery rigor it needs, and whether the task is convergence work. Accurate classification sets the route, the domain fragments, and the handoff expectations for everything that follows.
## Recovery
Run recovery per workflow.md §RECOVERY PROTOCOL.
## Classify and Confirm
Load `../resources/classification-guide.csv` and classify the request across these dimensions:
| Dimension | Options | How to decide |
|-----------|---------|---------------|
| Activity | Solve / Build / Execute | CSV `indicators` column |
| Tier (Solve only) | Quick / Structured / Full-Formal | CSV `indicators` column |
| Convergence | Yes / No | Unifying parallel systems, defining contracts, or standardizing boundaries usually means Yes |
Present the result in a compact confirmation block:
```markdown
**Classification:**
- Activity: **[type]**
- Tier: **[tier]** _(Solve only)_
- Convergence: **[Y/N]**
- Reasoning: [2-3 sentences]
```
If Convergence = Yes, note that the downstream handoff should seed contract candidates and strongly consider `bmad-create-workflow-contract`, even when the activity remains Solve.
**🛑 HALT — Use `vscode_askQuestions` to confirm whether this classification matches the problem. In autonomous mode, self-serve from workspace evidence and log the decision.**
| Response | Action |
|----------|--------|
| Agrees | Proceed |
| Disagrees | Reclassify with the user's feedback |
| Uncertain | Clarify the reasoning, then confirm again |
## Prepare State
Identify applicable domain fragments from `../resources/discovery-resources-index.csv` per workflow.md §DOMAIN FRAGMENT LOADING and record them in State Ledger `Decisions:`.
Create `{outputFile}`:
```yaml
---
stepsCompleted: [1]
activity: '[activity]'
tier: '[tier]'
convergence: [true/false]
discoveryCounter: 0
lastStep: 'step-01-classify'
---
```
Append:
```markdown
## Classification
- **Activity:** [activity]
- **Tier:** [tier]
- **Convergence:** [Y/N]
- **Reasoning:** [reasoning]
- **Confirmed:** [date]
```
Initialize and output the State Ledger using workflow.md §STATE LEDGER. Record any files or workspace surfaces consulted so far in `Evidence:` and leave `Skill:` blank until handoff.
## Memory Checkpoint
Per workflow.md §MEMORY CHECKPOINT.
## Next
| Classification | Route to |
|---|---|
| Solve / Quick | `./step-01b-quick-handoff.md` |
| All others | `./step-02-interview.md` |

# Step 1b: Quick Handoff
Use this branch only for Solve/Quick work that still looks simple after a short sanity check. The goal is to avoid over-processing straightforward requests while still guarding against hidden complexity.
## Recovery
Run recovery per workflow.md §RECOVERY PROTOCOL.
## Sanity Check
Ask at most 1-2 questions that could escalate the task out of Quick tier:
- "Is there anything about this that's more complex than it first appears?"
- "Are there constraints or dependencies I should know about?"
**🛑 HALT — Use `vscode_askQuestions` for this gate. In autonomous mode, self-serve from workspace evidence and log the result.**
| Response | Action |
|----------|--------|
| Confirms simple | Proceed to the gut check |
| Reveals complexity | Reclassify to Structured or Full-Formal, update `{outputFile}` frontmatter, the Classification section, and the State Ledger `Class:` entry, then route to `./step-02-interview.md` |
## Gut-Check Reality
Load `../resources/system-reality-categories.csv` and quickly assess the three most relevant categories for this problem. This is a fast sanity sweep, not the full blind-spot pass.
If the gut check surfaces a real gap or dependency, reclassify to Structured, update `{outputFile}` frontmatter, the Classification section, and the State Ledger `Class:` entry, then route to `./step-02-interview.md`.
## Compile Thin Discovery Context
Create `{outputFile}`:
```yaml
---
stepsCompleted: [1, '1b']
activity: 'Solve'
tier: 'Quick'
convergence: [true/false]
discoveryCounter: 0
lastStep: 'step-01b-quick-handoff'
status: 'complete'
---
```
Replace the body with:
```markdown
# Discovery Context
## Classification
- **Activity:** Solve
- **Tier:** Quick
- **Convergence:** [Y/N]
- **Confirmed:** [date]
## Interview Findings
- Quick sanity-check findings: [one-line summary of what the user needs]
## Blind Spots
- Quick gut-check coverage: [the three categories checked] — [none required escalation, or note the issue that forced escalation]
## Research Summary
Not required on the quick path.
## Evidence Manifest
- Workspace files consulted: [list or `none`]
- Commands or logs consulted: [list or `none`]
- Self-served surfaces: [list or `none`]
- External research: none
## Contract Candidates
[If Convergence = Yes: thin seeds for identity, ownership, operator, evidence, compatibility, or migration.
Otherwise: `Not a contract-standardization task.`]
## Verification Strategy
[How downstream work should verify the quick-path recommendation.]
## Open Items
[None, or a concise table of remaining unknowns with owner and next action.]
## Constraints and Non-Goals
[Any constraints or non-goals surfaced during the sanity check.]
## Recommendation
### Discovery Narrative
[One-line summary of why the work remained in Quick tier.]
### Handoff Brief
- [Most important finding]
- [Key constraint or risk]
### Next Skill or Workflow
[Recommended next step — typically `Proceed directly` or a specific skill such as `bmad-create-workflow-contract` when Convergence = Yes]
```
Update and output the final State Ledger, including `Evidence:` and `Skill:`.
## Memory Checkpoint
Per workflow.md §MEMORY CHECKPOINT.
## Next
Discovery complete. Proceed directly or hand off to the recommended skill.

View File

@ -0,0 +1,70 @@
# Step 2: Interview
Close the highest-risk information gaps before blind-spot analysis. Ask only the questions that materially improve the next decision, and use small batches so the user can answer precisely. If any single unknown directly threatens safety, security, or data integrity, treat it as a research override regardless of Counter.
## Recovery
Run recovery per workflow.md §RECOVERY PROTOCOL.
## Load Interview Context
Load any domain fragments named in State Ledger `Decisions:`.
If tier is Full-Formal, also load `../resources/formal-readiness.md` and fold only the relevant probes into the question set. They count toward the total question budget rather than creating a second interview.
## Ask and Process Batches
Formulate 3-6 questions targeting:
- Missing information needed to proceed confidently
- Assumptions you would otherwise make silently
- Unstated constraints, scope boundaries, and non-goals
- Prior work, existing context, or related artifacts
- If Activity = Build: alternative approaches the user may not have considered
Present 2-3 questions at a time.
**🛑 HALT — Use `vscode_askQuestions` in batches of 2-3. In autonomous mode, self-serve from workspace evidence and log the answers.**
Process each response with this table:
| Response type | Action |
|--------------|--------|
| Clear answer | Record the finding |
| "I don't know" | Increment Counter. Log: `Counter: [N-1] → [N] (Interview: [gap])` |
| Contradicts earlier info | Challenge the contradiction and wait for resolution |
| Ambiguous | Ask a follow-up; do not interpret silently |
**Autonomous mode:** self-serve from workspace evidence with a `🔍` prefix. Only increment Counter for genuinely unresolvable gaps.
If more questions remain, present the next batch and repeat the same halt-and-process loop.
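As an illustration only (not part of the workflow contract), the response-handling table can be sketched as a small transition function; the `kind` labels and return shape here are assumptions:

```python
def process_response(kind, counter, gap=""):
    """Sketch of the interview response table; 'kind' labels are assumptions."""
    if kind == "clear":
        return counter, "record finding"
    if kind == "unknown":
        # "I don't know" is the only response that moves the Discovery Counter
        return counter + 1, f"Counter: {counter} → {counter + 1} (Interview: {gap})"
    if kind == "contradiction":
        return counter, "challenge and wait for resolution"
    return counter, "ask a follow-up"  # ambiguous: never interpret silently
```

The key invariant the sketch captures: only genuine unknowns increment the counter; contradictions and ambiguity pause the interview instead.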
## Final Check and State Update
**🛑 HALT — Use `vscode_askQuestions` to ask whether there are other details worth capturing before blind-spot analysis. In autonomous mode, self-serve from workspace evidence and log the result.**
Update `{outputFile}` frontmatter:
```yaml
stepsCompleted: [1, 2]
discoveryCounter: [N]
lastStep: 'step-02-interview'
```
Append:
```markdown
## Interview Findings
| # | Question | Answer | Status |
|---|----------|--------|--------|
| 1 | [question] | [answer] | ✅ Resolved / ❓ Unknown |
**Discovery Counter:** [N]
```
Update and output the State Ledger. Record any files, commands, logs, and self-served surfaces consulted during the interview in `Evidence:`.
## Memory Checkpoint
Per workflow.md §MEMORY CHECKPOINT.

View File

@ -0,0 +1,100 @@
# Step 3: Blind-Spot Sweep
Sweep the problem across the applicable System Reality Categories so hidden risks, missing constraints, and unspoken failure modes become explicit before handoff. If any single unknown directly threatens safety, security, or data integrity, route to research regardless of Counter.
## Recovery
Run recovery per workflow.md §RECOVERY PROTOCOL.
## Load Sweep Context
Load `../resources/system-reality-categories.csv` and any domain fragments already identified in the State Ledger.
Filter categories using this table:
| `applies_when` value | Include? |
|---------------------|----------|
| always | Yes — always probe |
| software | Only if software domain |
| convergence | Only if Convergence = Yes |
When in doubt, include the category; false positives are safer than missed blind spots.
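The `applies_when` filter can be sketched as a predicate (a minimal sketch; the row shape and field name are assumed from the table above, and the "when in doubt, include" judgment call stays with the agent, not the code):

```python
def category_applies(row, software_domain, convergence):
    """Return True when a category row should be probed this run."""
    rule = row.get("applies_when", "always")
    return (
        rule == "always"
        or (rule == "software" and software_domain)
        or (rule == "convergence" and convergence)
    )
```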
## Sweep and Triage
For each applicable category, assess:
- Does the problem statement already address this?
- Did the interview surface information here?
- Is there an unaddressed gap a domain expert would flag?
Each blind spot should state **what is missing**, **why it matters**, and **which category** it belongs to.
Present blind spots in batches of up to 5:
```markdown
**Blind spots identified:**
1. **[Category]:** [Gap] — [Impact]
For each: intentional, should address, or can't assess?
```
**🛑 HALT — Use `vscode_askQuestions` to collect responses for each blind spot. In autonomous mode, self-serve from workspace evidence and log the resolutions.**
Process responses with this table:
| Response | Action |
|----------|--------|
| Intentional / Deferred | Mark deferred with rationale |
| Address it | Add to Open Items with type, owner, and next action for downstream |
| Can't assess / Don't know | Increment Counter. Log: `Counter: [N-1] → [N] (Blind-spot: [category])` |
| Contradicts earlier finding | Challenge the contradiction and resolve it before continuing |
Use these deferral criteria:
| Defer when | Do NOT defer when |
|------------|-------------------|
| Out of scope for the current problem | Directly affects the problem being solved |
| Requires a separate discovery run | Security or safety concern |
| Would expand scope beyond stated constraints | Would make downstream work unreliable |
**Autonomous mode:** self-assess from workspace evidence with a `🔍` prefix. Only increment Counter for genuinely unresolvable gaps.
If all findings cluster in one category, explicitly re-sweep the remaining categories before proceeding.
## Route and Update State
Choose the next step with this routing table:
| Condition | Route to |
|-----------|----------|
| Counter ≥ 2 | `step-04-research` and set `researchReason: 'counter'` |
| Counter < 2, Convergence = Yes, and contract candidates still need workspace evidence | `step-04-research` and set `researchReason: 'evidence-depth-override'` |
| Counter < 2 | `step-05-handoff` and set `researchReason: 'none'` |
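The routing table above is equivalent to this small decision function (illustrative sketch; names are assumptions):

```python
def route_after_sweep(counter, convergence, candidates_need_evidence):
    """Return (next step, researchReason) per the blind-spot routing table."""
    if counter >= 2:
        return "step-04-research", "counter"
    if convergence and candidates_need_evidence:
        return "step-04-research", "evidence-depth-override"
    return "step-05-handoff", "none"
```

Note the ordering: the counter check wins even when convergence evidence is also thin, since research resolves both.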
Update `{outputFile}` frontmatter:
```yaml
stepsCompleted: [1, 2, 3]
discoveryCounter: [N]
researchReason: '[counter|evidence-depth-override|none]'
lastStep: 'step-03-blind-spots'
```
Append:
```markdown
## Blind Spots
| # | Category | Gap | Impact | Resolution |
|---|----------|-----|--------|------------|
| 1 | [category] | [gap] | [impact] | ✅ Address / ⏸️ Deferred / ❓ Unknown |
**Discovery Counter after sweep:** [N]
```
Update and output the State Ledger. Record any files, commands, logs, and self-served surfaces consulted during the sweep in `Evidence:`.
## Memory Checkpoint
Per workflow.md §MEMORY CHECKPOINT.

View File

@ -0,0 +1,86 @@
# Step 4: Research (Conditional)
Enter this step only when unresolved unknowns justify deeper investigation or convergence work needs more evidence. Research should stay targeted: resolve the gaps blocking a reliable handoff rather than broadening scope.
## Recovery
Run recovery per workflow.md §RECOVERY PROTOCOL. Confirm that `discoveryCounter ≥ 2`, or that step 3 set `researchReason: 'evidence-depth-override'` to justify research.
## Build the Research Plan
List every gap that incremented the counter. When `researchReason: 'evidence-depth-override'`, also list the contract-candidate or blind-spot evidence gaps that triggered the override, even if they did not increment the counter:
```markdown
**Unresolved gaps requiring research:**
1. [Source: Interview/Blind-spot] — [What is unknown]
```
Create a compact plan:
| # | Gap | Approach | Owner |
|---|-----|----------|-------|
| 1 | [gap] | Workspace search / User input / External reference | Agent / User |
For deeper investigation, consider invoking `bmad-domain-research` for domain gaps or `bmad-technical-research` for architecture and technology gaps when those skills are installed.
## Resolve Gaps
Use this routing table per gap:
| Gap type | Action |
|----------|--------|
| Workspace-resolvable | Search, read, and present evidence |
| User-resolvable | Ask targeted questions with `vscode_askQuestions` and wait for the response |
| External | Mark deferred and note the required external input |
**🛑 After each gap — Use `vscode_askQuestions` to confirm whether the gap is resolved or more evidence is needed. In autonomous mode, self-serve and log the decision.**
When the planned gaps are addressed, present a compact satisfaction check:
```markdown
**Research summary:**
| # | Gap | Status | Resolution |
|---|-----|--------|------------|
| 1 | [gap] | ✅ Resolved / ⏸️ Deferred | [summary] |
**Satisfied to proceed?**
```
**🛑 HALT — Use `vscode_askQuestions` for the satisfaction check. In autonomous mode, self-serve from workspace evidence and log the result.**
| Response | Action |
|----------|--------|
| Satisfied | Reset Counter to 0 and proceed |
| Not satisfied | Identify the remaining gaps and iterate |
## Update State
Reset the Discovery Counter to 0.
Update `{outputFile}` frontmatter:
```yaml
stepsCompleted: [1, 2, 3, 4]
discoveryCounter: 0
researchReason: 'resolved'
lastStep: 'step-04-research'
```
Append:
```markdown
## Research
| # | Gap | Source | Resolution | Status |
|---|-----|--------|------------|--------|
| 1 | [gap] | [interview/blind-spot] | [resolution] | ✅ / ⏸️ |
**Discovery Counter reset:** [previous] → 0
```
Update and output the State Ledger. Record any files, commands, logs, and self-served surfaces consulted during research in `Evidence:`.
## Memory Checkpoint
Per workflow.md §MEMORY CHECKPOINT.

View File

@ -0,0 +1,115 @@
# Step 5: Handoff
Compile the final Discovery Context, verify that the discovery work is complete enough to trust, and recommend the next workflow. Do not execute the downstream work here.
## Recovery
Run recovery per workflow.md §RECOVERY PROTOCOL.
## Optional Quality Pass
Check workflow.md §QUALITY ENHANCEMENT. Invoke only the installed skills that materially improve the handoff.
## Compile the Discovery Context
Read `{outputFile}` and replace the working notes with a final Discovery Context using this fixed heading order:
```markdown
# Discovery Context
## Classification
## Interview Findings
## Blind Spots
## Research Summary
## Evidence Manifest
## Contract Candidates
## Verification Strategy
## Open Items
## Constraints and Non-Goals
## Recommendation
### Discovery Narrative
### Handoff Brief
### Next Skill or Workflow
```
Populate those sections with the following content:
| Section | What to include |
|---------|-----------------|
| Classification | Activity, tier, convergence flag, and reasoning from step 1 |
| Interview Findings | The key answers, contradictions resolved, and remaining unknowns from step 2 |
| Blind Spots | Resolved items, deferred items with rationale, and any still-open gaps from step 3 |
| Research Summary | Step 4 findings, or `Not required` when research was not needed |
| Evidence Manifest | Workspace files, commands, logs, self-served surfaces, and any external sources consulted across the run |
| Contract Candidates | For convergence work: identity, ownership, operator, evidence, compatibility, and migration seeds |
| Verification Strategy | How downstream work should prove correctness; include traceability expectations for Structured or Full-Formal work |
| Open Items | Items with type, owner, and next action |
| Constraints and Non-Goals | Scope boundaries confirmed during discovery |
| Recommendation | Discovery narrative, handoff brief, and the next skill or workflow to run |
Compile `Evidence Manifest` and `Open Items` from the running State Ledger and step outputs rather than reconstructing them ad hoc at the end.
## Verify Before Handoff
Run this checklist against the compiled document:
| # | Check |
|---|-------|
| 1 | Problem statement unambiguous — no competing interpretations |
| 2 | Constraints explicit (stated or confirmed `none`) |
| 3 | Non-goals documented |
| 4 | Blind spots addressed or deferred with rationale |
| 5 | All applicable System Reality Categories swept |
| 6 | Discovery Counter < 2, or research completed and the counter reset |
| 7 | Verification strategy identified for downstream work |
| 8 | Evidence Manifest populated |
| 9 | Open items have owner and next action |
| 10 | If Convergence = Yes, contract candidates recorded |
**🛑 If any check fails — state which one failed, explain why, and ask how to proceed. Do not hand off until the gap is resolved or explicitly deferred.**
## Save and Recommend
Update `{outputFile}` frontmatter:
```yaml
stepsCompleted: [1, 2, 3, 5] # or [1, 2, 3, 4, 5] if research conducted
discoveryCounter: [N]
lastStep: 'step-05-handoff'
status: 'complete'
```
Replace the document body with the compiled Discovery Context.
Discover available skills from `{project-root}/_bmad/_config/bmad-help.csv` if it exists:
1. Load the CSV and ignore `_meta` rows.
2. Use `phase`, `name`, `code`, `description`, `workflow-file`, and `command` as routing surfaces.
3. Rank candidate rows by alignment with the classification, convergence signal, open items, and the kind of downstream artifact needed.
4. Recommend the strongest fit, then record it in `State Ledger Skill:`.
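Steps 1 and 2 can be sketched as a parsing pass (a sketch only: which column flags `_meta` rows is an assumption here, so adjust to the real `bmad-help.csv` schema):

```python
import csv
from io import StringIO

def candidate_skills(csv_text):
    """Parse bmad-help.csv text and drop rows carrying a _meta marker."""
    rows = csv.DictReader(StringIO(csv_text))
    return [r for r in rows if "_meta" not in r.values()]
```

Ranking (step 3) remains a judgment call against the classification and open items; the sketch only narrows the candidate set.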
Fallback routing:
| Signal | Skill |
|--------|-------|
| Convergence = Yes | `bmad-create-workflow-contract` |
| Build + architecture | `bmad-create-architecture` |
| Build + product definition | `bmad-create-prd` |
| Execute + implementation | `bmad-sprint-planning` or `bmad-create-story` |
| Solve + Quick | Proceed directly |
Present the completion summary with saved location, recommended next skill, and a reminder that the Discovery Context should inform downstream execution.
**🛑 HALT — Use `vscode_askQuestions` to confirm whether to proceed with the recommended handoff or adjust it. In autonomous mode, self-serve and log the decision.**
## Final State Ledger
Output the final State Ledger with all step summaries, including `Evidence:` and `Skill:`.
## Memory Checkpoint
Per workflow.md §MEMORY CHECKPOINT. Additionally:
- **persist** | scope: workspace | key: learned-patterns | caller: "discovery-rigor"
- Content: reusable insights from this discovery run

View File

@ -0,0 +1,121 @@
---
outputFile: '{output_folder}/discovery-context.md'
---
# Discovery Rigor Workflow
This workflow creates a Discovery Context before execution work begins. Use it when the cost of acting on assumptions is high: ambiguous diagnosis, high-stakes delivery, or convergence work that needs explicit boundaries before implementation.
Keep each step self-contained so the workflow survives context compression. `{outputFile}` frontmatter and the State Ledger are the canonical recovery surfaces, and every step should be able to resume cleanly from them. Speak in `{communication_language}`.
## Step Sequence
| Step | File | Purpose | Conditional |
|------|------|---------|-------------|
| 1 | `step-01-classify.md` | Determine the problem type, discovery depth, and convergence signal | No |
| 1b | `step-01b-quick-handoff.md` | Collapse the workflow for Solve/Quick work after a sanity check | Yes — Solve/Quick only |
| 2 | `step-02-interview.md` | Close the highest-value information gaps | No |
| 3 | `step-03-blind-spots.md` | Sweep applicable System Reality Categories | No |
| 4 | `step-04-research.md` | Resolve unknowns that block a reliable handoff | Yes — Discovery Counter ≥ 2 or evidence-depth override |
| 5 | `step-05-handoff.md` | Compile, verify, and route the Discovery Context | No |
## STATE LEDGER
Maintain this compact summary across the run. Update it whenever a halt gate or state change materially changes the discovery picture.
```
State Ledger
---
Problem: [one-line summary]
Class: [activity] | [tier] | Convergence: [Y/N] | Domains: [fragments]
Position: [current] ✅ → [next] [N/5]
Counter: [N]
Findings: [one-line per step]
Evidence: [files, commands, logs, and self-served surfaces consulted]
Open: [unresolved items]
Decisions: [key decisions]
Skill: [recommended downstream skill or Proceed directly]
```
## RECOVERY PROTOCOL
At every step entry, before doing anything else:
| `{outputFile}` exists? | `stepsCompleted` match memory? | Action |
|------------------------|-------------------------------|--------|
| No | — | Fresh run — proceed normally |
| Yes | Match | Proceed normally |
| Yes | Mismatch | Context compressed — reconstruct the State Ledger from file, announce the recovery, then continue from the recovered state |
| Yes, status: complete | — | Ask: re-run or use existing? |
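The recovery table can be read as a four-branch decision (illustrative sketch; the return labels are assumptions, not required strings):

```python
def recovery_action(file_exists, status, steps_match):
    """Mirror of the recovery protocol table; 'status' is the frontmatter field."""
    if not file_exists:
        return "fresh-run"
    if status == "complete":
        return "ask-rerun-or-use-existing"
    if steps_match:
        return "proceed"
    return "reconstruct-state-ledger"  # context compressed: rebuild from file
```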
## AUTONOMOUS MODE
When user requests autonomous execution (e.g., "just investigate", "autonomous mode"):
| Behavior | Rule |
|----------|------|
| Step sequence | Run every step — no skipping |
| Gate answers | Self-serve from workspace evidence |
| Logging | 🔍 prefix in State Ledger for self-served items |
| Halting | Only halt for genuinely unresolvable inputs |
| Summary | `🔍 Self-served: [step] ([N]/[M] resolved via [method]). Unresolved: [items].` |
## DOMAIN FRAGMENT LOADING
Load fragments just in time at steps 2 and 3. Step 1 identifies the likely domains; later steps load only what they need.
| Condition | Fragment | Path |
|-----------|----------|------|
| Tier is Structured or Full-Formal | Structured reasoning | `./resources/structured-reasoning.md` |
| Tier is Full-Formal | Formal readiness probe | `./resources/formal-readiness.md` |
| Domain is software | Software engineering | `./resources/software-engineering.md` |
| Domain is LLM/agent | LLM systems | `./resources/llm-systems.md` |
Check `./resources/discovery-resources-index.csv` for the full index. Record applicable fragments in State Ledger `Decisions:`.
## MEMORY CHECKPOINT
Use `bmad-memory-manager` as the durable sidecar for recovery support. `{outputFile}` remains the canonical workflow artifact.
| Event | Operation |
|-------|-----------|
| After each completed step | `persist | scope: session | key: state-ledger | caller: "discovery-rigor"` with current frontmatter + State Ledger |
| Recovery mismatch | `recover | scope: session | key: state-ledger | caller: "discovery-rigor"` |
| Workflow completion | `persist | scope: workspace | key: learned-patterns | caller: "discovery-rigor"` with reusable insights from the run |
## QUALITY ENHANCEMENT
Optional skill invocations. Evaluate them at step 5 entry only, after the core discovery work is already present.
| Condition | Skill to invoke | Input | What it adds |
|-----------|----------------|-------|-------------|
| Interview findings accepted too readily | `bmad-advanced-elicitation` | Interview Q&A | Pushes for refinement |
| Blind-spot coverage thin | `bmad-review-edge-case-hunter` | Findings so far | Walks branching paths the sweep missed |
| High-stakes or many open items | `bmad-review-adversarial-general` | Full Discovery Context | Adversarial review |
| Output > 2000 words | `bmad-distillator` | Saved Discovery Context | Compressed handoff version |
Skip any skill that is not installed.
## MULTI-RUN ISOLATION
| Condition | Action |
|-----------|--------|
| Single investigation | Use `{outputFile}` directly |
| Multiple concurrent | Use `{output_folder}/discovery-context-{topic-slug}.md` per run; add `runId` to frontmatter |
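One way the per-run naming could be sketched (the exact slug rule is an assumption; only the `discovery-context-{topic-slug}.md` shape comes from the table above):

```python
import re

def run_output_path(output_folder, topic=None):
    """Single run uses the default file; concurrent runs get a topic slug."""
    if topic is None:
        return f"{output_folder}/discovery-context.md"
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"{output_folder}/discovery-context-{slug}.md"
```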
## INITIALIZATION
### Configuration Loading
Load config from `{project-root}/_bmad/core/config.yaml` and resolve:
- `user_name`, `output_folder`, `communication_language`
- `date` as system-generated current datetime
### Check for Existing Work
Check `{outputFile}` per §RECOVERY PROTOCOL table.
### Route to First Step
Read fully and follow: `./steps/step-01-classify.md`

View File

@ -1,6 +1,7 @@
---
name: bmad-distillator
description: Lossless LLM-optimized compression of source documents. Use when the user requests to 'distill documents' or 'create a distillate'.
argument-hint: '[provide input paths] [--validate to run round-trip validation after generation]'
---
# Distillator: A Document Distillation Engine
@ -9,7 +10,7 @@ description: Lossless LLM-optimized compression of source documents. Use when th
This skill produces hyper-compressed, token-efficient documents (distillates) from any set of source documents. A distillate preserves every fact, decision, constraint, and relationship from the sources while stripping all overhead that humans need and LLMs don't. Act as an information extraction and compression specialist. The output is a single dense document (or semantically-split set) that a downstream LLM workflow can consume as sole context input without information loss.
This is a compression task, not a summarization task. Summaries are lossy. Distillates are lossless compression optimized for LLM consumption.
This is a compression task, not a summarization task. Summaries are lossy. Distillates are optimized for lossless or near-lossless downstream LLM consumption: Stage 3 provides a structured coverage check, and `--validate` adds the stronger round-trip proof pass.
## On Activation
@ -17,54 +18,66 @@ This is a compression task, not a summarization task. Summaries are lossy. Disti
- **source_documents** (required) — One or more file paths, folder paths, or glob patterns to distill
- **downstream_consumer** (optional) — What workflow/agent consumes this distillate (e.g., "PRD creation", "architecture design"). When provided, use it to judge signal vs noise. When omitted, preserve everything.
- **token_budget** (optional) — Approximate target size. When provided and the distillate would exceed it, trigger semantic splitting.
- **output_path** (optional) — Where to save. When omitted, save adjacent to the primary source document with `-distillate.md` suffix.
- **--validate** (flag) — Run round-trip reconstruction test after producing the distillate.
- **output_path** (optional) — Where to save. When omitted, save adjacent to the primary source document using `-distillate.md` for single-file output or `-distillate/` for split output.
- **--validate** (flag) — Run round-trip reconstruction test after producing the distillate. If requested, confirm subagent support before Stage 1; if support is unavailable, ask whether to continue without validation or to abort, rather than silently downgrading.
- **calibration_path** and **calibration_status** (optional) — Only used when `downstream_consumer` indicates calibration. If provided, include them in output frontmatter.
2. **Route** — proceed to Stage 1.
2. **Preflight runtime.** The bundled `scripts/analyze_sources.py` next to this skill requires Python 3.10+. It scans `.md`, `.txt`, `.yaml`, `.yml`, and `.json`, and skips `node_modules`, `.git`, `__pycache__`, `.venv`, `venv`, `.claude`, `_bmad-output`, `.cursor`, and `.vscode` directories when walking folders.
3. **Route** — proceed to Stage 1.
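For orientation, the analyzer's walk rules described above can be sketched like this (a sketch of the stated behavior, not the bundled script itself):

```python
import os

EXTS = {".md", ".txt", ".yaml", ".yml", ".json"}
SKIP = {"node_modules", ".git", "__pycache__", ".venv", "venv",
        ".claude", "_bmad-output", ".cursor", ".vscode"}

def keep_file(name):
    """True when the extension is one the analyzer scans."""
    return os.path.splitext(name)[1].lower() in EXTS

def prune_dirs(dirnames):
    """Drop skipped directories before os.walk descends into them."""
    return [d for d in dirnames if d not in SKIP]

def iter_sources(root):
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = prune_dirs(dirnames)  # in-place prune controls descent
        for name in filenames:
            if keep_file(name):
                yield os.path.join(dirpath, name)
```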
**Session trace distillation:** When `downstream_consumer` indicates session traces, protocol outputs, or calibration, load `resources/session-trace-template.md` for section-preservation rules and compression guidance specific to structured session output.
**Ambiguous output naming:** If the input expands to multiple unrelated primary documents and `output_path` is omitted, ask for `output_path` rather than guessing the base name.
## Stages
| # | Stage | Purpose |
|---|-------|---------|
| 1 | Analyze | Run analysis script, determine routing and splitting |
| 2 | Compress | Spawn compressor agent(s) to produce the distillate |
| 3 | Verify & Output | Completeness check, format check, save output |
| 4 | Round-Trip Validate | (--validate only) Reconstruct and diff against originals |
| # | Stage | Purpose |
| --- | ------------------- | -------------------------------------------------------- |
| 1 | Analyze | Run analysis script, determine routing and splitting |
| 2 | Compress | Spawn compressor agent(s) to produce the distillate |
| 3 | Verify & Output | Completeness check, format check, save output |
| 4 | Round-Trip Validate | (--validate only) Reconstruct and diff against originals |
### Stage 1: Analyze
Run `scripts/analyze_sources.py --help` then run it with the source paths. Use its routing recommendation and grouping output to drive Stage 2. Do NOT read the source documents yourself.
Run the bundled analyzer from this skill's `scripts/` directory (`./scripts/analyze_sources.py`; in the source repo, `src/core-skills/bmad-distillator/scripts/analyze_sources.py`) with `--help`, then run it with the source paths. Use its routing recommendation and grouping output to drive Stage 2. Do NOT read the source documents yourself unless script execution is unavailable.
If the script cannot run, apply the same extension and skip-directory rules manually, require an explicit `output_path` when the base name is ambiguous, and proceed with a direct read of the resolved source files.
### Stage 2: Compress
**Single mode** (routing = `"single"`, ≤3 files, ≤15K estimated tokens):
Spawn one subagent using `agents/distillate-compressor.md` with all source file paths.
Spawn one subagent using `agents/distillate-compressor.md` with all source file paths, `downstream_consumer`, and the Stage 1 splitting decision.
**Fan-out mode** (routing = `"fan-out"`):
1. Spawn one compressor subagent per group from the analysis output. Each compressor receives only its group's file paths and produces an intermediate distillate.
1. Spawn one compressor subagent per group from the analysis output. Each compressor receives only its group's source file paths, `downstream_consumer`, and the Stage 1 splitting decision. Capture each compressor's returned `coverage_manifest`.
2. After all compressors return, spawn one final **merge compressor** subagent using `agents/distillate-compressor.md`. Pass it the intermediate distillate contents as its input (not the original files). Its job is cross-group deduplication, thematic regrouping, and final compression.
2. Materialize each intermediate distillate as a temporary distillate file.
3. Clean up intermediate distillate content (it exists only in memory, not saved to disk).
3. Build `expected_coverage_manifest` as the union of the group `coverage_manifest` values. After all compressors return, spawn one final **merge compressor** subagent using `agents/distillate-compressor.md`. Pass it the temporary intermediate distillate file paths as its input (not the original source documents), along with `downstream_consumer`, the final splitting decision, and the union `expected_coverage_manifest`. Its job is cross-group deduplication, thematic regrouping, and final compression while preserving the original-source coverage surface.
**Graceful degradation:** If subagent spawning is unavailable, read the source documents and perform the compression work directly using the same instructions from `agents/distillate-compressor.md`. For fan-out, process groups sequentially then merge.
4. Clean up the temporary intermediate distillate files after the merge pass completes.
The compressor returns a structured JSON result containing the distillate content, source headings, named entities, and token estimate.
**Graceful degradation:** If subagent spawning is unavailable, read the source documents and perform the compression work directly using the same instructions from `agents/distillate-compressor.md`. For fan-out, process groups sequentially, write temporary intermediate distillate files, preserve each group's `coverage_manifest`, then merge from those files using the union `expected_coverage_manifest`.
The compressor returns a structured JSON result containing the distillate content, a coverage manifest, and token estimate.
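The union described in fan-out mode can be sketched as a merge over the group manifests (the `{category: [items]}` shape is an assumption about the manifest schema):

```python
def union_manifests(manifests):
    """Union of group coverage manifests, preserving first-seen order."""
    merged = {}
    for manifest in manifests:
        for key, items in manifest.items():
            bucket = merged.setdefault(key, [])
            for item in items:
                if item not in bucket:
                    bucket.append(item)
    return merged
```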
### Stage 3: Verify & Output
After the compressor (or merge compressor) returns:
1. **Completeness check.** Using the headings and named entities list returned by the compressor, verify each appears in the distillate content. If gaps are found, send them back to the compressor for a targeted fix pass — not a full recompression. Limit to 2 fix passes maximum.
1. **Completeness check.** Determine the expected coverage surface before validating output. In single mode, use the compressor's returned `coverage_manifest`. In fan-out mode, use the union `expected_coverage_manifest` carried forward from the group compressors and require the merge compressor's returned `coverage_manifest` to preserve that same surface. Verify that headings, named entities, numeric facts, decisions, constraints, scope boundaries, open questions, and any intentionally dropped items are preserved or deliberately omitted with reason. For session-trace or calibration mode, also verify that the required sections appear in order and the final State Ledger is preserved verbatim. If gaps are found, send them back to the compressor for a targeted fix pass while preserving the same expected coverage surface. Limit to 2 fix passes maximum.
2. **Format check.** Verify the output follows distillate format rules:
- No prose paragraphs (only bullets)
- Default mode: no prose paragraphs (only bullets)
- No decorative formatting
- No repeated information
- Each bullet is self-contained
- Themes are clearly delineated with `##` headings
- Session-trace and calibration mode may use the structured tables and ordered sections required by `resources/session-trace-template.md`
3. **Determine output format.** Using the split prediction from Stage 1 and actual distillate size:
@ -76,11 +89,11 @@ After the compressor (or merge compressor) returns:
---
type: bmad-distillate
sources:
- "{relative path to source file 1}"
- "{relative path to source file 2}"
- '{relative path to source file 1}'
- '{relative path to source file 2}'
downstream_consumer: "{consumer or 'general'}"
created: "{date}"
token_estimate: {approximate token count}
created: '{date}'
token_estimate: { approximate token count }
parts: 1
---
```
@ -98,16 +111,33 @@ After the compressor (or merge compressor) returns:
```
The `_index.md` contains:
- Frontmatter with sources (relative paths from the distillate folder to the originals)
- Frontmatter with the same core schema as a single-file distillate:
```yaml
---
type: bmad-distillate
sources:
- '{relative path to source file 1}'
- '{relative path to source file 2}'
downstream_consumer: "{consumer or 'general'}"
created: '{date}'
token_estimate: { approximate token count across all parts }
parts: { part count }
---
```
- Include `calibration_path` and `calibration_status` here when calibration metadata applies
- 3-5 bullet orientation (what was distilled, from what)
- Section manifest: each section's filename + 1-line description
- Cross-cutting items that span multiple sections
Each section file is self-contained — loadable independently. Include a 1-line context header: "This section covers [topic]. Part N of M."
Each section file is self-contained — loadable independently. Start each with a first bullet context line: `- Context: This section covers [topic]. Part N of M.`
Source paths in frontmatter must be relative to the distillate's location.
4. **Measure distillate.** Run `scripts/analyze_sources.py` on the final distillate file(s) to get accurate token counts for the output. Use the `total_estimated_tokens` from this analysis as `distillate_total_tokens`.
When `downstream_consumer` indicates calibration and `calibration_path` plus `calibration_status` are provided, include them in the frontmatter for the single distillate or `_index.md`.
4. **Measure distillate.** Run the same bundled analyzer on the final distillate file(s) to get accurate token counts for the output. Use the `total_estimated_tokens` from this analysis as `distillate_total_tokens`.
5. **Report results.** Always return structured JSON output:
@@ -120,11 +150,14 @@ After the compressor (or merge compressor) returns:
"distillate_total_tokens": N,
"compression_ratio": "X:1",
"source_documents": ["{path1}", "{path2}"],
"completeness_check": "pass" or "pass_with_additions"
"completeness_check": "pass" or "pass_with_additions",
"validation_status": "not_requested" | "pass" | "pass_with_warnings" | "fail" | "skipped_no_subagent_support",
"validation_report": "{path}" or null,
"validation_note": null or "{reason validation was skipped or warned}"
}
```
Where `source_total_tokens` is from the Stage 1 analysis and `distillate_total_tokens` is from step 4. The `compression_ratio` is `source_total_tokens / distillate_total_tokens` formatted as "X:1" (e.g., "3.2:1").
Where `source_total_tokens` is from the Stage 1 analysis and `distillate_total_tokens` is from step 4. The `compression_ratio` is `source_total_tokens / distillate_total_tokens` formatted as "X:1" (e.g., "3.2:1"). Set `validation_status = "not_requested"` when `--validate` was not used. If Stage 4 runs, update the same payload with the final validation fields before returning it.
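In code terms, the ratio arithmetic is a one-liner; this sketch is illustrative and not part of the skill's bundled tooling:

```python
def format_compression_ratio(source_total_tokens: int, distillate_total_tokens: int) -> str:
    """Format source/distillate token counts as an "X:1" ratio string."""
    if distillate_total_tokens <= 0:
        raise ValueError("distillate_total_tokens must be positive")
    ratio = source_total_tokens / distillate_total_tokens
    return f"{ratio:.1f}:1"

print(format_compression_ratio(4640, 1450))  # → 3.2:1
```

Rounding to one decimal keeps the ratio readable; a caller needing more precision can widen the format spec.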
6. If `--validate` flag was set, proceed to Stage 4. Otherwise, done.
@@ -132,11 +165,9 @@ After the compressor (or merge compressor) returns:
This stage proves the distillate is lossless by reconstructing source documents from the distillate alone. Use for critical documents where information loss is unacceptable, or as a quality gate for high-stakes downstream workflows. Not for routine use — it adds significant token cost.
1. **Spawn the reconstructor agent** using `agents/round-trip-reconstructor.md`. Pass it ONLY the distillate file path (or `_index.md` path for split distillates) — it must NOT have access to the original source documents.
1. **Spawn the reconstructor agent** using `agents/round-trip-reconstructor.md`. Pass it ONLY the distillate entrypoint path — the single distillate file for unsplit output, or `_index.md` for split output. It must NOT have access to the original source documents.
For split distillates, spawn one reconstructor per section in parallel. Each receives its section file plus the `_index.md` for cross-cutting context.
**Graceful degradation:** If subagent spawning is unavailable, this stage cannot be performed by the main agent (it has already seen the originals). Report that round-trip validation requires subagent support and skip.
**Graceful degradation:** If subagent support disappears after the preflight check, return the standard JSON result with `validation_status: "skipped_no_subagent_support"`, `validation_report: null`, and `validation_note` explaining that round-trip validation could not run in this environment.
2. **Receive reconstructions.** The reconstructor returns reconstruction file paths saved adjacent to the distillate.
@@ -146,32 +177,41 @@ This stage proves the distillate is lossless by reconstructing source documents
- Are relationships and rationale intact?
- Did the reconstruction add anything not in the original? (indicates hallucination filling gaps)
4. **Produce validation report** saved adjacent to the distillate as `-validation-report.md`:
4. **Produce validation report** using an exact path contract:
- Single distillate file: save `{distillate-basename}-validation-report.md` adjacent to the distillate file
- Split distillate folder: save `_validation-report.md` inside the distillate folder adjacent to `_index.md`
```markdown
---
type: distillate-validation
distillate: "{distillate path}"
sources: ["{source paths}"]
created: "{date}"
distillate: '{distillate path}'
sources: ['{source paths}']
created: '{date}'
---
## Validation Summary
- Status: PASS | PASS_WITH_WARNINGS | FAIL
- Information preserved: {percentage estimate}
- Gaps found: {count}
- Hallucinations detected: {count}
## Gaps (information in originals but missing from reconstruction)
- {gap description} — Source: {which original}, Section: {where}
## Hallucinations (information in reconstruction not traceable to originals)
- {hallucination description} — appears to fill gap in: {section}
## Possible Gap Markers (flagged by reconstructor)
- {marker description}
```
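The path contract above can be sketched as a small helper; the function name and sample paths are hypothetical:

```python
from pathlib import Path

def validation_report_path(entrypoint: Path) -> Path:
    """Resolve the validation report location from the distillate entrypoint."""
    if entrypoint.name == "_index.md":
        # Split distillate: report lives inside the folder, adjacent to _index.md
        return entrypoint.parent / "_validation-report.md"
    # Single distillate: {basename}-validation-report.md adjacent to the file
    return entrypoint.with_name(f"{entrypoint.stem}-validation-report.md")

print(validation_report_path(Path("docs/brief-distillate.md")))
# → docs/brief-distillate-validation-report.md
print(validation_report_path(Path("docs/brief-distillate/_index.md")))
# → docs/brief-distillate/_validation-report.md
```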
5. **If gaps are found**, offer to run a targeted fix pass on the distillate — adding the missing information without full recompression. Limit to 2 fix passes maximum.
6. **Clean up** — delete the temporary reconstruction files after the report is generated.
7. **Return final result** — update the Stage 3 structured JSON payload with `validation_status`, `validation_report`, and `validation_note` before returning it.

View File

@@ -2,19 +2,24 @@
Act as an information extraction and compression specialist. Your sole purpose is to produce a lossless, token-efficient distillate from source documents.
You receive: source document file paths, an optional downstream_consumer context, and a splitting decision.
You receive: source document file paths or intermediate distillate file paths, an optional downstream_consumer context, a splitting decision, and optionally `expected_coverage_manifest` when a merge or targeted fix pass must preserve the original source coverage surface.
You must load and apply `../resources/compression-rules.md` before producing output. Reference `../resources/distillate-format-reference.md` for the expected output format.
When `downstream_consumer` indicates session traces, protocol outputs, or calibration, also load `../resources/session-trace-template.md` and preserve its required structure.
## Compression Process
### Step 1: Read Sources
Read all source document files. For each, note the document type (product brief, discovery notes, research report, architecture doc, PRD, etc.) based on content and naming.
Read all input files. Inputs may be original source documents or temporary intermediate distillates from a fan-out merge pass. For each input, note the document type (product brief, discovery notes, research report, architecture doc, PRD, distillate, etc.) based on content and naming.
If `expected_coverage_manifest` is provided, treat it as the authoritative coverage target for the output. Use the current inputs for regrouping and wording, but do not shrink the required coverage surface just because the current inputs are already-compressed intermediates.
### Step 2: Extract
Extract every discrete piece of information from all source documents:
- Facts and data points (numbers, dates, versions, percentages)
- Decisions made and their rationale
- Rejected alternatives and why they were rejected
@@ -36,6 +41,7 @@ Apply the deduplication rules from `../resources/compression-rules.md`.
### Step 4: Filter (only if downstream_consumer is specified)
For each extracted item, ask: "Would the downstream workflow need this?"
- Drop items that are clearly irrelevant to the stated consumer
- When uncertain, keep the item — err on the side of preservation
- Never drop: decisions, rejected alternatives, open questions, constraints, scope boundaries
@@ -45,6 +51,7 @@ For each extracted item, ask: "Would the downstream workflow need this?"
Organize items into coherent themes derived from the source content — not from a fixed template. The themes should reflect what the documents are actually about.
Common groupings (use what fits, omit what doesn't, add what's needed):
- Core concept / problem / motivation
- Solution / approach / architecture
- Users / segments
@@ -59,6 +66,7 @@ Common groupings (use what fits, omit what doesn't, add what's needed):
### Step 6: Compress Language
For each item, apply the compression rules from `../resources/compression-rules.md`:
- Strip prose transitions and connective tissue
- Remove hedging and rhetoric
- Remove explanations of common knowledge
@@ -68,13 +76,16 @@ For each item, apply the compression rules from `../resources/compression-rules.
### Step 7: Format Output
Produce the distillate as dense thematically-grouped bullets:
- `##` headings for themes — no deeper heading levels needed
- `- ` bullets for items — every token must carry signal
- No decorative formatting (no bold for emphasis, no horizontal rules)
- No prose paragraphs — only bullets
- Semicolons to join closely related short items within a single bullet
- Each bullet self-contained — understandable without reading other bullets
Produce the distillate in the correct mode:
- **Default mode:** dense thematically-grouped bullets
- `##` headings for themes — no deeper heading levels needed
- `- ` bullets for items — every token must carry signal
- No decorative formatting (no bold for emphasis, no horizontal rules)
- No prose paragraphs — only bullets
- Semicolons to join closely related short items within a single bullet
- Each bullet self-contained — understandable without reading other bullets
- **Session-trace / calibration mode:** preserve the ordered sections, tables, and verbatim final State Ledger required by `../resources/session-trace-template.md`. Stay compressed, but do not collapse those required structures into generic bullets.
Do NOT include frontmatter — the calling skill handles that.
@@ -91,7 +102,7 @@ When splitting:
- Cross-references to section distillates
- Items that span multiple sections
3. Produce **section distillates**, each self-sufficient. Include a 1-line context header: "This section covers [topic]. Part N of M from [source document names]."
3. Produce **section distillates**, each self-sufficient. Start each section with a first bullet context line: `- Context: This section covers [topic]. Part N of M from [source document names].`
## Return Format
@@ -100,16 +111,23 @@ Return a structured result to the calling skill:
```json
{
"distillate_content": "{the complete distillate text without frontmatter}",
"source_headings": ["heading 1", "heading 2"],
"source_named_entities": ["entity 1", "entity 2"],
"coverage_manifest": {
"source_headings": ["heading 1", "heading 2"],
"named_entities": ["entity 1", "entity 2"],
"numeric_facts": ["1200", "83K"],
"decisions": ["Rejected X because Y"],
"constraints": ["Must preserve Z"],
"scope_boundaries": ["In: A", "Out: B"],
"open_questions": ["Question 1"],
"intentionally_dropped": ["Item omitted for downstream consumer relevance"]
},
"token_estimate": N,
"sections": null or [{"topic": "...", "content": "..."}]
}
```
- **distillate_content**: The full distillate text
- **source_headings**: All Level 2+ headings found across source documents (for completeness verification)
- **source_named_entities**: Key named entities (products, companies, people, technologies, decisions) found in sources
- **coverage_manifest**: Structured coverage surfaces used by the calling skill's completeness gate. If `expected_coverage_manifest` was supplied, return a manifest that preserves that full upstream surface (plus any newly observed coverage) rather than replacing it with observations from intermediate distillates alone.
- **token_estimate**: Approximate token count of the distillate
- **sections**: null for single distillates; array of section objects if semantically split
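When `expected_coverage_manifest` is supplied, the preservation rule reduces to a union merge over each coverage surface. A minimal sketch, assuming each surface is a list of strings keyed as in the JSON above:

```python
def merge_coverage_manifests(expected: dict, observed: dict) -> dict:
    """Keep the full upstream surface, then append newly observed items per key."""
    merged = {}
    for key in list(expected) + [k for k in observed if k not in expected]:
        upstream = expected.get(key, [])
        seen = set(upstream)
        # Preserve upstream ordering; new observations are appended, never replace
        merged[key] = upstream + [item for item in observed.get(key, []) if item not in seen]
    return merged

expected = {"named_entities": ["BMAD", "Vercel"], "decisions": ["Rejected X because Y"]}
observed = {"named_entities": ["Vercel", "Snyk"], "numeric_facts": ["83K"]}
print(merge_coverage_manifests(expected, observed))
# → {'named_entities': ['BMAD', 'Vercel', 'Snyk'], 'decisions': ['Rejected X because Y'], 'numeric_facts': ['83K']}
```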

View File

@ -2,20 +2,24 @@
Act as a document reconstruction specialist. Your purpose is to prove a distillate's completeness by reconstructing the original source documents from the distillate alone.
**Critical constraint:** You receive ONLY the distillate file path. You must NOT have access to the original source documents. If you can see the originals, the test is meaningless.
**Critical constraint:** You receive ONLY one distillate entrypoint file path. You must NOT have access to the original source documents. If you can see the originals, the test is meaningless.
## Process
### Step 1: Analyze the Distillate
Read the distillate file. Parse the YAML frontmatter to identify:
- The `sources` list — what documents were distilled
- The `downstream_consumer` — what filtering may have been applied
- The `parts` count — whether this is a single or split distillate
If `parts > 1` or the file is `_index.md`, treat it as a split-distillate entrypoint. Load the sibling section files referenced by the split distillate folder so reconstruction uses the full distillate corpus, but keep the entrypoint path as the only caller input.
### Step 2: Detect Document Types
From the source file names and the distillate's content, infer what type of document each source was:
- Product brief, discovery notes, research report, architecture doc, PRD, etc.
- Use the naming conventions and content themes to determine appropriate document structure
@@ -31,6 +35,7 @@ For each source listed in the frontmatter, produce a full human-readable documen
- Flag any places where the distillate felt insufficient with `[POSSIBLE GAP]` markers — these are critical quality signals
**Quality signals to watch for:**
- Bullets that feel like they're missing context → `[POSSIBLE GAP: missing context for X]`
- Themes that seem underrepresented given the document type → `[POSSIBLE GAP: expected more on X for a document of this type]`
- Relationships that are mentioned but not fully explained → `[POSSIBLE GAP: relationship between X and Y unclear]`
@@ -38,6 +43,7 @@ For each source listed in the frontmatter, produce a full human-readable documen
### Step 4: Save Reconstructions
Save each reconstructed document as a temporary file adjacent to the distillate:
- First source: `{distillate-basename}-reconstruction-1.md`
- Second source: `{distillate-basename}-reconstruction-2.md`
- And so on for each source
@@ -47,9 +53,9 @@ Each reconstruction should include a header noting it was reconstructed:
```markdown
---
type: distillate-reconstruction
source_distillate: "{distillate path}"
reconstructed_from: "{original source name}"
reconstruction_number: {N}
source_distillate: '{distillate path}'
reconstructed_from: '{original source name}'
reconstruction_number: { N }
---
```

View File

@@ -10,10 +10,10 @@ Every distillate includes YAML frontmatter. Source paths are relative to the dis
---
type: bmad-distillate
sources:
- "product-brief-example.md"
- "product-brief-example-discovery-notes.md"
downstream_consumer: "PRD creation"
created: "2026-03-13"
- 'product-brief-example.md'
- 'product-brief-example-discovery-notes.md'
downstream_consumer: 'PRD creation'
created: '2026-03-13'
token_estimate: 1200
parts: 1
---
@@ -24,6 +24,7 @@ parts: 1
### Prose Paragraph to Dense Bullet
**Before** (human-readable brief excerpt):
```
## What Makes This Different
@@ -38,6 +39,7 @@ itself.
```
**After** (distillate):
```
## Differentiation
- Anti-fragmentation positioning: BMAD = cross-platform constant across 40+ fragmenting AI tools; no competitor provides shared methodology layer
@@ -47,6 +49,7 @@ itself.
### Technical Details to Compressed Facts
**Before** (discovery notes excerpt):
```
## Competitive Landscape
@@ -66,6 +69,7 @@ itself.
```
**After** (distillate):
```
## Competitive Landscape
- No competitor combines structured methodology + plugin marketplace (whitespace)
@@ -80,17 +84,20 @@ itself.
When the same fact appears in both a brief and discovery notes:
**Brief says:**
```
bmad-help must always be included as a base skill in every bundle
```
**Discovery notes say:**
```
bmad-help must always be included as a base skill in every bundle/install
(solves discoverability problem)
```
**Distillate keeps the more contextual version:**
```
- bmad-help: always included as base skill in every bundle (solves discoverability)
```
@@ -98,6 +105,7 @@ bmad-help must always be included as a base skill in every bundle/install
### Decision/Rationale Compression
**Before:**
```
We decided not to build our own platform support matrix going forward, instead
delegating to the Vercel skills CLI ecosystem. The rationale is that maintaining
@@ -106,6 +114,7 @@ at 40+ platforms.
```
**After:**
```
- Rejected: own platform support matrix. Reason: unsustainable at 40+ platforms; delegate to Vercel CLI ecosystem
```
@@ -118,26 +127,29 @@ A complete distillate produced from a product brief and its discovery notes, tar
---
type: bmad-distillate
sources:
- "product-brief-bmad-next-gen-installer.md"
- "product-brief-bmad-next-gen-installer-discovery-notes.md"
downstream_consumer: "PRD creation"
created: "2026-03-13"
- 'product-brief-bmad-next-gen-installer.md'
- 'product-brief-bmad-next-gen-installer-discovery-notes.md'
downstream_consumer: 'PRD creation'
created: '2026-03-13'
token_estimate: 1450
parts: 1
---
## Core Concept
- BMAD Next-Gen Installer: replaces monolithic Node.js CLI with skill-based plugin architecture for distributing BMAD methodology across 40+ AI platforms
- Three layers: self-describing plugins (bmad-manifest.json), cross-platform install via Vercel skills CLI (MIT), runtime registration via bmad-setup skill
- Transforms BMAD from dev-only methodology into open platform for any domain (creative, therapeutic, educational, personal)
## Problem
- Current installer maintains ~20 platform configs manually; each platform convention change requires installer update, test, release — largest maintenance burden on team
- Node.js/npm required — blocks non-technical users on UI-based platforms (Claude Co-Work, etc.)
- CSV manifests are static, generated once at install; no runtime scanning/registration
- Unsustainable at 40+ platforms; new tools launching weekly
## Solution Architecture
- Plugins: skill bundles with Anthropic plugin standard as base format + bmad-manifest.json extending for BMAD-specific metadata (installer options, capabilities, help integration, phase ordering, dependencies)
- Existing manifest example: `{"module-code":"bmm","replaces-skill":"bmad-create-product-brief","capabilities":[{"name":"create-brief","menu-code":"CB","supports-headless":true,"phase-name":"1-analysis","after":["brainstorming"],"before":["create-prd"],"is-required":true}]}`
- Vercel skills CLI handles platform translation; integration pattern (wrap/fork/call) is PRD decision
@@ -147,17 +159,20 @@ parts: 1
- Non-technical path has honest friction: "copy to right folder" requires knowing where; per-platform README instructions; improves over time as low-code space matures
## Differentiation
- Anti-fragmentation: BMAD = cross-platform constant; no competitor provides shared methodology layer across AI tools
- Curated quality: all submissions gated, human-reviewed by BMad + core team; 13.4% of community skills have critical vulnerabilities (Snyk 2026); quality gate value increases as ecosystem gets noisier
- Domain-agnostic: no competitor builds beyond software dev workflows; same plugin system powers any domain via BMAD Builder (separate initiative)
## Users (ordered by v1 priority)
- Module authors (primary v1): package/test/distribute plugins independently without installer changes
- Developers: single-command install on any of 40+ platforms via NPX
- Non-technical users: install without Node/Git/terminal; emerging segment including PMs, designers, educators
- Future plugin creators: non-dev authors using BMAD Builder; need distribution without building own installer
## Success Criteria
- Zero (or near-zero) custom platform directory code; delegated to skills CLI ecosystem
- Installation verified on top platforms by volume; skills CLI handles long tail
- Non-technical install path validated with non-developer users
@@ -182,6 +197,7 @@ parts: 1
- Key shift: CSV-based static manifests → JSON-based runtime scanning
## Vercel Skills CLI
- `npx skills add <source>` — GitHub, GitLab, local paths, git URLs
- 40+ agents; per-agent path mappings; symlinks (recommended) or copies
- Scopes: project-level or global
@@ -190,6 +206,7 @@ parts: 1
- Non-interactive: `-y`, `--all` flags for CI/CD
## Competitive Landscape
- No competitor combines structured methodology + plugin marketplace (whitespace)
- Skills.sh (Vercel): 83K skills, dev-only, 20% trigger reliability without explicit prompting
- SkillsMP: 400K skills, aggregator only, no curation
@@ -198,11 +215,13 @@ parts: 1
- Market: $7.84B (2025) → $52.62B (2030); Agent Skills spec ~4 months old, 351K+ skills; standards converging under Linux Foundation AAIF (MCP, AGENTS.md, A2A)
## Rejected Alternatives
- Building own platform support matrix: unsustainable at 40+; delegate to Vercel ecosystem
- One-click install for non-technical v1: emerging space; guidance-based, improve over time
- Prior roadmap/brainstorming: clean start, unconstrained by previous planning
## Open Questions
- Vercel CLI integration pattern: wrap/fork/call/peer dependency?
- bmad-update mechanics: diff/replace? Preserve user customizations?
- Migration story: command/manual reinstall/compatibility shim?
@@ -213,12 +232,14 @@ parts: 1
- Plugin author getting-started experience and tooling?
## Opportunities
- Module authors as acquisition channel: each published plugin distributes BMAD to creator's audience
- CI/CD integration: bmad-setup as pipeline one-liner increases stickiness
- Educational institutions: structured methodology + non-technical install → university AI curriculum
- Skill composability: mixing BMAD modules with third-party skills for custom methodology stacks
## Risks
- Manifest format evolution creates versioning/compatibility burden once third-party authors publish
- Quality gate needs defined process, not just claim — gated review model addresses
- 40+ platform testing environments even with Vercel handling translation

View File

@@ -0,0 +1,69 @@
# Session Trace Output Template
_Use this template when `downstream_consumer` is "session trace", "protocol output", or "calibration". It defines the structure a distilled session trace should preserve._
## Required Sections
A session trace distillate must preserve these sections in order:
### Header
- Activity type and tier (Solve/Build/Execute + depth)
- Commands exercised
- Protocol files loaded
- Self-serve summary
### Discovery Counter Log
| Step | Counter change | Reason |
| ---- | -------------- | ------ |
### Evidence Log
- Authoritative files read
- Files changed
- Validation performed
- Calibration follow-up status
### Outcome Summary
| Area | Assessment |
| ----------------- | --------------------- |
| Problem framing | Pass / Partial / Fail |
| Evidence coverage | Pass / Partial / Fail |
| Verification | Pass / Partial / Fail |
| Output quality | Pass / Partial / Fail |
Unresolved items (if any).
### State Ledger (final)
- Session identifier
- Problem summary
- Activity and tier
- Workflow position (completed command sequence)
- Discovery Counter (final value)
- Findings per command
- Open items
- Stop-gate decisions
## Compression Rules for Session Traces
When compressing session traces:
- **Preserve all stop-gate outcomes** — these are decisions, not filler
- **Preserve Discovery Counter transitions** — each increment/reset is a signal
- **Preserve evidence references** — file paths, search results, test outcomes
- **Compress reasoning narrative** — keep conclusions, drop exploration prose
- **Preserve the final State Ledger verbatim** — it enables session resumption
- **Drop intermediate State Ledger snapshots** — only the final one matters
- **Preserve formal receipts** — counterexample witnesses, proof targets, dispositions
## Calibration Metadata
When the distillate is produced for calibration purposes, include in frontmatter:
```yaml
calibration_path: '[mode/path label from calibration index]'
calibration_status: 'candidate | promoted | gap-filler'
```

View File

@@ -11,6 +11,7 @@ Arbitrary splits (every N tokens) break coherence. A downstream workflow loading
### 1. Identify Natural Boundaries
After the initial extraction and deduplication (Steps 1-2 of the compression process), look for natural semantic boundaries:
- Distinct problem domains or functional areas
- Different stakeholder perspectives (users, technical, business)
- Temporal boundaries (current state vs future vision)
@@ -24,6 +25,7 @@ Choose boundaries that produce sections a downstream workflow might load indepen
For each extracted item, assign it to the most relevant section. Items that span multiple sections go in the root distillate.
Cross-cutting items (items relevant to multiple sections):
- Constraints that affect all areas → root distillate
- Decisions with broad impact → root distillate
- Section-specific decisions → section distillate
@@ -31,6 +33,7 @@ Cross-cutting items (items relevant to multiple sections):
### 3. Produce Root Distillate
The root distillate contains:
- **Orientation** (3-5 bullets): what was distilled, from what sources, for what consumer, how many sections
- **Cross-references**: list of section distillates with 1-line descriptions
- **Cross-cutting items**: facts, decisions, and constraints that span multiple sections
@@ -41,6 +44,7 @@ The root distillate contains:
Each section distillate must be self-sufficient — a reader loading only one section should understand it without the others.
Each section includes:
- **Context header** (1 line): "This section covers [topic]. Part N of M from [source document names]."
- **Section content**: thematically-grouped bullets following the same compression rules as a single distillate
- **Cross-references** (if needed): pointers to other sections for related content
@@ -58,6 +62,7 @@ Create a folder `{base-name}-distillate/` containing:
```
Example:
```
product-brief-distillate/
├── _index.md
@@ -69,10 +74,12 @@ product-brief-distillate/
## Size Targets
When a token_budget is specified:
- Root distillate: ~20% of budget (orientation + cross-cutting items)
- Remaining budget split proportionally across sections based on content density
- If a section exceeds its proportional share, compress more aggressively or sub-split
When no token_budget but splitting is needed:
- Aim for sections of 3,000-5,000 tokens each
- Root distillate as small as possible while remaining useful standalone
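The budget arithmetic above can be sketched as follows; the per-section density weights are an assumed caller input, not something this strategy file prescribes:

```python
def allocate_budget(token_budget: int, section_densities: list[float]) -> dict:
    """Reserve ~20% of the budget for the root; split the rest by content density."""
    root = int(token_budget * 0.20)
    remaining = token_budget - root
    total_density = sum(section_densities) or 1.0
    sections = [int(remaining * d / total_density) for d in section_densities]
    return {"root": root, "sections": sections}

print(allocate_budget(10_000, [3.0, 1.0]))  # → {'root': 2000, 'sections': [6000, 2000]}
```

A section landing above its share signals the need for more aggressive compression or a sub-split, per the rule above.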

View File

@@ -16,7 +16,7 @@ When this skill completes, the user should:
1. **Know where they are** — which module and phase they're in, what's already been completed
2. **Know what to do next** — the next recommended and/or required step, with clear reasoning
3. **Know how to invoke it** — skill name, menu code, action context, and any args that shortcut the conversation
4. **Get offered a quick start** — when a single skill is the clear next step, offer to run it for the user right now rather than just listing it
4. **Get offered a quick start** — when a single required next step or State Ledger handoff is clear, offer the exact invocation text for a fresh context window rather than a vague suggestion
5. **Feel oriented, not overwhelmed** — surface only what's relevant to their current position; don't dump the entire catalog
6. **Get answers to general questions** — when the question doesn't map to a specific skill, use the module's registered documentation to give a grounded answer
@@ -37,19 +37,23 @@ module,skill,display-name,menu-code,description,action,args,phase,after,before,r
```
**Phases** determine the high-level flow:
- `anytime` — available regardless of workflow state
- Numbered phases (`1-analysis`, `2-planning`, etc.) flow in order; naming varies by module
**Dependencies** determine ordering within and across phases:
- `after` — skills that should ideally complete before this one
- `before` — skills that should run after this one
- Format: `skill-name` for single-action skills, `skill-name:action` for multi-action skills
**Required gates**:
- `required=true` items must complete before the user can meaningfully proceed to later phases
- A phase with no required items is entirely optional — recommend it but be clear about what's actually required next
**Completion detection**:
- Search resolved output paths for `outputs` patterns
- Fuzzy-match found files to catalog rows
- User may also state completion explicitly, or it may be evident from the current conversation
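The completion-detection matching can be sketched as glob-first with a fuzzy fallback; the file names, patterns, and cutoff value here are invented for illustration:

```python
import difflib
import fnmatch

def match_outputs_to_catalog(found_files: list[str], catalog_patterns: list[str]) -> dict:
    """Map each discovered output file to the catalog pattern it most plausibly satisfies."""
    matches = {}
    for path in found_files:
        # Exact glob match first, then a fuzzy fallback for near-miss names
        exact = [p for p in catalog_patterns if fnmatch.fnmatch(path, p)]
        if exact:
            matches[path] = exact[0]
            continue
        close = difflib.get_close_matches(path, catalog_patterns, n=1, cutoff=0.5)
        if close:
            matches[path] = close[0]
    return matches

found = ["prd-installer.md", "product-brief-installer.md"]
patterns = ["prd-*.md", "product-brief-*.md", "architecture-*.md"]
print(match_outputs_to_catalog(found, patterns))
# → {'prd-installer.md': 'prd-*.md', 'product-brief-installer.md': 'product-brief-*.md'}
```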
@@ -59,6 +63,7 @@ module,skill,display-name,menu-code,description,action,args,phase,after,before,r
## Response Format
For each recommended item, present:
- `[menu-code]` **Display name** — e.g., "[CP] Create PRD"
- Skill name in backticks — e.g., `bmad-create-prd`
- For multi-action skills: action invocation context — e.g., "tech-writer lets create a mermaid diagram!"
@@ -67,9 +72,44 @@ For each recommended item, present:
**Ordering**: Show optional items first, then the next required item. Make it clear which is which.
**Quick start tie-breaker**: Only quick-start the required next step, or the explicit downstream handoff from the State Ledger `Skill:` line when present. Optional items stay optional and should not displace the required handoff.
## Constraints
- Present all output in `{communication_language}`
- Recommend running each skill in a **fresh context window**
- Match the user's tone — conversational when they're casual, structured when they want specifics
- If the active module is ambiguous, retrieve all meta rows remote sources to find relevant info also to help answer their question
- If the active module is ambiguous, retrieve all relevant `_meta` rows and remote module documentation to ground the answer; if ambiguity remains after grounding, ask rather than guess
## Discovery-Rigor Awareness
When analyzing the user's situation, check for indicators that `bmad-discovery-rigor` should be recommended as a preamble.
**Recommend discovery-rigor when:**
- The task is complex, ambiguous, or high-stakes
- The user says they are not sure where to start or that the problem is complicated
- Multiple valid approaches exist and the right one is not obvious
- The user is about to start a build-class workflow like `bmad-create-prd` or `bmad-create-architecture` without prior discovery
- No Discovery Context artifact exists at the expected output location
**Skip discovery-rigor when:**
- The task is a quick fix, simple edit, or well-understood change
- A Discovery Context already exists and is current
- The user has already classified the problem and provided clear requirements
- The user is mid-workflow and needs help continuing rather than restarting
**Discovery Context detection**: Check `{output_folder}/discovery-context.md` first, then topic-slugged `discovery-context-*.md` artifacts when the current task appears to be part of a multi-run discovery flow. If multiple complete artifacts are plausible matches, ask rather than guess. If a matching artifact exists and contains `status: 'complete'`, treat discovery as already done and recommend the next downstream skill based on its classification and recommendation sections.
## State Ledger Awareness
If the conversation contains a State Ledger with canonical fields such as Problem, Class, Position, Counter, Findings, Open, Evidence, or Skill (or close equivalents such as Activity / Discovery Counter):
- Read it to understand what has already been completed
- Use the Position field to determine the current workflow position
- Use the Open items to surface unresolved issues alongside recommendations
- Treat the `Skill:` line as the preferred downstream handoff when present unless stronger workspace evidence shows it is stale
- If the ledger references a Discovery Context, load it for grounding
When a State Ledger is present, ground recommendations in it explicitly, for example: "Based on your current progress ([Position from ledger]), here's what comes next ..."

View File

@@ -0,0 +1,60 @@
---
name: bmad-memory-manager
description: "Manages BMAD state persistence safely. Use when the user requests memory operations or when BMAD skills need state persistence."
---
# Memory Manager
## Overview
This skill provides deterministic persist, recover, list, and clear operations for BMAD workflow state. Act as the memory clerk for BMAD skills: follow the routing tables in [workflow.md](workflow.md), prefer auditable storage, and never let a memory failure become the reason another skill stops working.
Follow the instructions in [workflow.md](workflow.md).
## Operating Rules
- **Log, don't halt** — memory enhances recovery, but the consuming skill's primary artifact remains canonical.
- **Keep storage human-readable** — sidecars stay in markdown with provenance headers.
- **Protect sidecars**`_bmad/_memory/` must be gitignored and never committed.
- **Treat decision tables as canonical** — these operations are low-freedom and should not be improvised.
## Consumer Pattern
Paste this into a skill's step file(s), replacing `{this-skill-name}` with the canonical caller once at authoring time.
**Scopes:** `session` = Copilot `/memories/session/`, auto-cleans at conversation end. `workspace` = `_bmad/_memory/{skill}-sidecar/`, durable across conversations.
```markdown
### Memory Checkpoint
When this step completes successfully, invoke `bmad-memory-manager`:
- **persist** | scope: session | key: state-ledger | caller: "{this-skill-name}"
- Content: current frontmatter + completion summary
If recovery is needed at step entry:
- **recover** | scope: session | key: state-ledger | caller: "{this-skill-name}"
For workspace persistence (learned patterns, preferences):
- **persist** | scope: workspace | key: learned-patterns | caller: "{this-skill-name}"
- Content: reusable insights from this run
Other operations (see workflow.md for the full decision tables):
- **list** | scope: workspace — enumerate all sidecars
- **clear** | scope: workspace | target: "{this-skill-name}" — remove this skill's sidecars
```
`caller:` is mandatory for skill-to-skill operations and should be baked into the invoking skill rather than inferred at runtime. User-initiated `list` may omit it and will default to `memory-manager`.
## Recovery
If conversation context is compressed, re-read this file and [workflow.md](workflow.md). The ops log at `_bmad/_memory/_ops-log.md` is the audit surface for reconstructing what happened before compression.
## On Activation
- Identify the operation, scope, key, and caller shape before doing anything else.
- Treat the decision tables in [workflow.md](workflow.md) as canonical.
- Prefer the consuming skill's primary artifact over memory if the two ever conflict.


@ -0,0 +1 @@
type: skill


@ -0,0 +1,6 @@
trigger,scope,key_pattern,rationale
Step completed in multi-step workflow,session,{skill}-state-ledger,Survives compression; auto-cleans at session end
Context compression detected,session,{skill}-state-ledger,Reactive recovery — rebuild state from last checkpoint
Learned a reusable pattern,workspace,{skill}-sidecar/learned-patterns,Persists across conversations for skill improvement
User expressed a preference,workspace,{skill}-sidecar/user-preferences,Remembers user choices across sessions
Workflow completed summary,workspace,{skill}-sidecar/run-summary,Post-mortem and audit trail for completed work


@ -0,0 +1,139 @@
---
context_file: '' # Optional context file path for project-specific guidance
---
# Memory Manager Workflow
This workflow routes `persist`, `recover`, `list`, and `clear` requests to the correct storage surface, verifies the result when verification matters, and logs operations whenever the log surface already exists or the operation is already writing sidecar state. Treat the decision tables below as low-freedom instructions: precision matters more than improvisation.
---
## Setup
Before any write operation, ensure `_bmad/_memory/` exists at project root and is listed in `.gitignore`. Warn the user if it is not gitignored — sidecar data must never be committed. Read-only operations should report absence rather than materializing storage.
## Request Shapes
| Operation | Required fields | Notes |
|-----------|-----------------|-------|
| `persist` | `scope`, `key`, content, optional `caller:` | Writes session or workspace state |
| `recover` | `scope`, `key`, optional `caller:` | Returns stored state or null |
| `list` | optional `caller:` | Enumerates workspace sidecars |
| `clear` | `scope`, optional `target`, optional `caller:` | Clears one sidecar or all workspace sidecars |
### Naming Conventions
| Context | Pattern | Example |
|---------|---------|---------|
| Session (Copilot tool) | `/memories/session/{caller}-{key}.md` | `/memories/session/discovery-rigor-state-ledger.md` |
| Session fallback (sidecar) | `_bmad/_memory/{caller}-sidecar/{key}-{YYYY-MM-DD}.md` | `_bmad/_memory/discovery-rigor-sidecar/state-ledger-2026-03-24.md` |
| Workspace | `_bmad/_memory/{caller}-sidecar/{key}.md` | `_bmad/_memory/discovery-rigor-sidecar/learned-patterns.md` |
| Ops log | `_bmad/_memory/_ops-log.md` | (fixed path) |
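The naming conventions above can be sketched as a single path resolver. This is an illustrative helper under assumed names (`resolvePath` is not part of the shipped skill); it only encodes the table, not the verify/fallback logic that follows.

```javascript
// Minimal sketch of the naming-conventions table (assumed helper, not shipped code).
function resolvePath({ scope, caller, key, date, toolAvailable = true }) {
  if (scope === 'session' && toolAvailable) {
    // Copilot memory tool surface.
    return `/memories/session/${caller}-${key}.md`;
  }
  if (scope === 'session') {
    // Session fallback sidecar carries an ISO 8601 date suffix.
    return `_bmad/_memory/${caller}-sidecar/${key}-${date}.md`;
  }
  // Workspace sidecars have no date suffix — overwrite-in-place canonical files.
  return `_bmad/_memory/${caller}-sidecar/${key}.md`;
}
```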
---
## Caller Identity Resolution
Determine the caller ID before every operation using this table:
| Priority | Condition | Resolved Caller ID | Example |
|----------|-----------|---------------------|---------|
| 1 (highest) | `caller:` parameter provided | Value of `caller:` | `caller: "discovery-rigor"``discovery-rigor` |
| 2 (default) | No `caller:` parameter | `memory-manager` | User says "list workspace memory" → `memory-manager` |
Caller ID determines the sidecar directory name. `caller: "discovery-rigor"``_bmad/_memory/discovery-rigor-sidecar/`.
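The two-row priority table reduces to one line of logic; a minimal sketch, assuming a `resolveCaller` helper name that is not part of the shipped skill:

```javascript
// Priority 1: explicit caller parameter. Priority 2: default clerk identity.
function resolveCaller(params = {}) {
  return params.caller || 'memory-manager';
}
```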
---
## Persist
Ensure `_bmad/_memory/` directory exists. Resolve caller identity. Prepare the provenance header.
| Row | Scope | Tool Available? | Verify Result | Action |
|-----|-------|-----------------|---------------|--------|
| P1 | session | yes | read-back matches | Write to `/memories/session/{caller}-{key}.md`. Log success. |
| P2 | session | yes | read-back fails | Fall back to `_bmad/_memory/{caller}-sidecar/{key}-{YYYY-MM-DD}.md`. Log warning + fallback. |
| P3 | session | no | n/a | Write to `_bmad/_memory/{caller}-sidecar/{key}-{YYYY-MM-DD}.md`. Log fallback. |
| P4 | workspace | n/a | n/a | Write to `_bmad/_memory/{caller}-sidecar/{key}.md` (overwrite). Log success. |
After the write completes:
1. Add the provenance header to the written file: `<!-- bmad-memory | skill: {caller} | last-write: {YYYY-MM-DD} | purpose: {key} -->`
2. Append an ops-log entry (see Ops Log section below).
> **Key distinctions:**
>
> - P2/P3 append `-{YYYY-MM-DD}` (ISO date, sorts lexicographically). These are ephemeral snapshots.
> - P4 has no date suffix — workspace files are overwrite-in-place canonical sidecars.
> - P2 is the verify-fail branch: session write succeeded but read-back didn't match → demote to sidecar.
> - One file per key per day for session fallback. A second persist on the same day overwrites the existing dated file.
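The P1–P4 routing can be expressed as a small decision function. This is a sketch under assumed names (`routePersist`, its parameter shape) and is not the shipped implementation; it only mirrors the table rows and key distinctions above.

```javascript
// Illustrative routing for the persist decision table (assumed helper, not shipped code).
function routePersist({ scope, caller, key, date, toolAvailable, readBackMatches }) {
  const sidecar = `_bmad/_memory/${caller}-sidecar`;
  if (scope === 'workspace') {
    return { row: 'P4', path: `${sidecar}/${key}.md` }; // overwrite-in-place, no date suffix
  }
  if (!toolAvailable) {
    return { row: 'P3', path: `${sidecar}/${key}-${date}.md` }; // direct fallback
  }
  if (readBackMatches) {
    return { row: 'P1', path: `/memories/session/${caller}-${key}.md` };
  }
  // P2: session write succeeded but read-back differed — demote to dated sidecar.
  return { row: 'P2', path: `${sidecar}/${key}-${date}.md` };
}
```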
---
## Recover
Resolve caller identity. Determine scope from the invocation.
| Row | Scope | Step 1 | Found? | Step 2 | Found? | Result |
|-----|-------|--------|--------|--------|--------|--------|
| R1 | session | Check `/memories/session/{caller}-{key}.md` | yes | — | — | Return content. Log success. |
| R2 | session | Check `/memories/session/{caller}-{key}.md` | no | Check `_bmad/_memory/{caller}-sidecar/{key}-*.md` (alphabetically last = latest date) | yes | Return content. Log "recovered from fallback". |
| R3 | session | Check `/memories/session/{caller}-{key}.md` | no | Check `_bmad/_memory/{caller}-sidecar/{key}-*.md` | no | Return null. Log "no state found". |
| R4 | workspace | Check `_bmad/_memory/{caller}-sidecar/{key}.md` | yes | — | — | Return content. Log success. |
| R5 | workspace | Check `_bmad/_memory/{caller}-sidecar/{key}.md` | no | — | — | Return null. Log "no state found". |
After the check completes, append an ops-log entry only if `_bmad/_memory/_ops-log.md` already exists. Do not create `_bmad/_memory/` or `_ops-log.md` for a read-only recover.
> **Key distinction for R2:** "Alphabetically last" works because the filename contract uses ISO 8601 dates (`{key}-YYYY-MM-DD.md`), which sort lexicographically. No date parsing required.
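The R2 "alphabetically last" selection can be sketched in one line — ISO 8601 dates sort lexicographically, so a plain string sort finds the newest snapshot. Helper name `latestFallback` is assumed for illustration:

```javascript
// Pick the newest dated fallback snapshot without parsing dates.
function latestFallback(filenames) {
  return [...filenames].sort().at(-1) ?? null;
}
```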
---
## Ops Log
The ops log lives at `_bmad/_memory/_ops-log.md`. It is append-only. Write operations append a row after completion; read-only operations append only when the log surface already exists.
**Format:**
```markdown
| Date | Skill | Operation | Scope | Key | Status |
|------|-------|-----------|-------|-----|--------|
| 2026-03-24 | discovery-rigor | persist | session | state-ledger | success |
```
**Rules:**
- If `_ops-log.md` does not exist yet, create it with the header row first, then append the data row.
- Append one row per operation when the log surface already exists. Read-only `recover` and `list` must not materialize `_bmad/_memory/` just to create an ops log.
- Status values: `success`, `fallback`, `warning`, `no-state-found`, `cleared`.
- The ops log is the primary debugging tool — log, don't halt.
---
## List
Resolve caller identity. Check that `_bmad/_memory/` exists — if it does not, report `no memory directory found` and stop without appending an ops-log entry because the log surface does not exist yet.
**Procedure:**
1. Enumerate all `*-sidecar/` directories under `_bmad/_memory/`.
2. For each sidecar directory, report:
- **Skill name** — directory name with the `-sidecar` suffix stripped.
- **File count** — number of files in the directory.
- **Last modified** — date of the newest file.
3. Report `_ops-log.md` existence and row count (if it exists).
4. Append an ops-log entry.
---
## Clear
Resolve caller identity. Confirm the target scope.
| Row | Mode | Input | Deletes | Ops Log |
|-----|------|-------|---------|---------|
| C1 | Granular | `clear \| scope: workspace \| target: "{skill}"` | `_bmad/_memory/{skill}-sidecar/` contents only | Append `cleared {skill}` |
| C2 | Full | `clear \| scope: workspace` | All `*-sidecar/` dirs + `_ops-log.md` | Log is deleted as part of clear |
After clear, the next memory operation will recreate `_ops-log.md` with a fresh header (per the Ops Log section rules).
> **Key distinction for C2:** The ops-log entry must be appended *before* deleting the ops log file. Order matters — C2 is the only operation where the log itself is destroyed. The entry serves as a "last known action" marker if the log is recreated later.


@ -10,10 +10,10 @@ description: 'Perform a Cynical Review and produce a findings report. Use when t
**Your Role:** You are a cynical, jaded reviewer with zero patience for sloppy work. The content was submitted by a clueless weasel and you expect to find problems. Be skeptical of everything. Look for what's missing, not just what's wrong. Use a precise, professional tone — no profanity or personal attacks.
**Inputs:**
- **content** — Content to review: diff, spec, story, doc, or any artifact
- **also_consider** (optional) — Areas to keep in mind during review alongside normal adversarial analysis
## EXECUTION
### Step 1: Receive Content
@ -24,14 +24,64 @@ description: 'Perform a Cynical Review and produce a findings report. Use when t
### Step 2: Adversarial Analysis
Review with extreme skepticism — assume problems exist. Find at least ten issues to fix or improve in the provided content.
Review with extreme skepticism — assume problems may exist. Report every materially defensible issue to fix or improve in the provided content.
### Step 3: Present Findings
Output findings as a Markdown list (descriptions only).
## HALT CONDITIONS
- HALT if zero findings — this is suspicious, re-analyze or ask for guidance
- In standard adversarial mode, if zero findings appear on the first pass, re-analyze once. If the content still appears clean, report `no material adversarial findings` explicitly and note any residual review risk.
- HALT if content is empty or unreadable
---
## COMPLETENESS REVIEW MODE
_Use when the review should verify deliverables against a specification, Discovery Context, or acceptance criteria — not just find problems._
**Trigger:** User says "check completeness", "verify against spec", or provides a specification alongside content to review.
**Additional Inputs:**
- **specification** — Discovery Context, PRD, story acceptance criteria, or any reference specification to verify against
### Execution
1. **Load specification** — If the specification is missing or unreadable, ask for it and halt. Otherwise read the specification document and extract all requirements, constraints, acceptance criteria, and non-goals.
2. **Trace each requirement** — For each requirement in the specification, determine whether the content addresses it:
- ✅ **Met** — requirement is clearly addressed
- ⚠️ **Partial** — requirement is addressed but incompletely or ambiguously
- ❌ **Missing** — requirement is not addressed
- 🚫 **Contradicted** — content contradicts the requirement
3. **Check for extras** — Identify anything in the content that was not in the specification:
- **Scope creep** — additions beyond what was specified
- **Undocumented decisions** — choices made without specification basis
4. **Present completeness report:**
```markdown
## Completeness Review
**Specification:** [source document]
**Content reviewed:** [what was reviewed]
### Requirement Traceability
| # | Requirement | Status | Evidence |
| --- | ------------- | ----------- | ------------------ |
| 1 | [requirement] | ✅/⚠️/❌/🚫 | [where in content] |
### Summary
- Met: [count] | Partial: [count] | Missing: [count] | Contradicted: [count]
### Unspecified Additions
- [any content not traceable to specification]
```
5. **Then run adversarial analysis for any additional material issues** — completeness review does not replace adversarial review, it precedes it. Zero additional adversarial findings are acceptable when the traceability review is clean.


@ -10,3 +10,5 @@ Core,bmad-editorial-review-structure,Editorial Review - Structure,ES,Use when do
Core,bmad-review-adversarial-general,Adversarial Review,AR,"Use for quality assurance or before finalizing deliverables. Code Review in other modules runs this automatically, but also useful for document reviews.",[path],anytime,,,false,,
Core,bmad-review-edge-case-hunter,Edge Case Hunter Review,ECH,Use alongside adversarial review for orthogonal coverage — method-driven not attitude-driven.,[path],anytime,,,false,,
Core,bmad-distillator,Distillator,DG,Use when you need token-efficient distillates that preserve all information for downstream LLM consumption.,[path],anytime,,,false,adjacent to source document or specified output_path,distillate markdown file(s)
Core,bmad-discovery-rigor,Discovery Rigor,DR,Structured discovery protocol — classify problems, interview for gaps, sweep blind spots via System Reality Categories. Use before complex or high-stakes work.,,anytime,,,false,{output_folder}/discovery-context.md,discovery context document
Core,bmad-memory-manager,Memory Manager,MEM,Standardized persistence facade for BMAD skills — persist recover list and clear operations across session and workspace scopes. Use when building skills that need state persistence or when managing memory.,,anytime,,,false,_bmad/_memory/,ops-log



@ -1975,6 +1975,108 @@ async function runTests() {
console.log('');
// ============================================================
// Suite 35: Agent manifest prunes stale entries on regeneration
// ============================================================
console.log(`${colors.yellow}Test Suite 35: Agent manifest prunes stale rows${colors.reset}\n`);
let tempFixture35;
try {
tempFixture35 = await fs.mkdtemp(path.join(os.tmpdir(), 'bmad-test-35-'));
const tempBmadDir35 = path.join(tempFixture35, '_bmad');
const configDir35 = path.join(tempBmadDir35, '_config');
const currentAgentDir35 = path.join(tempBmadDir35, 'bmm', '4-implementation', 'bmad-agent-dev');
await fs.ensureDir(configDir35);
await fs.ensureDir(currentAgentDir35);
await fs.writeFile(
path.join(configDir35, 'agent-manifest.csv'),
[
'name,displayName,title,icon,capabilities,role,identity,communicationStyle,principles,module,path,canonicalId',
'"qa","QA Engineer","","","","","","","","bmm","_bmad/bmm/agents/qa.md","bmad-qa"',
'',
].join('\n'),
);
await fs.writeFile(
path.join(currentAgentDir35, 'SKILL.md'),
['---', 'name: bmad-agent-dev', 'description: Developer agent.', '---', '', 'Use this agent skill directly.'].join('\n'),
);
await fs.writeFile(
path.join(currentAgentDir35, 'bmad-skill-manifest.yaml'),
['type: agent', 'name: bmad-agent-dev', 'displayName: Developer Agent', 'module: bmm', 'canonicalId: bmad-agent-dev', ''].join('\n'),
);
const generator35 = new ManifestGenerator();
generator35.bmadDir = tempBmadDir35;
generator35.bmadFolderName = '_bmad';
generator35.updatedModules = ['bmm'];
generator35.files = [];
await generator35.collectSkills();
await generator35.collectAgents(['bmm']);
await generator35.writeAgentManifest(configDir35);
const agentManifest35 = await fs.readFile(path.join(configDir35, 'agent-manifest.csv'), 'utf8');
assert(!agentManifest35.includes('_bmad/bmm/agents/qa.md'), 'Agent manifest drops stale legacy rows');
assert(agentManifest35.includes('bmad-agent-dev'), 'Agent manifest keeps currently discovered agent rows');
} catch (error) {
assert(false, 'Agent manifest prune test succeeds', error.message);
} finally {
if (tempFixture35) await fs.remove(tempFixture35).catch(() => {});
}
console.log('');
// ============================================================
// Suite 36: Skill scanner ignores helper SKILL.md inside skill packages
// ============================================================
console.log(`${colors.yellow}Test Suite 36: Skill scanner skips nested helper SKILL.md files${colors.reset}\n`);
let tempFixture36;
const originalConsoleError36 = console.error;
try {
tempFixture36 = await fs.mkdtemp(path.join(os.tmpdir(), 'bmad-test-36-'));
const tempBmadDir36 = path.join(tempFixture36, '_bmad');
const rootSkillDir36 = path.join(tempBmadDir36, 'core', 'root-skill');
const helperAgentDir36 = path.join(rootSkillDir36, 'agents');
const consoleErrors36 = [];
await fs.ensureDir(helperAgentDir36);
await fs.writeFile(
path.join(rootSkillDir36, 'SKILL.md'),
['---', 'name: root-skill', 'description: Root skill.', '---', '', 'Root skill body.'].join('\n'),
);
await fs.writeFile(
path.join(helperAgentDir36, 'SKILL.md'),
['---', 'name: thinking-protocol', 'description: Helper agent.', '---', '', 'Helper agent body.'].join('\n'),
);
console.error = (...args) => {
consoleErrors36.push(args.join(' '));
};
const generator36 = new ManifestGenerator();
generator36.bmadDir = tempBmadDir36;
generator36.bmadFolderName = '_bmad';
generator36.updatedModules = ['core'];
generator36.files = [];
await generator36.collectSkills();
assert(generator36.skills.length === 1, 'Skill scanner keeps the top-level skill only');
assert(consoleErrors36.length === 0, 'Skill scanner does not warn on helper SKILL.md inside a claimed skill package');
} catch (error) {
assert(false, 'Nested helper skill scanner test succeeds', error.message);
} finally {
console.error = originalConsoleError36;
if (tempFixture36) await fs.remove(tempFixture36).catch(() => {});
}
console.log('');
// ============================================================
// Summary
// ============================================================


@ -191,6 +191,10 @@ class ManifestGenerator {
if (debug) {
console.log(`[DEBUG] collectSkills: claimed skill "${skillMeta.name}" as ${canonicalId} at ${dir}`);
}
// Skill directories own their internal resources. Do not recurse into
// nested subdirectories looking for more SKILL.md entrypoints.
return;
}
// Recurse into subdirectories
@ -483,31 +487,12 @@ class ManifestGenerator {
const csvPath = path.join(cfgDir, 'agent-manifest.csv');
const escapeCsv = (value) => `"${String(value ?? '').replaceAll('"', '""')}"`;
// Read existing manifest to preserve entries
const existingEntries = new Map();
if (await fs.pathExists(csvPath)) {
const content = await fs.readFile(csvPath, 'utf8');
const records = csv.parse(content, {
columns: true,
skip_empty_lines: true,
});
for (const record of records) {
existingEntries.set(`${record.module}:${record.name}`, record);
}
}
// Create CSV header with persona fields and canonicalId
let csvContent = 'name,displayName,title,icon,capabilities,role,identity,communicationStyle,principles,module,path,canonicalId\n';
// Combine existing and new agents, preferring new data for duplicates
// The current install scan is authoritative. Reusing rows from a previous
// manifest leaves deleted agents behind after update/recompile.
const allAgents = new Map();
// Add existing entries
for (const [key, value] of existingEntries) {
allAgents.set(key, value);
}
// Add/update new agents
for (const agent of this.agents) {
const key = `${agent.module}:${agent.name}`;
allAgents.set(key, {
@ -527,7 +512,9 @@ class ManifestGenerator {
}
// Write all agents
for (const [, record] of allAgents) {
for (const record of [...allAgents.values()].sort((left, right) =>
`${left.module}:${left.name}`.localeCompare(`${right.module}:${right.name}`),
)) {
const row = [
escapeCsv(record.name),
escapeCsv(record.displayName),


@ -210,6 +210,9 @@ function extractYamlRefs(filePath, content) {
if (typeof value !== 'string') return;
if (!isResolvable(value)) return;
// Skip distillate provenance — sources[] entries are historical external references
if (/^sources\b/.test(keyPath)) return;
const line = range ? offsetToLine(content, range[0]) : undefined;
// Check for {project-root}/_bmad/ refs
@ -444,6 +447,13 @@ if (require.main === module) {
totalRefs++;
const resolved = resolveRef(ref);
// Skip references that resolve outside the project — these are external
// provenance refs (e.g., distillate sources) not internal structure refs.
if (resolved && !resolved.startsWith(PROJECT_ROOT)) {
ok.push({ ref, tag: 'OK-EXTERNAL' });
continue;
}
if (resolved && !fs.existsSync(resolved)) {
// Extensionless paths may be directory references or partial templates.
// If the path has no extension, check whether it exists as a directory.


@ -202,7 +202,12 @@ function discoverSkillDirs(rootDirs) {
const skillMd = path.join(fullPath, 'SKILL.md');
if (fs.existsSync(skillMd)) {
skillDirs.push(fullPath);
// Skip agent directories — they follow different naming conventions
const manifestPath = path.join(fullPath, 'bmad-skill-manifest.yaml');
const isAgent = fs.existsSync(manifestPath) && fs.readFileSync(manifestPath, 'utf8').match(/^type:\s*agent\b/m);
if (!isAgent) {
skillDirs.push(fullPath);
}
}
// Keep walking into subdirectories to find nested skills