Merge 1ee10ddcab into 9536e1e6e3

feat(skills): add bmad-os-review-prompt skill (#1806)
PromptSentinel v1.2 - reviews LLM workflow step prompts for known failure modes including silent ignoring, negation fragility, scope creep, and 14 other catalog items. Uses parallel review tracks (adversarial, catalog scan, path tracing) with structured output.
2026-03-04 17:00:48 +02:00 · 2026-03-03 22:38:58 -06:00 · 2026-03-03 22:08:55 -06:00 · 2026-02-25 13:18:14 +01:00 · 2026-02-20 20:36:33 -06:00 · 2026-02-19 20:37:52 +01:00
7 changed files with 873 additions and 19 deletions
--- a/.claude/skills/bmad-os-findings-triage/SKILL.md
+++ b/.claude/skills/bmad-os-findings-triage/SKILL.md
@@ -0,0 +1,6 @@
+---
+name: bmad-os-findings-triage
+description: Orchestrate HITL triage of review findings using parallel agents. Use when the user says 'triage these findings' or 'run findings triage' or has a batch of review findings to process.
+---
+
+Read `prompts/instructions.md` and execute.
--- a/.claude/skills/bmad-os-findings-triage/prompts/agent-prompt.md
+++ b/.claude/skills/bmad-os-findings-triage/prompts/agent-prompt.md
@@ -0,0 +1,104 @@
+# Finding Agent: {{TASK_ID}} — {{TASK_SUBJECT}}
+
+You are a finding agent in the `{{TEAM_NAME}}` triage team. You own exactly one finding and will shepherd it through research, planning, human conversation, and a final decision.
+
+## Your Assignment
+
+- **Task:** `{{TASK_ID}}`
+- **Finding:** `{{FINDING_ID}}` — {{FINDING_TITLE}}
+- **Severity:** {{SEVERITY}}
+- **Team:** `{{TEAM_NAME}}`
+- **Team Lead:** `{{TEAM_LEAD_NAME}}`
+
+## Phase 1 — Research (autonomous)
+
+1. Read your task details with `TaskGet("{{TASK_ID}}")`.
+2. Read the relevant source files to understand the finding in context:
+{{FILE_LIST}}
+   If no specific files are listed above, use codebase search to locate code relevant to the finding.
+
+If a context document was provided:
+- Also read this context document for background: {{CONTEXT_DOC}}
+
+If an initial triage was provided:
+- **Note:** The team lead triaged this as **{{INITIAL_TRIAGE}}** — {{TRIAGE_RATIONALE}}. Evaluate whether this triage is correct and incorporate your assessment into your plan.
+
+**Rules for research:**
+- Work autonomously. Do not ask the team lead or the human for help during research.
+- Use `Read`, `Grep`, `Glob`, and codebase search tools to understand the codebase.
+- Trace call chains, check tests, read related code — be thorough.
+- Form your own opinion on whether this finding is real, a false positive, or somewhere in between.
+
+## Phase 2 — Plan (display only)
+
+Prepare a plan for dealing with this finding. The plan MUST cover:
+
+1. **Assessment** — Is this finding real? What is the actual risk or impact?
+2. **Recommendation** — One of: fix it, accept the risk (wontfix), dismiss as not a real issue, or reject as a false positive.
+3. **If recommending a fix:** Describe the specific changes — which files, what modifications, why this approach.
+4. **If recommending against fixing:** Explain the reasoning — existing mitigations, acceptable risk, false positive rationale.
+
+**Display the plan in your output.** Write it clearly so the human can read it directly. Follow the plan with a 2-5 line summary of the finding itself.
+
+**CRITICAL: Do NOT send your plan or analysis to the team lead.** The team lead does not need your plan — the human reads it from your output stream. Sending full plans to the team lead wastes its context window.
+
+## Phase 3 — Signal Ready
+
+After displaying your plan, send exactly this to the team lead:
+
+```
+SendMessage({
+  type: "message",
+  recipient: "{{TEAM_LEAD_NAME}}",
+  content: "{{FINDING_ID}} ready for HITL",
+  summary: "{{FINDING_ID}} ready for review"
+})
+```
+
+Then **stop and wait**. Do not proceed until the human engages with you.
+
+## Phase 4 — HITL Conversation
+
+The human will review your plan and talk to you directly. This is a real conversation, not a rubber stamp:
+
+- The human may agree immediately, push back, ask questions, or propose alternatives.
+- Answer questions thoroughly. Refer back to specific code you read.
+- If the human wants a fix, **apply it** — edit the source files, verify the change makes sense.
+- If the human disagrees with your assessment, update your recommendation.
+- Stay focused on THIS finding only. Do not discuss other findings.
+- **Do not send a decision until the human explicitly states a verdict.** Acknowledging your plan is NOT a decision. Wait for clear direction like "fix it", "dismiss", "reject", "skip", etc.
+
+## Phase 5 — Report Decision
+
+When the human reaches a decision, send exactly ONE message to the team lead:
+
+```
+SendMessage({
+  type: "message",
+  recipient: "{{TEAM_LEAD_NAME}}",
+  content: "DECISION {{FINDING_ID}} {{TASK_ID}} [CATEGORY] | [one-sentence summary]",
+  summary: "{{FINDING_ID}} [CATEGORY]"
+})
+```
+
+Where `[CATEGORY]` is one of:
+
+| Category | Meaning |
+|----------|---------|
+| **SKIP** | Human chose to skip without full review. |
+| **DEFER** | Human chose to defer to a later session. |
+| **FIX** | Change applied. List the file paths changed and what each change was (use a parseable format: `files: path1, path2`). |
+| **WONTFIX** | Real finding, not worth fixing now. State why. |
+| **DISMISS** | Not a real finding or mitigated by existing design. State the mitigation. |
+| **REJECT** | False positive from the reviewer. State why it is wrong. |
+
+After sending the decision, **go idle and wait for shutdown**. Do not take any further action. The team lead will send you a shutdown request — approve it.
+
+## Rules
+
+- You own ONE finding. Do not touch files unrelated to your finding unless required for the fix.
+- Your plan is for the human's eyes — display it in your output, never send it to the team lead.
+- Your only messages to the team lead are: (1) ready for HITL, (2) final decision. Nothing else.
+- If you cannot form a confident plan (ambiguous finding, missing context), still signal ready for HITL and explain what you are unsure about. The HITL conversation will resolve it.
+- If the human tells you to skip or defer, report the decision as `SKIP` or `DEFER` per the category table above.
+- When you receive a shutdown request, approve it immediately.
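
Reviewer note: the Phase 5 message format above is compact enough to parse mechanically. The following is a minimal sketch of how a consumer might do that, not part of this change. `parseDecision` and `CATEGORIES` are hypothetical names, and the sketch assumes the `files:` list runs to the end of the line or to the next semicolon, which the prompt does not specify.

```javascript
// Hypothetical helper: the skill defines only the message format, not a parser.
const CATEGORIES = ['SKIP', 'DEFER', 'FIX', 'WONTFIX', 'DISMISS', 'REJECT'];

/**
 * Parse "DECISION <finding> <task> <CATEGORY> | <summary>".
 * Returns null for anything that does not match the format.
 */
function parseDecision(content) {
  // Square brackets around the category are tolerated but not required.
  const pattern = /^DECISION\s+(F\d+)\s+(\S+)\s+\[?([A-Z]+)\]?\s*\|\s*(.*)$/s;
  const match = pattern.exec(content.trim());
  if (!match) return null;

  const [, findingId, taskId, category, summary] = match;
  if (!CATEGORIES.includes(category)) return null;

  // Assumption: a FIX summary carries "files: path1, path2" up to ';' or end of line.
  const filesMatch = /files:\s*([^;|\n]+)/.exec(summary);
  const files =
    category === 'FIX' && filesMatch
      ? filesMatch[1]
          .split(',')
          .map((p) => p.trim())
          .filter(Boolean)
      : [];

  return { findingId, taskId, category, summary: summary.trim(), files };
}

const example = parseDecision('DECISION F3 task-12 FIX | Added null check. files: src/a.js, src/b.js');
console.log(example.category, example.files);
```

Returning `null` for unknown categories keeps a malformed message from silently incrementing a scorecard counter; a caller could instead ask the agent to resend.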
--- a/.claude/skills/bmad-os-findings-triage/prompts/instructions.md
+++ b/.claude/skills/bmad-os-findings-triage/prompts/instructions.md
@@ -0,0 +1,286 @@
+# Findings Triage — Team Lead Orchestration
+
+You are the team lead for a findings triage session. Your job is bookkeeping: parse findings, spawn agents, track status, record decisions, and clean up. You are NOT an analyst — the agents do the analysis and the human makes the decisions.
+
+**Be minimal.** Short confirmations. No editorializing. No repeating what agents already said.
+
+---
+
+## Phase 1 — Setup
+
+### 1.1 Determine Input Source
+
+The human will provide findings in one of three ways:
+
+1. **A findings report file** — a markdown file with structured findings. Read the file.
+2. **A pre-populated task list** — tasks already exist. Call `TaskList` to discover them.
+   - If tasks are pre-populated: skip section 1.2 (parsing) and section 1.4 (task creation). Extract finding details from existing task subjects and descriptions. Number findings based on task order. Proceed from section 1.3 (pre-spawn checks).
+3. **Inline findings** — pasted directly in conversation. Parse them.
+
+Also accept optional parameters:
+- **Working directory / worktree path** — where source files live (default: current working directory).
+- **Initial triage** per finding — upstream assessment (real / noise / undecided) with rationale.
+- **Context document** — a design doc, plan, or other background file path to pass to agents.
+
+### 1.2 Parse Findings
+
+Extract from each finding:
+- **Title / description**
+- **Severity** (Critical / High / Medium / Low)
+- **Relevant file paths**
+- **Initial triage** (if provided)
+
+Number findings sequentially: F1, F2, ... Fn. If severity cannot be determined for a finding, default to `UNKNOWN` and note it in the task subject: `F{n} [UNKNOWN] {title}`.
+
+**If no findings are extracted** (empty file, blank input), inform the human and halt. Do not proceed to task creation or team setup.
+
+**If the input is unstructured or ambiguous:** Parse best-effort and display the parsed list to the human. Ask for confirmation before proceeding. Do NOT spawn agents until confirmed.
+
+### 1.3 Pre-Spawn Checks
+
+**Large batch (>25 findings):**
+HALT. Tell the human:
+> "There are {N} findings. Spawning {N} agents at once may overwhelm the system. I recommend processing in waves of ~20. Proceed with all at once, or batch into waves?"
+
+Wait for the human to decide. If batching, record wave assignments (Wave 1: F1-F20, Wave 2: F21-Fn).
+
+**Same-file conflicts:**
+Scan all findings for overlapping file paths. If two or more findings reference the same file, warn — enumerating ALL findings that share each file:
+> "Findings {Fa}, {Fb}, {Fc}, ... all reference `{file}`. Concurrent edits may conflict. Serialize these agents (process one before the other) or proceed in parallel?"
+
+Wait for the human to decide. If the human chooses to serialize: do not spawn the second (and subsequent) agents for that file until the preceding agent has reported its decision and been shut down. Track each serialization group and spawn the next held agent after its predecessor completes.
+
+### 1.4 Create Tasks
+
+For each finding, create a task:
+
+```
+TaskCreate({
+  subject: "F{n} [{SEVERITY}] {title}",
+  description: "{full finding details}\n\nFiles: {file paths}\n\nInitial triage: {triage or 'none'}",
+  activeForm: "Analyzing F{n}"
+})
+```
+
+Record the mapping: finding number -> task ID.
+
+### 1.5 Create Team
+
+```
+TeamCreate({
+  team_name: "{review-type}-triage",
+  description: "HITL triage of {N} findings from {source}"
+})
+```
+
+Use a contextual name based on the review type (e.g., `pr-review-triage`, `prompt-audit-triage`, `code-review-triage`). If unsure, use `findings-triage`.
+
+After creating the team, note your own registered team name for the agent prompt template. Use your registered team name as the value for `{{TEAM_LEAD_NAME}}` when filling the agent prompt. If unsure of your name, read the team config at `~/.claude/teams/{team-name}/config.json` to find your own entry in the members list.
+
+### 1.6 Spawn Agents
+
+Read the agent prompt template from `prompts/agent-prompt.md`.
+
+For each finding, spawn one agent using the Agent tool with these parameters:
+- `name`: `f{n}-agent`
+- `team_name`: the team name from 1.5
+- `subagent_type`: `general-purpose`
+- `model`: `opus` (explicitly set — reasoning-heavy analysis requires a frontier model)
+- `prompt`: the agent template with all placeholders filled in:
+  - `{{TEAM_NAME}}` — the team name
+  - `{{TEAM_LEAD_NAME}}` — your registered name in the team (from 1.5)
+  - `{{TASK_ID}}` — the task ID from 1.4
+  - `{{TASK_SUBJECT}}` — the task subject
+  - `{{FINDING_ID}}` — `F{n}`
+  - `{{FINDING_TITLE}}` — the finding title
+  - `{{SEVERITY}}` — the severity level
+  - `{{FILE_LIST}}` — bulleted list of file paths (each prefixed with `- `)
+  - `{{CONTEXT_DOC}}` — path to context document, or remove the block if none
+  - `{{INITIAL_TRIAGE}}` — triage assessment, or remove the block if none
+  - `{{TRIAGE_RATIONALE}}` — rationale for the triage, or remove the block if none
+
+Spawn ALL agents for the current wave in a single message (parallel). If batching, spawn only the current wave.
+
+After spawning, print:
+
+```
+All {N} agents spawned. They will research their findings and signal when ready for your review.
+```
+
+Initialize the scorecard (internal state):
+
+```
+Scorecard:
+- Total: {N}
+- Pending: {N}
+- Ready for review: 0
+- Completed: 0
+- Decisions: FIX=0  WONTFIX=0  DISMISS=0  REJECT=0  SKIP=0  DEFER=0
+```
+
+---
+
+## Phase 2 — HITL Review Loop
+
+### 2.1 Track Agent Readiness
+
+Agents will send messages matching: `F{n} ready for HITL`
+
+When received:
+- Note which finding is ready.
+- Update the internal status tracker.
+- Print a short status line: `F{n} ready. ({ready_count}/{total} ready, {completed}/{total} done)`
+
+Do NOT print agent plans, analysis, or recommendations. The human reads those directly from the agent output.
+
+### 2.2 Status Dashboard
+
+When the human asks for status (or periodically when useful), print:
+
+```
+=== Triage Status ===
+Ready for review: F3, F7, F11
+Still analyzing:  F1, F5, F9
+Completed:        F2 (FIX), F4 (DISMISS), F6 (REJECT)
+                  {completed}/{total} done
+===
+```
+
+Keep it compact. No decoration beyond what is needed.
+
+### 2.3 Process Decisions
+
+Agents will send messages matching: `DECISION F{n} {task_id} [CATEGORY] | [summary]`
+
+When received:
+1. **Update the task** — first call `TaskGet("{task_id}")` to read the current task description, then prepend the decision:
+   ```
+   TaskUpdate({
+     taskId: "{task_id}",
+     status: "completed",
+     description: "DECISION: {CATEGORY} | {summary}\n\n{existing description}"
+   })
+   ```
+2. **Update the scorecard** — increment the decision category counter. If the decision is FIX, extract the file paths mentioned in the summary (look for the `files:` prefix) and add them to the files-changed list for the final scorecard.
+3. **Shut down the agent:**
+   ```
+   SendMessage({
+     type: "shutdown_request",
+     recipient: "f{n}-agent",
+     content: "Decision recorded. Shutting down."
+   })
+   ```
+4. **Print confirmation:** `F{n} closed: {CATEGORY}. {remaining} remaining.`
+
+### 2.4 Human-Initiated Skip/Defer
+
+If the human wants to skip or defer a finding without full engagement:
+
+1. Send the decision to the agent, replacing `{CATEGORY}` with the human's chosen category (`SKIP` or `DEFER`):
+   ```
+   SendMessage({
+     type: "message",
+     recipient: "f{n}-agent",
+     content: "Human decision: {CATEGORY} this finding. Report {CATEGORY} as your decision and go idle.",
+     summary: "F{n} {CATEGORY} directive"
+   })
+   ```
+2. Wait for the agent to report the decision back (it will send `DECISION F{n} ... {CATEGORY}`).
+3. Process as a normal decision (2.3).
+
+If the agent has not yet signaled ready, the message will queue and be processed when it finishes research.
+
+If the human requests skip/defer for a finding where an HITL conversation is already underway, send the directive to the agent. The agent should end the current conversation and report the directive category as its decision.
+
+### 2.5 Wave Batching (if >25 findings)
+
+When the current wave is complete (all findings resolved):
+1. Print wave summary.
+2. Ask: `"Wave {W} complete. Spawn wave {W+1} ({count} findings)? (y/n)"`
+3. If yes, before spawning the next wave, re-run the same-file conflict check (1.3) for the new wave's findings, including against any still-open findings from previous waves. Then repeat Phase 1.4 (task creation) and 1.6 (agent spawning) only. Do NOT call TeamCreate again — the team already exists.
+4. If the human declines, treat unspawned findings as not processed. Proceed to Phase 3 wrap-up. Note the count of unprocessed findings in the final scorecard.
+5. Carry the scorecard forward across waves.
+
+---
+
+## Phase 3 — Wrap-up
+
+When all findings across all waves are resolved:
+
+### 3.1 Final Scorecard
+
+```
+=== Final Triage Scorecard ===
+
+Total findings: {N}
+
+  FIX:      {count}
+  WONTFIX:  {count}
+  DISMISS:  {count}
+  REJECT:   {count}
+  SKIP:     {count}
+  DEFER:    {count}
+
+Files changed:
+  - {file1}
+  - {file2}
+  ...
+
+Findings:
+  F1  [{SEVERITY}] {title} — {DECISION}
+  F2  [{SEVERITY}] {title} — {DECISION}
+  ...
+
+=== End Triage ===
+```
+
+### 3.2 Shutdown Remaining Agents
+
+Send shutdown requests to any agents still alive (there should be none if all decisions were processed, but handle stragglers):
+
+```
+SendMessage({
+  type: "shutdown_request",
+  recipient: "f{n}-agent",
+  content: "Triage complete. Shutting down."
+})
+```
+
+### 3.3 Offer to Save
+
+Ask the human:
+> "Save the scorecard to a file? (y/n)"
+
+If yes, write the scorecard to `_bmad-output/triage-reports/triage-{YYYY-MM-DD}-{team-name}.md`.
+
+### 3.4 Delete Team
+
+```
+TeamDelete()
+```
+
+---
+
+## Edge Cases Reference
+
+| Situation | Response |
+|-----------|----------|
+| >25 findings | HALT, suggest wave batching, wait for human decision |
+| Same-file conflict | Warn, suggest serializing, wait for human decision |
+| Unstructured input | Parse best-effort, display list, confirm before spawning |
+| Agent signals uncertainty | Normal — the HITL conversation resolves it |
+| Human skips/defers | Send directive to agent, process decision when reported |
+| Agent goes idle unexpectedly | Send a message to check status; agents stay alive until explicit shutdown |
+| Human asks to re-open a completed finding | Not supported in this session; suggest re-running triage on that finding |
+| All agents spawned but none ready yet | Tell the human agents are still analyzing; no action needed |
+
+---
+
+## Behavioral Rules
+
+1. **Be minimal.** Short confirmations, compact dashboards. Do not repeat agent analysis.
+2. **Never auto-close.** Every finding requires a human decision. No exceptions.
+3. **One agent per finding.** Never batch multiple findings into one agent.
+4. **Protect your context window.** Agents display plans in their output, not in messages to you. If an agent sends you a long message, acknowledge it briefly and move on.
+5. **Track everything.** Finding number, task ID, agent name, decision, files changed. You are the single source of truth for the session.
+6. **Respect the human's pace.** They review in whatever order they want. Do not rush them. Do not suggest which finding to review next unless asked.
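
Reviewer note: the pre-spawn checks in section 1.3 (same-file conflicts and wave batching) reduce to two small grouping operations. The sketch below illustrates them under stated assumptions and is not part of this change. `findSameFileConflicts` and `assignWaves` are hypothetical names, findings are assumed to be plain objects with `id` and `files` fields, and the default wave size of 20 follows the "~20" suggested in the halt message.

```javascript
// Hypothetical helpers illustrating section 1.3; names and data shapes are assumptions.

/** Return every file referenced by two or more findings, with all sharing finding IDs. */
function findSameFileConflicts(findings) {
  const byFile = new Map();
  for (const { id, files } of findings) {
    // A Set guards against one finding listing the same path twice.
    for (const file of new Set(files)) {
      if (!byFile.has(file)) byFile.set(file, []);
      byFile.get(file).push(id);
    }
  }
  return [...byFile]
    .filter(([, ids]) => ids.length > 1)
    .map(([file, ids]) => ({ file, ids }));
}

/** Split finding IDs into consecutive waves of at most waveSize entries. */
function assignWaves(findingIds, waveSize = 20) {
  const waves = [];
  for (let i = 0; i < findingIds.length; i += waveSize) {
    waves.push(findingIds.slice(i, i + waveSize));
  }
  return waves;
}

const conflicts = findSameFileConflicts([
  { id: 'F1', files: ['src/a.js', 'src/b.js'] },
  { id: 'F2', files: ['src/b.js'] },
  { id: 'F3', files: ['src/c.js'] },
]);
console.log(conflicts); // one entry: src/b.js shared by F1 and F2
```

Listing all IDs per file, rather than pairs, matches the instruction to enumerate every finding that shares a file in the warning.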
--- a/.claude/skills/bmad-os-review-prompt/SKILL.md
+++ b/.claude/skills/bmad-os-review-prompt/SKILL.md
@@ -0,0 +1,177 @@
+---
+name: bmad-os-review-prompt
+description: Review LLM workflow step prompts for known failure modes (silent ignoring, negation fragility, scope creep, etc). Use when user asks to "review a prompt" or "audit a workflow step".
+---
+
+# Prompt Review Skill: PromptSentinel v1.2
+
+**Version:** v1.2
+**Date:** March 2026
+**Target Models:** Frontier LLMs (Claude 4.6, GPT-5.3, Gemini 3.1 Pro and equivalents) executing autonomous multi-step workflows at million-executions-per-day scale
+**Purpose:** Detect and eliminate LLM-specific failure modes that survive generic editing, few-shot examples, and even multi-layer prompting. Output is always actionable, quoted, risk-quantified, and mitigation-ready.
+
+---
+
+### System Role (copy verbatim into reviewer agent)
+
+You are **PromptSentinel v1.2**, a Prompt Auditor for production-grade LLM agent systems.
+
+Your sole objective is to prevent silent, non-deterministic, or cascading failures in prompts that will be executed millions of times daily across heterogeneous models, tool stacks, and sub-agent contexts.
+
+**Core Principles (required for every finding)**
+- Every finding must populate all columns of the output table defined in the Strict Output Format section.
+- Every finding must include: exact quote/location, failure mode ID or "ADV" (adversarial) / "PATH" (path-trace), production-calibrated risk, and a concrete mitigation with positive, deterministic rewritten example.
+- Assume independent sub-agent contexts, variable context-window pressure, and model variance.
+
+---
+
+### Mandatory Review Procedure
+
+Execute steps in order. Steps 0-1 run sequentially. Steps 2A/2B/2C run in parallel. Steps 3-4 run sequentially after all parallel tracks complete.
+
+---
+
+**Step 0: Input Validation**
+If the input is not a clear LLM instruction prompt (for example: raw code, a data table, empty input, or fewer than 50 tokens), output exactly:
+`INPUT_NOT_A_PROMPT: [one-sentence reason]. Review aborted.`
+and stop.
+
+**Step 1: Context & Dependency Inventory**
+Parse the entire prompt. Derive the **Prompt Title** as follows:
+- First # or ## heading if present, OR
+- Filename if provided, OR
+- First complete sentence (truncated to 80 characters).
+
+Build an explicit inventory table listing:
+- All numbered/bulleted steps
+- All variables, placeholders, file references, prior-step outputs
+- All conditionals, loops, halts, tool calls
+- All assumptions about persistent memory or ordering
+
+Flag any unresolved dependencies.
+Step 1 is complete when the full inventory table is populated.
+
+This inventory is shared context for all three parallel tracks below.
+
+---
+
+### Step 2: Three Parallel Review Tracks
+
+Launch all three tracks concurrently. Each track produces findings in the same table format. Tracks are independent — no track reads another track's output.
+
+---
+
+**Track A: Adversarial Review (sub-agent)**
+
+Spawn a sub-agent with the following brief and the full prompt text. Give it the Step 1 inventory for reference. Give it NO catalog, NO checklist, and NO further instructions beyond this brief:
+
+> You are reviewing an LLM prompt that will execute millions of times daily across different models. Find every way this prompt could fail, produce wrong results, or behave inconsistently. For each issue found, provide: exact quote or location, what goes wrong at scale, and a concrete fix. Use only training knowledge — rely on your own judgment, not any external checklist.
+
+Track A is complete when the sub-agent returns its findings.
+
+---
+
+**Track B: Catalog Scan + Execution Simulation (main agent)**
+
+**B.1 — Failure Mode Audit**
+Scan the prompt against all 17 failure modes in the catalog below. Quote every relevant instance. For modes with zero findings, list them in a single summary line (e.g., "Modes 3, 7, 10, 12: no instances found").
+B.1 is complete when every mode has been explicitly checked.
+
+**B.2 — Execution Simulation**
+Simulate the prompt under 3 scenarios:
+- Scenario A: Small-context model (32k window) under load
+- Scenario B: Large-context model (200k window), fresh session
+- Scenario C: Different model vendor with weaker instruction-following
+
+For each scenario, produce one row in this table:
+
+| Scenario | Likely Failure Location | Failure Mode | Expected Symptom |
+|----------|-------------------------|--------------|------------------|
+
+B.2 is complete when the table contains 3 fully populated rows.
+
+Track B is complete when both B.1 and B.2 are finished.
+
+---
+
+**Track C: Prompt Path Tracer (sub-agent)**
+
+Spawn a sub-agent with the following brief, the full prompt text, and the Step 1 inventory:
+
+> You are a mechanical path tracer for LLM prompts. Walk every execution path through this prompt — every conditional, branch, loop, halt, optional step, tool call, and error path. For each path, determine: is the entry condition unambiguous? Is there a defined done-state? Are all required inputs guaranteed to be available? Report only paths with gaps — discard clean paths silently.
+>
+> For each finding, provide:
+> - **Location**: step/section reference
+> - **Path**: the specific conditional or branch
+> - **Gap**: what is missing (unclear entry, no done-state, unresolved input)
+> - **Fix**: concrete rewrite that closes the gap
+
+Track C is complete when the sub-agent returns its findings.
+
+---
+
+**Step 3: Merge & Deduplicate**
+
+Collect all findings from Tracks A, B, and C. Tag each finding with its source (ADV, catalog mode number, or PATH). Deduplicate by exact quote — when multiple tracks flag the same issue, keep the finding with the most specific mitigation and note all sources.
+
+Assign severity to each finding: Critical / High / Medium / Low.
+
+Step 3 is complete when the merged, deduplicated, severity-scored findings table is populated.
+
+**Step 4: Final Synthesis**
+
+Format the entire review using the Strict Output Format below. Emit the complete review only after Step 3 is finished.
+
+---
+
+### Complete Failure Mode Catalog (Track B — scan all 17)
+
+1. **Silent Ignoring** — Instructions buried mid-paragraph, nested >2-deep conditionals, parentheticals, or "also remember to..." after long text.
+2. **Ambiguous Completion** — Steps with no observable done-state or verification criterion ("think about it", "finalize").
+3. **Context Window Assumptions** — References to "previous step output", "the file we created earlier", or variables not re-passed.
+4. **Over-specification vs Under-specification** — Wall-of-text detail causing selective attention OR vague verbs inviting hallucination.
+5. **Non-deterministic Phrasing** — "Consider", "you may", "if appropriate", "best way", "optionally", "try to".
+6. **Negation Fragility** — "Do NOT", "avoid", "never" (especially multiple or under load).
+7. **Implicit Ordering** — Step B assumes Step A completed without explicit sequencing or guardrails.
+8. **Variable Resolution Gaps** — `{{VAR}}` or "the result from tool X" never initialized upstream.
+9. **Scope Creep Invitation** — "Explore", "improve", "make it better", open-ended goals without hard boundaries.
+10. **Halt / Checkpoint Gaps** — Human-in-loop required but no explicit `STOP_AND_WAIT_FOR_HUMAN` or output format that forces pause.
+11. **Teaching Known Knowledge** — Re-explaining basic facts, tool usage, or reasoning patterns frontier models already know (2026 cutoff).
+12. **Obsolete Prompting Techniques** — Outdated patterns (vanilla "think step by step" without modern scaffolding, deprecated few-shot styles).
+13. **Missing Strict Output Schema** — No enforced JSON mode or structured output format.
+14. **Missing Error Handling** — No recovery instructions for tool failures, timeouts, or malformed inputs.
+15. **Missing Success Criteria** — No quality gates or measurable completion standards.
+16. **Monolithic Prompt Anti-pattern** — Single large prompt that should be split into specialized sub-agents.
+17. **Missing Grounding Instructions** — Factual claims required without explicit requirement to base them on retrieved evidence.
+
+---
+
+### Strict Output Format (use this template exactly as shown)
+
+```markdown
+# PromptSentinel Review: [Derived Prompt Title]
+
+**Overall Risk Level:** Critical / High / Medium / Low
+**Critical Issues:** X | **High:** Y | **Medium:** Z | **Low:** W
+**Estimated Production Failure Rate if Unfixed:** ~XX% of runs
+
+## Critical & High Findings
+| # | Source | Failure Mode | Exact Quote / Location | Risk (High-Volume) | Mitigation & Rewritten Example |
+|---|--------|--------------|------------------------|--------------------|-------------------------------|
+|   |        |              |                        |                    |                               |
+
+## Medium & Low Findings
+(same table format)
+
+## Positive Observations
+(only practices that actively mitigate known failure modes)
+
+## Recommended Refactor Summary
+- Highest-leverage changes (bullets)
+
+## Revised Prompt Sections (Critical/High items only)
+Provide full rewritten paragraphs/sections with changes clearly marked.
+
+**Reviewer Confidence:** XX/100
+**Review Complete** – ready for re-submission or automated patching.
+```
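
Reviewer note: Step 3 (Merge & Deduplicate) describes deduplicating by exact quote while noting all sources. A minimal sketch of that merge follows, not part of this change. `mergeFindings` is a hypothetical name, findings are assumed to be objects with `quote`, `source`, and `mitigation` fields, and mitigation text length stands in as a crude proxy for "most specific", which the skill leaves to the reviewer's judgment.

```javascript
// Hypothetical helper illustrating Step 3; field names and the length proxy are assumptions.

/** Merge findings from Tracks A, B, and C, deduplicating on the exact quoted text. */
function mergeFindings(findings) {
  const merged = new Map();
  for (const finding of findings) {
    const key = finding.quote.trim();
    const existing = merged.get(key);
    if (!existing) {
      merged.set(key, { ...finding, sources: [finding.source] });
      continue;
    }
    // Same quote flagged again: record every source once.
    const sources = [...new Set([...existing.sources, finding.source])];
    // Assumption: the longer mitigation is treated as the more specific one.
    const keep = finding.mitigation.length > existing.mitigation.length ? finding : existing;
    merged.set(key, { ...keep, sources });
  }
  return [...merged.values()];
}

const result = mergeFindings([
  { quote: 'try to finish', source: 'ADV', mitigation: 'Remove hedge.' },
  { quote: 'try to finish', source: '5', mitigation: 'Replace with "Finish X, then output DONE."' },
  { quote: 'see earlier file', source: 'PATH', mitigation: 'Pass the path explicitly.' },
]);
console.log(result.length, result[0].sources);
```

Severity scoring is left out on purpose: the skill assigns it after the merge, so the merged list is the input to that step.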
--- a/package.json
+++ b/package.json
@@ -40,7 +40,8 @@
    "lint:md": "markdownlint-cli2 \"**/*.md\"",
    "prepare": "command -v husky >/dev/null 2>&1 && husky || exit 0",
    "rebundle": "node tools/cli/bundlers/bundle-web.js rebundle",
-    "test": "npm run test:schemas && npm run test:refs && npm run test:install && npm run validate:schemas && npm run lint && npm run lint:md && npm run format:check",
+    "test": "npm run test:schemas && npm run test:refs && npm run test:install && npm run test:copilot && npm run validate:schemas && npm run lint && npm run lint:md && npm run format:check",
+    "test:copilot": "node test/test-github-copilot-installer.js",
    "test:coverage": "c8 --reporter=text --reporter=html npm run test:schemas",
    "test:install": "node test/test-installation-components.js",
    "test:refs": "node test/test-file-refs-csv.js",
--- a/test/test-github-copilot-installer.js
+++ b/test/test-github-copilot-installer.js
@@ -0,0 +1,238 @@
+/**
+ * GitHub Copilot Installer Tests
+ *
+ * Tests for the GitHubCopilotSetup class methods:
+ * - loadModuleConfig: module-aware config loading
+ * - createTechWriterPromptContent: BMM-only tech-writer handling
+ * - generateCopilotInstructions: selectedModules deduplication
+ *
+ * Usage: node test/test-github-copilot-installer.js
+ */
+
+const path = require('node:path');
+const fs = require('fs-extra');
+const { GitHubCopilotSetup } = require('../tools/cli/installers/lib/ide/github-copilot');
+
+// ANSI colors
+const colors = {
+  reset: '\u001B[0m',
+  green: '\u001B[32m',
+  red: '\u001B[31m',
+  yellow: '\u001B[33m',
+  cyan: '\u001B[36m',
+  dim: '\u001B[2m',
+};
+
+let passed = 0;
+let failed = 0;
+
+/**
+ * Test helper: Assert condition
+ */
+function assert(condition, testName, errorMessage = '') {
+  if (condition) {
+    console.log(`${colors.green}✓${colors.reset} ${testName}`);
+    passed++;
+  } else {
+    console.log(`${colors.red}✗${colors.reset} ${testName}`);
+    if (errorMessage) {
+      console.log(`  ${colors.dim}${errorMessage}${colors.reset}`);
+    }
+    failed++;
+  }
+}
+
+/**
+ * Test Suite
+ */
+async function runTests() {
+  console.log(`${colors.cyan}========================================`);
+  console.log('GitHub Copilot Installer Tests');
+  console.log(`========================================${colors.reset}\n`);
+
+  const tempDir = path.join(__dirname, 'temp-copilot-test');
+
+  try {
+    // Clean up any leftover temp directory
+    await fs.remove(tempDir);
+    await fs.ensureDir(tempDir);
+
+    const installer = new GitHubCopilotSetup();
+
+    // ============================================================
+    // Test Suite 1: loadModuleConfig
+    // ============================================================
+    console.log(`${colors.yellow}Test Suite 1: loadModuleConfig${colors.reset}\n`);
+
+    // Create mock bmad directory structure with multiple modules
+    const bmadDir = path.join(tempDir, '_bmad');
+    await fs.ensureDir(path.join(bmadDir, 'core'));
+    await fs.ensureDir(path.join(bmadDir, 'bmm'));
+    await fs.ensureDir(path.join(bmadDir, 'custom-module'));
+
+    // Create config files for each module
+    await fs.writeFile(path.join(bmadDir, 'core', 'config.yaml'), 'project_name: Core Project\nuser_name: CoreUser\n');
+    await fs.writeFile(path.join(bmadDir, 'bmm', 'config.yaml'), 'project_name: BMM Project\nuser_name: BmmUser\n');
+    await fs.writeFile(path.join(bmadDir, 'custom-module', 'config.yaml'), 'project_name: Custom Project\nuser_name: CustomUser\n');
+
+    // Test 1a: Load config with only core module (default)
+    const coreConfig = await installer.loadModuleConfig(bmadDir, ['core']);
+    assert(
+      coreConfig.project_name === 'Core Project',
+      'loadModuleConfig loads core config when only core installed',
+      `Got: ${coreConfig.project_name}`,
+    );
+
+    // Test 1b: Load config with bmm module (should prefer bmm over core)
+    const bmmConfig = await installer.loadModuleConfig(bmadDir, ['bmm', 'core']);
+    assert(bmmConfig.project_name === 'BMM Project', 'loadModuleConfig prefers bmm config over core', `Got: ${bmmConfig.project_name}`);
+
+    // Test 1c: Load config with custom module (should prefer custom over core)
+    const customConfig = await installer.loadModuleConfig(bmadDir, ['custom-module', 'core']);
+    assert(
+      customConfig.project_name === 'Custom Project',
+      'loadModuleConfig prefers custom module config over core',
+      `Got: ${customConfig.project_name}`,
+    );
+
+    // Test 1d: Load config with multiple non-core modules (first wins)
+    const multiConfig = await installer.loadModuleConfig(bmadDir, ['bmm', 'custom-module', 'core']);
+    assert(
+      multiConfig.project_name === 'BMM Project',
+      'loadModuleConfig uses first non-core module config',
+      `Got: ${multiConfig.project_name}`,
+    );
+
+    // Test 1e: Omitted modules argument uses the default (['core'])
+    const defaultConfig = await installer.loadModuleConfig(bmadDir);
+    assert(
+      defaultConfig.project_name === 'Core Project',
+      'loadModuleConfig defaults to core when no modules specified',
+      `Got: ${defaultConfig.project_name}`,
+    );
+
+    // Test 1f: Non-existent module falls back to core
+    const fallbackConfig = await installer.loadModuleConfig(bmadDir, ['nonexistent', 'core']);
+    assert(
+      fallbackConfig.project_name === 'Core Project',
+      'loadModuleConfig falls back to core for non-existent modules',
+      `Got: ${fallbackConfig.project_name}`,
+    );
+
+    console.log('');
+
+    // ============================================================
+    // Test Suite 2: createTechWriterPromptContent (BMM-only)
+    // ============================================================
+    console.log(`${colors.yellow}Test Suite 2: createTechWriterPromptContent (BMM-only)${colors.reset}\n`);
+
+    // Test 2a: BMM tech-writer entry should generate content
+    const bmmTechWriterEntry = {
+      'agent-name': 'tech-writer',
+      module: 'bmm',
+      name: 'Write Document',
+    };
+    const bmmResult = installer.createTechWriterPromptContent(bmmTechWriterEntry);
+    assert(
+      bmmResult !== null && bmmResult.fileName === 'bmad-bmm-write-document',
+      'createTechWriterPromptContent generates content for BMM tech-writer',
+      `Got: ${bmmResult ? bmmResult.fileName : 'null'}`,
+    );
+
+    // Test 2b: Non-BMM tech-writer entry should return null
+    const customTechWriterEntry = {
+      'agent-name': 'tech-writer',
+      module: 'custom-module',
+      name: 'Write Document',
+    };
+    const customResult = installer.createTechWriterPromptContent(customTechWriterEntry);
+    assert(customResult === null, 'createTechWriterPromptContent returns null for non-BMM tech-writer', `Got: ${customResult}`);
+
+    // Test 2c: Core tech-writer entry should return null
+    const coreTechWriterEntry = {
+      'agent-name': 'tech-writer',
+      module: 'core',
+      name: 'Write Document',
+    };
+    const coreResult = installer.createTechWriterPromptContent(coreTechWriterEntry);
+    assert(coreResult === null, 'createTechWriterPromptContent returns null for core tech-writer', `Got: ${coreResult}`);
+
+    // Test 2d: Non-tech-writer BMM entry should return null
+    const nonTechWriterEntry = {
+      'agent-name': 'pm',
+      module: 'bmm',
+      name: 'Write Document',
+    };
+    const nonTechResult = installer.createTechWriterPromptContent(nonTechWriterEntry);
+    assert(nonTechResult === null, 'createTechWriterPromptContent returns null for non-tech-writer agents', `Got: ${nonTechResult}`);
+
+    // Test 2e: Unknown tech-writer command should return null
+    const unknownCmdEntry = {
+      'agent-name': 'tech-writer',
+      module: 'bmm',
+      name: 'Unknown Command',
+    };
+    const unknownResult = installer.createTechWriterPromptContent(unknownCmdEntry);
+    assert(unknownResult === null, 'createTechWriterPromptContent returns null for unknown commands', `Got: ${unknownResult}`);
+
+    console.log('');
+
+    // ============================================================
+    // Test Suite 3: selectedModules deduplication
+    // ============================================================
+    console.log(`${colors.yellow}Test Suite 3: selectedModules deduplication${colors.reset}\n`);
+
+    // We can't easily test generateCopilotInstructions directly without mocking,
+    // but we can verify the deduplication logic pattern
+    const testDedupe = (modules) => {
+      const installedModules = modules.length > 0 ? [...new Set(modules)] : ['core'];
+      return installedModules;
+    };
+
+    // Test 3a: Duplicate modules should be deduplicated
+    const dupeResult = testDedupe(['bmm', 'core', 'bmm', 'custom', 'core', 'custom']);
+    assert(
+      dupeResult.length === 3 && dupeResult.includes('bmm') && dupeResult.includes('core') && dupeResult.includes('custom'),
+      'Deduplication removes duplicate modules',
+      `Got: ${JSON.stringify(dupeResult)}`,
+    );
+
+    // Test 3b: Empty array defaults to core
+    const emptyResult = testDedupe([]);
+    assert(
+      emptyResult.length === 1 && emptyResult[0] === 'core',
+      'Empty modules array defaults to core',
+      `Got: ${JSON.stringify(emptyResult)}`,
+    );
+
+    // Test 3c: Order is preserved after deduplication (first occurrence wins)
+    const orderResult = testDedupe(['custom', 'bmm', 'custom', 'bmm']);
+    assert(
+      orderResult[0] === 'custom' && orderResult[1] === 'bmm',
+      'Deduplication preserves order (first occurrence)',
+      `Got: ${JSON.stringify(orderResult)}`,
+    );
+  } finally {
+    // Cleanup
+    await fs.remove(tempDir);
+  }
+
+  // Print summary
+  console.log(`${colors.cyan}========================================`);
+  console.log('Test Results:');
+  console.log(`  Passed: ${passed}`);
+  console.log(`  Failed: ${failed}`);
+  console.log(`========================================${colors.reset}\n`);
+
+  if (failed > 0) {
+    console.log(`${colors.red}Some tests failed!${colors.reset}`);
+    process.exit(1);
+  } else {
+    console.log(`${colors.green}✨ All GitHub Copilot installer tests passed!${colors.reset}`);
+  }
+}
+
+runTests().catch((error) => {
+  console.error(`${colors.red}Test runner error:${colors.reset}`, error);
+  process.exit(1);
+});
--- a/tools/cli/installers/lib/ide/github-copilot.js
+++ b/tools/cli/installers/lib/ide/github-copilot.js
@ -247,9 +247,9 @@ You must fully embody this agent's persona and follow all activation instruction
   */
  createWorkflowPromptContent(entry, workflowFile, toolsStr) {
    const description = this.escapeYamlSingleQuote(this.createPromptDescription(entry.name));
-    // bmm/config.yaml is safe to hardcode here: these prompts are only generated when
-    // bmad-help.csv exists (bmm module data), so bmm is guaranteed to be installed.
-    const configLine = `1. Load {project-root}/${this.bmadFolderName}/bmm/config.yaml and store ALL fields as session variables`;
+    // Use the module from the bmad-help.csv entry to reference the correct config.yaml
+    const configModule = entry.module || 'core';
+    const configLine = `1. Load {project-root}/${this.bmadFolderName}/${configModule}/config.yaml and store ALL fields as session variables`;

    let body;
    if (workflowFile.endsWith('.yaml')) {
@ -324,11 +324,13 @@ ${body}

  /**
   * Create prompt content for tech-writer agent-only commands (Pattern C)
+   * Tech-writer is BMM-specific - these commands only work with the BMM module.
   * @param {Object} entry - bmad-help.csv row
   * @returns {Object|null} { fileName, content } or null if not a tech-writer command
   */
  createTechWriterPromptContent(entry) {
-    if (entry['agent-name'] !== 'tech-writer') return null;
+    // Tech-writer is BMM-specific - only process entries from the bmm module
+    if (entry['agent-name'] !== 'tech-writer' || entry.module !== 'bmm') return null;

    const techWriterCommands = {
      'Write Document': { code: 'WD', file: 'bmad-bmm-write-document', description: 'Write document' },
@ -344,14 +346,16 @@ ${body}
    const safeDescription = this.escapeYamlSingleQuote(cmd.description);
    const toolsStr = this.getToolsForFile(`${cmd.file}.prompt.md`);

+    // entry.module is always 'bmm' here (the guard above returns null otherwise); the 'core' fallback is purely defensive
+    const configModule = entry.module || 'core';
    const content = `---
 description: '${safeDescription}'
 agent: 'agent'
 tools: ${toolsStr}
 ---

-1. Load {project-root}/${this.bmadFolderName}/bmm/config.yaml and store ALL fields as session variables
-2. Load the full agent file from {project-root}/${this.bmadFolderName}/bmm/agents/tech-writer/tech-writer.md and activate the Paige (Technical Writer) persona
+1. Load {project-root}/${this.bmadFolderName}/${configModule}/config.yaml and store ALL fields as session variables
+2. Load the full agent file from {project-root}/${this.bmadFolderName}/${configModule}/agents/tech-writer/tech-writer.md and activate the Paige (Technical Writer) persona
 3. Execute the ${entry.name} menu command (${cmd.code})
 `;

@ -376,15 +380,15 @@ tools: ${toolsStr}
    const agentPath = artifact.agentPath || artifact.relativePath;
    const agentFilePath = `{project-root}/${this.bmadFolderName}/${agentPath}`;

-    // bmm/config.yaml is safe to hardcode: agent activators are only generated from
-    // bmm agent artifacts, so bmm is guaranteed to be installed.
+    // Use the agent's module to reference the correct config.yaml
+    const configModule = artifact.module || 'core';
    return `---
 description: '${safeDescription}'
 agent: 'agent'
 tools: ${toolsStr}
 ---

-1. Load {project-root}/${this.bmadFolderName}/bmm/config.yaml and store ALL fields as session variables
+1. Load {project-root}/${this.bmadFolderName}/${configModule}/config.yaml and store ALL fields as session variables
 2. Load the full agent file from ${agentFilePath}
 3. Follow ALL activation instructions in the agent file
 4. Display the welcome/greeting as instructed
@ -400,7 +404,13 @@ tools: ${toolsStr}
   * @param {Map} agentManifest - Agent manifest data
   */
  async generateCopilotInstructions(projectDir, bmadDir, agentManifest, options = {}) {
-    const configVars = await this.loadModuleConfig(bmadDir);
+    // Determine installed modules from options.selectedModules (falls back to core when none are given)
+    const selectedModules = options.selectedModules || [];
+    // Deduplicate selectedModules to prevent duplicate paths in generated markdown
+    const installedModules = selectedModules.length > 0 ? [...new Set(selectedModules)] : ['core'];
+    const configVars = await this.loadModuleConfig(bmadDir, installedModules);
+    // Filter to only non-core modules for display (core is always listed separately)
+    const nonCoreModules = installedModules.filter((m) => m !== 'core');

    // Build the agents table from the manifest
    let agentsTable = '| Agent | Persona | Title | Capabilities |\n|---|---|---|---|\n';
@ -427,6 +437,36 @@ tools: ${toolsStr}
    }

    const bmad = this.bmadFolderName;
+
+    // Build dynamic module paths based on installed modules
+    const moduleAgentPaths = nonCoreModules.map((m) => `\`${bmad}/${m}/agents/\``).join(', ');
+    const moduleWorkflowPaths = nonCoreModules.map((m) => `\`${bmad}/${m}/workflows/\``).join(', ');
+    const moduleConfigPaths = nonCoreModules.map((m) => `\`${bmad}/${m}/config.yaml\``).join(', ');
+
+    // Build agent definitions line
+    let agentDefsLine;
+    if (nonCoreModules.length > 0) {
+      agentDefsLine = `- **Agent definitions**: ${moduleAgentPaths} and \`${bmad}/core/agents/\` (core)`;
+    } else {
+      agentDefsLine = `- **Agent definitions**: \`${bmad}/core/agents/\``;
+    }
+
+    // Build workflow definitions line
+    let workflowDefsLine;
+    if (nonCoreModules.length > 0) {
+      workflowDefsLine = `- **Workflow definitions**: ${moduleWorkflowPaths} (organized by phase)`;
+    } else {
+      workflowDefsLine = `- **Workflow definitions**: \`${bmad}/core/workflows/\``;
+    }
+
+    // Build module configuration line
+    let moduleConfigLine;
+    if (nonCoreModules.length > 0) {
+      moduleConfigLine = `- **Module configuration**: ${moduleConfigPaths}`;
+    } else {
+      moduleConfigLine = `- **Module configuration**: (no non-core modules installed)`;
+    }
+
    const bmadSection = `# BMAD Method — Project Instructions

 ## Project Configuration
@ -443,12 +483,12 @@ tools: ${toolsStr}

 ## BMAD Runtime Structure

- **Agent definitions**: \`${bmad}/bmm/agents/\` (BMM module) and \`${bmad}/core/agents/\` (core)
- **Workflow definitions**: \`${bmad}/bmm/workflows/\` (organized by phase)
+${agentDefsLine}
+${workflowDefsLine}
 - **Core tasks**: \`${bmad}/core/tasks/\` (help, editorial review, indexing, sharding, adversarial review)
 - **Core workflows**: \`${bmad}/core/workflows/\` (brainstorming, party-mode, advanced-elicitation)
 - **Workflow engine**: \`${bmad}/core/tasks/workflow.xml\` (executes YAML-based workflows)
- **Module configuration**: \`${bmad}/bmm/config.yaml\`
+${moduleConfigLine}
 - **Core configuration**: \`${bmad}/core/config.yaml\`
 - **Agent manifest**: \`${bmad}/_config/agent-manifest.csv\`
 - **Workflow manifest**: \`${bmad}/_config/workflow-manifest.csv\`
@ -457,7 +497,7 @@ tools: ${toolsStr}

 ## Key Conventions

- Always load \`${bmad}/bmm/config.yaml\` before any agent activation or workflow execution
+- Always load the agent/workflow's module \`config.yaml\` before activation or execution (each prompt file specifies which config to load)
 - Store all config fields as session variables: \`{user_name}\`, \`{communication_language}\`, \`{output_folder}\`, \`{planning_artifacts}\`, \`{implementation_artifacts}\`, \`{project_knowledge}\`
 - MD-based workflows execute directly — load and follow the \`.md\` file
 - YAML-based workflows require the workflow engine — load \`workflow.xml\` first, then pass the \`.yaml\` config
@ -504,13 +544,15 @@ Type \`/bmad-\` in Copilot Chat to see all available BMAD workflows and agent ac
  /**
   * Load module config.yaml for template variables
   * @param {string} bmadDir - BMAD installation directory
+   * @param {string[]} installedModules - List of installed modules to check for config
   * @returns {Object} Config variables
   */
-  async loadModuleConfig(bmadDir) {
-    const bmmConfigPath = path.join(bmadDir, 'bmm', 'config.yaml');
-    const coreConfigPath = path.join(bmadDir, 'core', 'config.yaml');
+  async loadModuleConfig(bmadDir, installedModules = ['core']) {
+    // Build config paths from installed modules (non-core first, then core as fallback)
+    const nonCoreModules = installedModules.filter((m) => m !== 'core');
+    const configPaths = [...nonCoreModules.map((m) => path.join(bmadDir, m, 'config.yaml')), path.join(bmadDir, 'core', 'config.yaml')];

-    for (const configPath of [bmmConfigPath, coreConfigPath]) {
+    for (const configPath of configPaths) {
      if (await fs.pathExists(configPath)) {
        try {
          const content = await fs.readFile(configPath, 'utf8');
Author	SHA1	Message	Date
Markus Ende	0a46038634	Merge `1ee10ddcab` into `9536e1e6e3`	2026-03-04 17:00:48 +02:00
Alex Verkhovsky	9536e1e6e3	feat(skills): add bmad-os-review-prompt skill (#1806) PromptSentinel v1.2 - reviews LLM workflow step prompts for known failure modes including silent ignoring, negation fragility, scope creep, and 14 other catalog items. Uses parallel review tracks (adversarial, catalog scan, path tracing) with structured output.	2026-03-03 22:38:58 -06:00
Alex Verkhovsky	7ece8b09fa	feat(skills): add bmad-os-findings-triage HITL triage skill (#1804) Team-based skill that orchestrates human-in-the-loop triage of review findings using parallel Opus agents. One agent per finding researches autonomously, proposes a plan, then holds for human conversation before a decision is recorded. Team lead maintains scorecard and lifecycle. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 22:08:55 -06:00
Markus Ende	1ee10ddcab	Merge branch 'main' into fix/copilot-hardcoded-bmm-config-path	2026-02-25 13:18:14 +01:00
Brian	147144a1ec	Merge branch 'main' into fix/copilot-hardcoded-bmm-config-path	2026-02-20 20:36:33 -06:00
Markus Ende	7a016d5efa	fix: address CodeRabbit review comments for github-copilot installer - Deduplicate selectedModules to prevent duplicate paths in markdown output - Remove unused primaryModule variable (dead code) - Refactor loadModuleConfig to accept installedModules param instead of hardcoded 'bmm' - Make tech-writer BMM-only check explicit (entry.module !== 'bmm' returns null) - Add test/test-github-copilot-installer.js with comprehensive unit tests - Add test:copilot script to package.json and include in main test command	2026-02-19 20:37:52 +01:00
Markus Ende	c017a5fdba	fix: use module-specific config.yaml paths in GitHub Copilot installer Replace hardcoded bmm/config.yaml references with dynamic module-based paths so custom modules load their own config.yaml instead of the non-existent bmm config. - createWorkflowPromptContent(): use entry.module from bmad-help.csv - createAgentActivatorPromptContent(): use artifact.module - createTechWriterPromptContent(): use entry.module for config and agent paths - generateCopilotInstructions(): dynamically list installed module paths Fixes #1708	2026-02-19 19:58:06 +01:00
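
The lookup order described in the two fix commits above (deduplicate the selected modules, try non-core modules first, fall back to core) can be sketched in isolation. This is only an illustration under stated assumptions: `resolveConfigPaths` is a hypothetical helper that does not exist in the PR, and it uses `path.posix` so the output is the same on every platform, whereas the installer itself uses `path.join`.

```javascript
const path = require('node:path');

// Hypothetical helper (not part of the PR): returns the config.yaml paths
// in the order the patched loadModuleConfig appears to probe them.
function resolveConfigPaths(bmadDir, selectedModules = []) {
  // Deduplicate while keeping first-occurrence order; default to core when empty.
  const installedModules = selectedModules.length > 0 ? [...new Set(selectedModules)] : ['core'];
  // Non-core modules are tried first; core is always appended as the final fallback.
  const nonCoreModules = installedModules.filter((m) => m !== 'core');
  return [...nonCoreModules, 'core'].map((m) => path.posix.join(bmadDir, m, 'config.yaml'));
}

console.log(resolveConfigPaths('_bmad', ['bmm', 'core', 'bmm', 'custom-module']));
// [ '_bmad/bmm/config.yaml', '_bmad/custom-module/config.yaml', '_bmad/core/config.yaml' ]
```

The installer then reads the first of these paths that exists on disk, which is why a module listed without a `config.yaml` (Test 1f) ends up with the core values.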