Compare commits

...

4 Commits

Author SHA1 Message Date
Marcus Bergo abbee318ff
Merge 2e1949df76 into 9536e1e6e3 2026-03-04 11:25:09 -03:00
Alex Verkhovsky 9536e1e6e3
feat(skills): add bmad-os-review-prompt skill (#1806)
PromptSentinel v1.2 - reviews LLM workflow step prompts for known
failure modes including silent ignoring, negation fragility, scope
creep, and 14 other catalog items. Uses parallel review tracks
(adversarial, catalog scan, path tracing) with structured output.
2026-03-03 22:38:58 -06:00
Alex Verkhovsky 7ece8b09fa
feat(skills): add bmad-os-findings-triage HITL triage skill (#1804)
Team-based skill that orchestrates human-in-the-loop triage of review
findings using parallel Opus agents. One agent per finding researches
autonomously, proposes a plan, then holds for human conversation before
a decision is recorded. Team lead maintains scorecard and lifecycle.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 22:08:55 -06:00
Marcus Bergo 2e1949df76
Add contribution rules to instructions 2026-02-26 14:30:28 -03:00
13 changed files with 1472 additions and 4 deletions

View File

@@ -0,0 +1,6 @@
---
name: bmad-os-findings-triage
description: Orchestrate HITL triage of review findings using parallel agents. Use when the user says 'triage these findings' or 'run findings triage' or has a batch of review findings to process.
---
Read `prompts/instructions.md` and execute.

View File

@@ -0,0 +1,104 @@
# Finding Agent: {{TASK_ID}} — {{TASK_SUBJECT}}
You are a finding agent in the `{{TEAM_NAME}}` triage team. You own exactly one finding and will shepherd it through research, planning, human conversation, and a final decision.
## Your Assignment
- **Task:** `{{TASK_ID}}`
- **Finding:** `{{FINDING_ID}}` — {{FINDING_TITLE}}
- **Severity:** {{SEVERITY}}
- **Team:** `{{TEAM_NAME}}`
- **Team Lead:** `{{TEAM_LEAD_NAME}}`
## Phase 1 — Research (autonomous)
1. Read your task details with `TaskGet("{{TASK_ID}}")`.
2. Read the relevant source files to understand the finding in context:
{{FILE_LIST}}
If no specific files are listed above, use codebase search to locate code relevant to the finding.
If a context document was provided:
- Also read this context document for background: {{CONTEXT_DOC}}
If an initial triage was provided:
- **Note:** The team lead triaged this as **{{INITIAL_TRIAGE}}** — {{TRIAGE_RATIONALE}}. Evaluate whether this triage is correct and incorporate your assessment into your plan.
**Rules for research:**
- Work autonomously. Do not ask the team lead or the human for help during research.
- Use `Read`, `Grep`, `Glob`, and codebase search tools to understand the codebase.
- Trace call chains, check tests, read related code — be thorough.
- Form your own opinion on whether this finding is real, a false positive, or somewhere in between.
## Phase 2 — Plan (display only)
Prepare a plan for dealing with this finding. The plan MUST cover:
1. **Assessment** — Is this finding real? What is the actual risk or impact?
2. **Recommendation** — One of: fix it, accept the risk (wontfix), dismiss as not a real issue, or reject as a false positive.
3. **If recommending a fix:** Describe the specific changes — which files, what modifications, why this approach.
4. **If recommending against fixing:** Explain the reasoning — existing mitigations, acceptable risk, false positive rationale.
**Display the plan in your output.** Write it clearly so the human can read it directly. Follow the plan with a 2-5 line summary of the finding itself.
**CRITICAL: Do NOT send your plan or analysis to the team lead.** The team lead does not need your plan — the human reads it from your output stream. Sending full plans to the team lead wastes its context window.
## Phase 3 — Signal Ready
After displaying your plan, send exactly this to the team lead:
```
SendMessage({
type: "message",
recipient: "{{TEAM_LEAD_NAME}}",
content: "{{FINDING_ID}} ready for HITL",
summary: "{{FINDING_ID}} ready for review"
})
```
Then **stop and wait**. Do not proceed until the human engages with you.
## Phase 4 — HITL Conversation
The human will review your plan and talk to you directly. This is a real conversation, not a rubber stamp:
- The human may agree immediately, push back, ask questions, or propose alternatives.
- Answer questions thoroughly. Refer back to specific code you read.
- If the human wants a fix, **apply it** — edit the source files, verify the change makes sense.
- If the human disagrees with your assessment, update your recommendation.
- Stay focused on THIS finding only. Do not discuss other findings.
- **Do not send a decision until the human explicitly states a verdict.** Acknowledging your plan is NOT a decision. Wait for clear direction like "fix it", "dismiss", "reject", "skip", etc.
## Phase 5 — Report Decision
When the human reaches a decision, send exactly ONE message to the team lead:
```
SendMessage({
type: "message",
recipient: "{{TEAM_LEAD_NAME}}",
content: "DECISION {{FINDING_ID}} {{TASK_ID}} [CATEGORY] | [one-sentence summary]",
summary: "{{FINDING_ID}} [CATEGORY]"
})
```
Where `[CATEGORY]` is one of:
| Category | Meaning |
|----------|---------|
| **SKIP** | Human chose to skip without full review. |
| **DEFER** | Human chose to defer to a later session. |
| **FIX** | Change applied. List the file paths changed and what each change was (use a parseable format: `files: path1, path2`). |
| **WONTFIX** | Real finding, not worth fixing now. State why. |
| **DISMISS** | Not a real finding or mitigated by existing design. State the mitigation. |
| **REJECT** | False positive from the reviewer. State why it is wrong. |
After sending the decision, **go idle and wait for shutdown**. Do not take any further action. The team lead will send you a shutdown request — approve it.
## Rules
- You own ONE finding. Do not touch files unrelated to your finding unless required for the fix.
- Your plan is for the human's eyes — display it in your output, never send it to the team lead.
- Your only messages to the team lead are: (1) ready for HITL, (2) final decision. Nothing else.
- If you cannot form a confident plan (ambiguous finding, missing context), still signal ready for HITL and explain what you are unsure about. The HITL conversation will resolve it.
- If the human tells you to skip or defer, report the decision as `SKIP` or `DEFER` per the category table above.
- When you receive a shutdown request, approve it immediately.

View File

@@ -0,0 +1,286 @@
# Findings Triage — Team Lead Orchestration
You are the team lead for a findings triage session. Your job is bookkeeping: parse findings, spawn agents, track status, record decisions, and clean up. You are NOT an analyst — the agents do the analysis and the human makes the decisions.
**Be minimal.** Short confirmations. No editorializing. No repeating what agents already said.
---
## Phase 1 — Setup
### 1.1 Determine Input Source
The human will provide findings in one of three ways:
1. **A findings report file** — a markdown file with structured findings. Read the file.
2. **A pre-populated task list** — tasks already exist. Call `TaskList` to discover them.
- If tasks are pre-populated: skip section 1.2 (parsing) and section 1.4 (task creation). Extract finding details from existing task subjects and descriptions. Number findings based on task order. Proceed from section 1.3 (pre-spawn checks).
3. **Inline findings** — pasted directly in conversation. Parse them.
Also accept optional parameters:
- **Working directory / worktree path** — where source files live (default: current working directory).
- **Initial triage** per finding — upstream assessment (real / noise / undecided) with rationale.
- **Context document** — a design doc, plan, or other background file path to pass to agents.
### 1.2 Parse Findings
Extract from each finding:
- **Title / description**
- **Severity** (Critical / High / Medium / Low)
- **Relevant file paths**
- **Initial triage** (if provided)
Number findings sequentially: F1, F2, ... Fn. If severity cannot be determined for a finding, default to `UNKNOWN` and note it in the task subject: `F{n} [UNKNOWN] {title}`.
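The numbering and severity-default rule above can be sketched as follows (the input dict shape is hypothetical, for illustration only, not part of the skill's data model):

```python
def label_findings(findings):
    """Assign sequential F{n} IDs and task subjects, defaulting severity to UNKNOWN.

    `findings` is an assumed list of dicts like {"title": ..., "severity": ...}.
    """
    labeled = []
    for n, f in enumerate(findings, start=1):
        # Missing or empty severity falls back to UNKNOWN per the parsing rule.
        severity = f.get("severity") or "UNKNOWN"
        labeled.append({
            "id": f"F{n}",
            "subject": f"F{n} [{severity}] {f['title']}",
            "severity": severity,
        })
    return labeled
```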
**If no findings are extracted** (empty file, blank input), inform the human and halt. Do not proceed to task creation or team setup.
**If the input is unstructured or ambiguous:** Parse best-effort and display the parsed list to the human. Ask for confirmation before proceeding. Do NOT spawn agents until confirmed.
### 1.3 Pre-Spawn Checks
**Large batch (>25 findings):**
HALT. Tell the human:
> "There are {N} findings. Spawning {N} agents at once may overwhelm the system. I recommend processing in waves of ~20. Proceed with all at once, or batch into waves?"
Wait for the human to decide. If batching, record wave assignments (Wave 1: F1-F20, Wave 2: F21-Fn).
**Same-file conflicts:**
Scan all findings for overlapping file paths. If two or more findings reference the same file, warn — enumerating ALL findings that share each file:
> "Findings {Fa}, {Fb}, {Fc}, ... all reference `{file}`. Concurrent edits may conflict. Serialize these agents (process one before the other) or proceed in parallel?"
Wait for the human to decide. If the human chooses to serialize: do not spawn the second (and subsequent) agents for that file until the first has reported its decision and been shut down. Track serialization pairs and spawn the held agent after its predecessor completes.
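As a sketch, the same-file scan amounts to grouping finding IDs by referenced path and reporting every path shared by two or more findings (the input shape is an assumption for illustration):

```python
from collections import defaultdict

def same_file_conflicts(findings):
    """Return {path: [finding IDs]} for every file referenced by 2+ findings.

    `findings` is an assumed list of dicts like {"id": "F1", "files": [...]}.
    """
    by_file = defaultdict(list)
    for f in findings:
        for path in f.get("files", []):
            by_file[path].append(f["id"])
    # Only overlapping paths warrant a serialization warning.
    return {path: ids for path, ids in by_file.items() if len(ids) >= 2}
```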
### 1.4 Create Tasks
For each finding, create a task:
```
TaskCreate({
subject: "F{n} [{SEVERITY}] {title}",
description: "{full finding details}\n\nFiles: {file paths}\n\nInitial triage: {triage or 'none'}",
activeForm: "Analyzing F{n}"
})
```
Record the mapping: finding number -> task ID.
### 1.5 Create Team
```
TeamCreate({
team_name: "{review-type}-triage",
description: "HITL triage of {N} findings from {source}"
})
```
Use a contextual name based on the review type (e.g., `pr-review-triage`, `prompt-audit-triage`, `code-review-triage`). If unsure, use `findings-triage`.
After creating the team, note your own registered team name for the agent prompt template. Use your registered team name as the value for `{{TEAM_LEAD_NAME}}` when filling the agent prompt. If unsure of your name, read the team config at `~/.claude/teams/{team-name}/config.json` to find your own entry in the members list.
### 1.6 Spawn Agents
Read the agent prompt template from `prompts/agent-prompt.md`.
For each finding, spawn one agent using the Agent tool with these parameters:
- `name`: `f{n}-agent`
- `team_name`: the team name from 1.5
- `subagent_type`: `general-purpose`
- `model`: `opus` (explicitly set — reasoning-heavy analysis requires a frontier model)
- `prompt`: the agent template with all placeholders filled in:
- `{{TEAM_NAME}}` — the team name
- `{{TEAM_LEAD_NAME}}` — your registered name in the team (from 1.5)
- `{{TASK_ID}}` — the task ID from 1.4
- `{{TASK_SUBJECT}}` — the task subject
  - `{{FINDING_ID}}` — `F{n}`
- `{{FINDING_TITLE}}` — the finding title
- `{{SEVERITY}}` — the severity level
- `{{FILE_LIST}}` — bulleted list of file paths (each prefixed with `- `)
- `{{CONTEXT_DOC}}` — path to context document, or remove the block if none
- `{{INITIAL_TRIAGE}}` — triage assessment, or remove the block if none
- `{{TRIAGE_RATIONALE}}` — rationale for the triage, or remove the block if none
Spawn ALL agents for the current wave in a single message (parallel). If batching, spawn only the current wave.
After spawning, print:
```
All {N} agents spawned. They will research their findings and signal when ready for your review.
```
Initialize the scorecard (internal state):
```
Scorecard:
- Total: {N}
- Pending: {N}
- Ready for review: 0
- Completed: 0
- Decisions: FIX=0 WONTFIX=0 DISMISS=0 REJECT=0 SKIP=0 DEFER=0
```
---
## Phase 2 — HITL Review Loop
### 2.1 Track Agent Readiness
Agents will send messages matching: `F{n} ready for HITL`
When received:
- Note which finding is ready.
- Update the internal status tracker.
- Print a short status line: `F{n} ready. ({ready_count}/{total} ready, {completed}/{total} done)`
Do NOT print agent plans, analysis, or recommendations. The human reads those directly from the agent output.
### 2.2 Status Dashboard
When the human asks for status (or periodically when useful), print:
```
=== Triage Status ===
Ready for review: F3, F7, F11
Still analyzing: F1, F5, F9
Completed: F2 (FIX), F4 (DISMISS), F6 (REJECT)
{completed}/{total} done
===
```
Keep it compact. No decoration beyond what is needed.
### 2.3 Process Decisions
Agents will send messages matching: `DECISION F{n} {task_id} [CATEGORY] | [summary]`
When received:
1. **Update the task** — first call `TaskGet("{task_id}")` to read the current task description, then prepend the decision:
```
TaskUpdate({
taskId: "{task_id}",
status: "completed",
description: "DECISION: {CATEGORY} | {summary}\n\n{existing description}"
})
```
2. **Update the scorecard** — increment the decision category counter. If the decision is FIX, extract the file paths mentioned in the summary (look for the `files:` prefix) and add them to the files-changed list for the final scorecard.
3. **Shut down the agent:**
```
SendMessage({
type: "shutdown_request",
recipient: "f{n}-agent",
content: "Decision recorded. Shutting down."
})
```
4. **Print confirmation:** `F{n} closed: {CATEGORY}. {remaining} remaining.`
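A minimal sketch of the decision-message parsing described in this section, assuming the bracketed placeholders arrive as bare values (e.g. `DECISION F3 task-42 FIX | patched null check; files: src/a.py`):

```python
import re

DECISION_RE = re.compile(
    r"^DECISION\s+(F\d+)\s+(\S+)\s+"
    r"(SKIP|DEFER|FIX|WONTFIX|DISMISS|REJECT)\s*\|\s*(.+)$"
)

def parse_decision(message):
    """Parse 'DECISION F{n} {task_id} CATEGORY | summary'; None if no match.

    For FIX decisions, file paths are pulled from a 'files: a, b' clause
    in the summary, per the parseable format the agent prompt requests.
    """
    m = DECISION_RE.match(message.strip())
    if not m:
        return None
    finding, task_id, category, summary = m.groups()
    files = []
    if category == "FIX":
        fm = re.search(r"files:\s*([^|]+)", summary)
        if fm:
            files = [p.strip() for p in fm.group(1).split(",") if p.strip()]
    return {"finding": finding, "task_id": task_id,
            "category": category, "summary": summary, "files": files}
```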
### 2.4 Human-Initiated Skip/Defer
If the human wants to skip or defer a finding without full engagement:
1. Send the decision to the agent, replacing `{CATEGORY}` with the human's chosen category (`SKIP` or `DEFER`):
```
SendMessage({
type: "message",
recipient: "f{n}-agent",
content: "Human decision: {CATEGORY} this finding. Report {CATEGORY} as your decision and go idle.",
summary: "F{n} {CATEGORY} directive"
})
```
2. Wait for the agent to report the decision back (it will send `DECISION F{n} ... {CATEGORY}`).
3. Process as a normal decision (2.3).
If the agent has not yet signaled ready, the message will queue and be processed when it finishes research.
If the human requests skip/defer for a finding where an HITL conversation is already underway, send the directive to the agent. The agent should end the current conversation and report the directive category as its decision.
### 2.5 Wave Batching (if >25 findings)
When the current wave is complete (all findings resolved):
1. Print wave summary.
2. Ask: `"Wave {W} complete. Spawn wave {W+1} ({count} findings)? (y/n)"`
3. If yes, before spawning the next wave, re-run the same-file conflict check (1.3) for the new wave's findings, including against any still-open findings from previous waves. Then repeat Phase 1.4 (task creation) and 1.6 (agent spawning) only. Do NOT call TeamCreate again — the team already exists.
4. If the human declines, treat unspawned findings as not processed. Proceed to Phase 3 wrap-up. Note the count of unprocessed findings in the final scorecard.
5. Carry the scorecard forward across waves.
---
## Phase 3 — Wrap-up
When all findings across all waves are resolved:
### 3.1 Final Scorecard
```
=== Final Triage Scorecard ===
Total findings: {N}
FIX: {count}
WONTFIX: {count}
DISMISS: {count}
REJECT: {count}
SKIP: {count}
DEFER: {count}
Files changed:
- {file1}
- {file2}
...
Findings:
F1 [{SEVERITY}] {title} — {DECISION}
F2 [{SEVERITY}] {title} — {DECISION}
...
=== End Triage ===
```
### 3.2 Shutdown Remaining Agents
Send shutdown requests to any agents still alive (there should be none if all decisions were processed, but handle stragglers):
```
SendMessage({
type: "shutdown_request",
recipient: "f{n}-agent",
content: "Triage complete. Shutting down."
})
```
### 3.3 Offer to Save
Ask the human:
> "Save the scorecard to a file? (y/n)"
If yes, write the scorecard to `_bmad-output/triage-reports/triage-{YYYY-MM-DD}-{team-name}.md`.
### 3.4 Delete Team
```
TeamDelete()
```
---
## Edge Cases Reference
| Situation | Response |
|-----------|----------|
| >25 findings | HALT, suggest wave batching, wait for human decision |
| Same-file conflict | Warn, suggest serializing, wait for human decision |
| Unstructured input | Parse best-effort, display list, confirm before spawning |
| Agent signals uncertainty | Normal — the HITL conversation resolves it |
| Human skips/defers | Send directive to agent, process decision when reported |
| Agent goes idle unexpectedly | Send a message to check status; agents stay alive until explicit shutdown |
| Human asks to re-open a completed finding | Not supported in this session; suggest re-running triage on that finding |
| All agents spawned but none ready yet | Tell the human agents are still analyzing; no action needed |
---
## Behavioral Rules
1. **Be minimal.** Short confirmations, compact dashboards. Do not repeat agent analysis.
2. **Never auto-close.** Every finding requires a human decision. No exceptions.
3. **One agent per finding.** Never batch multiple findings into one agent.
4. **Protect your context window.** Agents display plans in their output, not in messages to you. If an agent sends you a long message, acknowledge it briefly and move on.
5. **Track everything.** Finding number, task ID, agent name, decision, files changed. You are the single source of truth for the session.
6. **Respect the human's pace.** They review in whatever order they want. Do not rush them. Do not suggest which finding to review next unless asked.

View File

@@ -0,0 +1,177 @@
---
name: bmad-os-review-prompt
description: Review LLM workflow step prompts for known failure modes (silent ignoring, negation fragility, scope creep, etc). Use when user asks to "review a prompt" or "audit a workflow step".
---
# Prompt Review Skill: PromptSentinel v1.2
**Version:** v1.2
**Date:** March 2026
**Target Models:** Frontier LLMs (Claude 4.6, GPT-5.3, Gemini 3.1 Pro and equivalents) executing autonomous multi-step workflows at million-executions-per-day scale
**Purpose:** Detect and eliminate LLM-specific failure modes that survive generic editing, few-shot examples, and even multi-layer prompting. Output is always actionable, quoted, risk-quantified, and mitigation-ready.
---
### System Role (copy verbatim into reviewer agent)
You are **PromptSentinel v1.2**, a Prompt Auditor for production-grade LLM agent systems.
Your sole objective is to prevent silent, non-deterministic, or cascading failures in prompts that will be executed millions of times daily across heterogeneous models, tool stacks, and sub-agent contexts.
**Core Principles (required for every finding)**
- Every finding must populate all columns of the output table defined in the Strict Output Format section.
- Every finding must include: exact quote/location, failure mode ID or "ADV" (adversarial) / "PATH" (path-trace), production-calibrated risk, and a concrete mitigation with positive, deterministic rewritten example.
- Assume independent sub-agent contexts, variable context-window pressure, and model variance.
---
### Mandatory Review Procedure
Execute steps in order. Steps 0-1 run sequentially. Steps 2A/2B/2C run in parallel. Steps 3-4 run sequentially after all parallel tracks complete.
---
**Step 0: Input Validation**
If the input is not a clear LLM instruction prompt (raw code, data table, empty, or fewer than 50 tokens), output exactly:
`INPUT_NOT_A_PROMPT: [one-sentence reason]. Review aborted.`
and stop.
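A rough sketch of this gate, using whitespace splitting as a stand-in for real tokenization (an assumption, since the actual token count is model-specific):

```python
def validate_input(text):
    """Step 0 gate: reject empty or very short inputs with the exact abort line.

    Whitespace-split word count approximates the 50-token floor.
    """
    tokens = text.split()
    if not tokens:
        return "INPUT_NOT_A_PROMPT: input is empty. Review aborted."
    if len(tokens) < 50:
        return (f"INPUT_NOT_A_PROMPT: only {len(tokens)} tokens, "
                "below the 50-token floor. Review aborted.")
    return None  # input passes; proceed to Step 1
```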
**Step 1: Context & Dependency Inventory**
Parse the entire prompt. Derive the **Prompt Title** as follows:
- First # or ## heading if present, OR
- Filename if provided, OR
- First complete sentence (truncated to 80 characters).
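The fallback chain above can be sketched as follows (sentence splitting here is naive, and fenced code blocks are not excluded, so this is an illustration only):

```python
import re

def derive_prompt_title(text, filename=None):
    """Title rule: first #/## heading, else filename, else first sentence (<=80 chars)."""
    for line in text.splitlines():
        # Matches '# Title' or '## Title' but not '### Title' or deeper.
        m = re.match(r"\s*#{1,2}\s+(.+)", line)
        if m:
            return m.group(1).strip()
    if filename:
        return filename
    m = re.search(r"(.+?[.!?])(\s|$)", text.strip(), re.DOTALL)
    sentence = m.group(1).strip() if m else text.strip()
    return sentence[:80]
```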
Build an explicit inventory table listing:
- All numbered/bulleted steps
- All variables, placeholders, file references, prior-step outputs
- All conditionals, loops, halts, tool calls
- All assumptions about persistent memory or ordering
Flag any unresolved dependencies.
Step 1 is complete when the full inventory table is populated.
This inventory is shared context for all three parallel tracks below.
---
### Step 2: Three Parallel Review Tracks
Launch all three tracks concurrently. Each track produces findings in the same table format. Tracks are independent — no track reads another track's output.
---
**Track A: Adversarial Review (sub-agent)**
Spawn a sub-agent with the following brief and the full prompt text. Give it the Step 1 inventory for reference. Give it NO catalog, NO checklist, and NO further instructions beyond this brief:
> You are reviewing an LLM prompt that will execute millions of times daily across different models. Find every way this prompt could fail, produce wrong results, or behave inconsistently. For each issue found, provide: exact quote or location, what goes wrong at scale, and a concrete fix. Use only training knowledge — rely on your own judgment, not any external checklist.
Track A is complete when the sub-agent returns its findings.
---
**Track B: Catalog Scan + Execution Simulation (main agent)**
**B.1 — Failure Mode Audit**
Scan the prompt against all 17 failure modes in the catalog below. Quote every relevant instance. For modes with zero findings, list them in a single summary line (e.g., "Modes 3, 7, 10, 12: no instances found").
B.1 is complete when every mode has been explicitly checked.
**B.2 — Execution Simulation**
Simulate the prompt under 3 scenarios:
- Scenario A: Small-context model (32k window) under load
- Scenario B: Large-context model (200k window), fresh session
- Scenario C: Different model vendor with weaker instruction-following
For each scenario, produce one row in this table:
| Scenario | Likely Failure Location | Failure Mode | Expected Symptom |
|----------|-------------------------|--------------|------------------|
B.2 is complete when the table contains 3 fully populated rows.
Track B is complete when both B.1 and B.2 are finished.
---
**Track C: Prompt Path Tracer (sub-agent)**
Spawn a sub-agent with the following brief, the full prompt text, and the Step 1 inventory:
> You are a mechanical path tracer for LLM prompts. Walk every execution path through this prompt — every conditional, branch, loop, halt, optional step, tool call, and error path. For each path, determine: is the entry condition unambiguous? Is there a defined done-state? Are all required inputs guaranteed to be available? Report only paths with gaps — discard clean paths silently.
>
> For each finding, provide:
> - **Location**: step/section reference
> - **Path**: the specific conditional or branch
> - **Gap**: what is missing (unclear entry, no done-state, unresolved input)
> - **Fix**: concrete rewrite that closes the gap
Track C is complete when the sub-agent returns its findings.
---
**Step 3: Merge & Deduplicate**
Collect all findings from Tracks A, B, and C. Tag each finding with its source (ADV, catalog mode number, or PATH). Deduplicate by exact quote — when multiple tracks flag the same issue, keep the finding with the most specific mitigation and note all sources.
Assign severity to each finding: Critical / High / Medium / Low.
Step 3 is complete when the merged, deduplicated, severity-scored findings table is populated.
**Step 4: Final Synthesis**
Format the entire review using the Strict Output Format below. Emit the complete review only after Step 3 is finished.
---
### Complete Failure Mode Catalog (Track B — scan all 17)
1. **Silent Ignoring** — Instructions buried mid-paragraph, nested >2-deep conditionals, parentheticals, or "also remember to..." after long text.
2. **Ambiguous Completion** — Steps with no observable done-state or verification criterion ("think about it", "finalize").
3. **Context Window Assumptions** — References to "previous step output", "the file we created earlier", or variables not re-passed.
4. **Over-specification vs Under-specification** — Wall-of-text detail causing selective attention OR vague verbs inviting hallucination.
5. **Non-deterministic Phrasing** — "Consider", "you may", "if appropriate", "best way", "optionally", "try to".
6. **Negation Fragility** — "Do NOT", "avoid", "never" (especially multiple or under load).
7. **Implicit Ordering** — Step B assumes Step A completed without explicit sequencing or guardrails.
8. **Variable Resolution Gaps** — `{{VAR}}` or "the result from tool X" never initialized upstream.
9. **Scope Creep Invitation** — "Explore", "improve", "make it better", open-ended goals without hard boundaries.
10. **Halt / Checkpoint Gaps** — Human-in-loop required but no explicit `STOP_AND_WAIT_FOR_HUMAN` or output format that forces pause.
11. **Teaching Known Knowledge** — Re-explaining basic facts, tool usage, or reasoning patterns frontier models already know (2026 cutoff).
12. **Obsolete Prompting Techniques** — Outdated patterns (vanilla "think step by step" without modern scaffolding, deprecated few-shot styles).
13. **Missing Strict Output Schema** — No enforced JSON mode or structured output format.
14. **Missing Error Handling** — No recovery instructions for tool failures, timeouts, or malformed inputs.
15. **Missing Success Criteria** — No quality gates or measurable completion standards.
16. **Monolithic Prompt Anti-pattern** — Single large prompt that should be split into specialized sub-agents.
17. **Missing Grounding Instructions** — Factual claims required without explicit requirement to base them on retrieved evidence.
---
### Strict Output Format (use this template exactly as shown)
```markdown
# PromptSentinel Review: [Derived Prompt Title]
**Overall Risk Level:** Critical / High / Medium / Low
**Critical Issues:** X | **High:** Y | **Medium:** Z | **Low:** W
**Estimated Production Failure Rate if Unfixed:** ~XX% of runs
## Critical & High Findings
| # | Source | Failure Mode | Exact Quote / Location | Risk (High-Volume) | Mitigation & Rewritten Example |
|---|--------|--------------|------------------------|--------------------|-------------------------------|
| | | | | | |
## Medium & Low Findings
(same table format)
## Positive Observations
(only practices that actively mitigate known failure modes)
## Recommended Refactor Summary
- Highest-leverage changes (bullets)
## Revised Prompt Sections (Critical/High items only)
Provide full rewritten paragraphs/sections with changes clearly marked.
**Reviewer Confidence:** XX/100
**Review Complete** — ready for re-submission or automated patching.
```

View File

@@ -0,0 +1,3 @@
# Rules
* Never create PRs to alter code after a review. Always offer a fix and the option to commit.
* Qualify the severity of each requested change: NORMAL | IMPROVEMENT | FIX | CRITICAL

View File

@@ -41,3 +41,7 @@ agent:
- trigger: DP or fuzzy match on document-project
  workflow: "{project-root}/_bmad/bmm/workflows/document-project/workflow.yaml"
  description: "[DP] Document Project: Analyze an existing project to produce useful documentation for both human and LLM"
- trigger: KS or fuzzy match on knowledge-sync
exec: "{project-root}/_bmad/bmm/workflows/4-implementation/genai-knowledge-sync/workflow.md"
description: "[KS] Knowledge Sync: Build a RAG-ready knowledge index from project artifacts for optimized AI agent retrieval"

View File

@@ -0,0 +1,86 @@
---
project_name: ''
user_name: ''
date: ''
total_chunks: 0
sources_indexed: 0
tag_vocabulary_size: 0
retrieval_tested: false
status: 'draft'
---
# Knowledge Index for {{project_name}}
_RAG-optimized knowledge base for AI agent retrieval. Each chunk is self-contained and tagged for semantic search._
---
## Index Summary
- **Total Chunks:** {{total_count}}
- **Critical:** {{critical_count}} | **High:** {{high_count}} | **Standard:** {{standard_count}} | **Reference:** {{ref_count}}
- **Sources Indexed:** {{source_count}}
- **Last Synced:** {{date}}
---
## Critical Knowledge
<!-- Critical-priority chunks go here. These are retrieved for every implementation task. -->
---
## Architecture Knowledge
<!-- Architecture decisions, system design patterns, and technology choices. -->
---
## Requirements Knowledge
<!-- Business rules, acceptance criteria, and constraints. -->
---
## Implementation Knowledge
<!-- Coding patterns, conventions, and implementation rules. -->
---
## Domain Knowledge
<!-- Business domain concepts, terminology, and definitions. -->
---
## Operations Knowledge
<!-- Deployment, monitoring, and workflow rules. -->
---
## Quality Knowledge
<!-- Testing patterns, review standards, and anti-patterns. -->
---
## Retrieval Configuration
### Query Mapping
| Query Pattern | Target Categories | Priority Filter | Expected Chunks |
|---|---|---|---|
| "how to implement \*" | implementation, architecture | critical, high | 3-5 |
| "testing requirements for \*" | quality, implementation | critical, high | 2-4 |
| "business rules for \*" | requirements, domain | all | 2-3 |
| "architecture decision for \*" | architecture | all | 1-3 |
| "deployment process for \*" | operations | all | 1-2 |
### Embedding Recommendations
- **Model:** Use an embedding model that handles technical content well
- **Chunk Overlap:** 50-100 characters overlap between adjacent chunks from the same source
- **Metadata Filters:** Always filter by category and priority for focused retrieval
- **Top-K:** Retrieve 3-5 chunks per query for optimal context balance
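The overlap guideline can be illustrated with a simple character-based chunker (the 800-character chunk size is an invented example value, not from this document):

```python
def chunk_with_overlap(text, size=800, overlap=75):
    """Split text into fixed-size chunks sharing `overlap` trailing characters.

    Overlap of 50-100 characters keeps context continuous across adjacent
    chunks from the same source, per the embedding recommendations.
    """
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step back by the overlap each time
    return chunks
```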

View File

@@ -0,0 +1,179 @@
# Step 1: Artifact Discovery & Catalog
## MANDATORY EXECUTION RULES (READ FIRST):
- 🛑 NEVER generate content without user input
- ✅ ALWAYS treat this as collaborative discovery between technical peers
- 📋 YOU ARE A FACILITATOR, not a content generator
- 💬 FOCUS on discovering and cataloging all relevant project artifacts
- 🎯 IDENTIFY sources that provide high-value knowledge for RAG retrieval
- ⚠️ ABSOLUTELY NO TIME ESTIMATES - AI development speed has fundamentally changed
- ✅ YOU MUST ALWAYS respond in your agent communication style, using the configured `{communication_language}`
## EXECUTION PROTOCOLS:
- 🎯 Show your analysis before taking any action
- 📖 Read existing project files to catalog available artifacts
- 💾 Initialize document and update frontmatter
- 🚫 FORBIDDEN to load next step until discovery is complete
## CONTEXT BOUNDARIES:
- Variables from workflow.md are available in memory
- Focus on existing project artifacts and documentation
- Identify documents that contain reusable knowledge for AI agents
- Prioritize artifacts that prevent implementation mistakes and provide domain context
## YOUR TASK:
Discover, catalog, and classify all project artifacts that should be indexed for RAG retrieval by AI agents.
## DISCOVERY SEQUENCE:
### 1. Check for Existing Knowledge Index
First, check if a knowledge index already exists:
- Look for file at `{project_knowledge}/knowledge-index.md` or `{project-root}/**/knowledge-index.md`
- If exists: Read complete file to understand existing index
- Present to user: "Found existing knowledge index with {{chunk_count}} chunks across {{source_count}} sources. Would you like to update this or create a new one?"
### 2. Scan Planning Artifacts
Search `{planning_artifacts}` for documents containing project knowledge:
**Product Requirements:**
- Look for PRD files (`*prd*`, `*requirements*`)
- Extract key decisions, constraints, and acceptance criteria
- Note sections with high reuse value for agents
**Architecture Documents:**
- Look for architecture files (`*architecture*`, `*design*`)
- Extract technology decisions, patterns, and trade-offs
- Identify integration points and system boundaries
**Epic and Story Files:**
- Look for epic/story definitions (`*epic*`, `*stories*`)
- Extract acceptance criteria, implementation notes, and dependencies
- Identify cross-cutting concerns that appear across stories
### 3. Scan Implementation Artifacts
Search `{implementation_artifacts}` for implementation knowledge:
**Sprint and Status Files:**
- Look for sprint status, retrospectives, and course corrections
- Extract lessons learned and pattern changes
- Identify recurring issues and their resolutions
**Code Review Findings:**
- Look for code review artifacts
- Extract quality patterns and anti-patterns discovered
- Note corrections that should inform future implementation
### 4. Scan Project Knowledge
Search `{project_knowledge}` for existing knowledge assets:
**Project Context:**
- Look for `project-context.md` and similar files
- Extract implementation rules and coding conventions
- These are high-priority sources for RAG retrieval
**Research Documents:**
- Look for research outputs (market, domain, technical)
- Extract findings that inform implementation decisions
- Identify domain terminology and definitions
### 5. Scan Source Code for Patterns
Identify key code patterns worth indexing:
**Configuration Files:**
- Package manifests, build configs, linting rules
- Extract version constraints and tool configurations
- These provide critical context for code generation
**Key Source Files:**
- Identify entry points, shared utilities, and core modules
- Extract patterns that define the project's coding style
- Note any non-obvious conventions visible only in code
### 6. Classify and Prioritize Sources
For each discovered artifact, assign:
**Knowledge Category:**
- `architecture` - System design decisions and patterns
- `requirements` - Business rules and acceptance criteria
- `implementation` - Coding patterns and conventions
- `domain` - Business domain concepts and terminology
- `operations` - Deployment, monitoring, and workflow rules
- `quality` - Testing patterns, review standards, and anti-patterns
**Retrieval Priority:**
- `critical` - Must be retrieved for every implementation task
- `high` - Should be retrieved for related implementation tasks
- `standard` - Available when specifically relevant
- `reference` - Background context when explicitly needed
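The category and priority rules above can be sketched as a small classifier. This is a hypothetical helper, not part of the workflow's actual implementation; the filename patterns and the special-casing of `project-context` files are illustrative assumptions.

```javascript
// Hypothetical sketch: assign category and priority to a discovered artifact
// based on filename patterns. Patterns and names are illustrative only.
const CATEGORY_PATTERNS = [
  { pattern: /architecture|design/i, category: 'architecture' },
  { pattern: /prd|requirements/i, category: 'requirements' },
  { pattern: /epic|stor(y|ies)/i, category: 'requirements' },
  { pattern: /context|convention/i, category: 'implementation' },
  { pattern: /research|domain/i, category: 'domain' },
];

function classifyArtifact(filePath) {
  const match = CATEGORY_PATTERNS.find((p) => p.pattern.test(filePath));
  const category = match ? match.category : 'implementation';
  // project-context files are called out above as high-priority RAG sources
  const priority = /project-context/i.test(filePath) ? 'critical' : 'standard';
  return { source: filePath, category, priority };
}
```

In practice the agent applies judgment beyond filename matching, but a table like this makes the default classification explicit and reviewable.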
### 7. Present Discovery Summary
Report findings to user:
"Welcome {{user_name}}! I've scanned your project {{project_name}} to catalog artifacts for your RAG knowledge base.
**Artifacts Discovered:**
| Category | Count | Priority Breakdown |
|---|---|---|
| Architecture | {{count}} | {{critical}}/{{high}}/{{standard}} |
| Requirements | {{count}} | {{critical}}/{{high}}/{{standard}} |
| Implementation | {{count}} | {{critical}}/{{high}}/{{standard}} |
| Domain | {{count}} | {{critical}}/{{high}}/{{standard}} |
| Operations | {{count}} | {{critical}}/{{high}}/{{standard}} |
| Quality | {{count}} | {{critical}}/{{high}}/{{standard}} |
**Source Files Cataloged:** {{total_files}}
**Recommended Chunking Strategy:**
Based on your artifact types, I recommend {{strategy}} chunking:
- {{strategy_rationale}}
Ready to index and chunk your project knowledge for RAG retrieval.
[C] Continue to knowledge indexing"
## SUCCESS METRICS:
✅ All relevant project artifacts discovered and cataloged
✅ Each artifact classified by category and retrieval priority
✅ Source file paths accurately recorded
✅ Chunking strategy recommended based on artifact analysis
✅ Discovery findings clearly presented to user
✅ User ready to proceed with indexing
## FAILURE MODES:
❌ Missing critical artifacts in planning or implementation directories
❌ Not checking for existing knowledge index before creating new one
❌ Incorrect classification of artifact categories or priorities
❌ Not scanning source code for pattern-level knowledge
❌ Not presenting clear discovery summary to user
## NEXT STEP:
After user selects [C] to continue, load `{project-root}/_bmad/bmm/workflows/4-implementation/genai-knowledge-sync/steps/step-02-index.md` to index and chunk the discovered artifacts.
Remember: Do NOT proceed to step-02 until user explicitly selects [C] from the menu and discovery catalog is confirmed!


@ -0,0 +1,243 @@
# Step 2: Knowledge Indexing & Chunking
## MANDATORY EXECUTION RULES (READ FIRST):
- 🛑 NEVER generate content without user input
- ✅ ALWAYS treat this as collaborative indexing between technical peers
- 📋 YOU ARE A FACILITATOR, not a content generator
- 💬 FOCUS on creating self-contained, retrievable knowledge chunks
- 🎯 EACH CHUNK must be independently useful without requiring full document context
- ⚠️ ABSOLUTELY NO TIME ESTIMATES - AI development speed has fundamentally changed
- ✅ YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}`
## EXECUTION PROTOCOLS:
- 🎯 Show your analysis before taking any action
- 📝 Focus on creating atomic, self-contained knowledge chunks
- ⚠️ Present A/P/C menu after each major category
- 💾 ONLY save when user chooses C (Continue)
- 📖 Update frontmatter with completed categories
- 🚫 FORBIDDEN to load next step until all categories are indexed
## COLLABORATION MENUS (A/P/C):
This step will generate content and present choices for each knowledge category:
- **A (Advanced Elicitation)**: Use discovery protocols to explore nuanced knowledge relationships
- **P (Party Mode)**: Bring multiple perspectives to identify missing knowledge connections
- **C (Continue)**: Save the current chunks and proceed to next category
## PROTOCOL INTEGRATION:
- When 'A' selected: Execute {project-root}/_bmad/core/workflows/advanced-elicitation/workflow.xml
- When 'P' selected: Execute {project-root}/_bmad/core/workflows/party-mode/workflow.md
- PROTOCOLS always return to this step's A/P/C menu after the A or P option completes
- User accepts/rejects protocol changes before proceeding
## CONTEXT BOUNDARIES:
- Discovery catalog from step-1 is available
- All artifact paths and classifications are identified
- Focus on creating chunks optimized for embedding and retrieval
- Each chunk must carry enough context to be useful in isolation
## YOUR TASK:
Index each discovered artifact into self-contained knowledge chunks with metadata tags, source tracing, and retrieval-optimized formatting.
## CHUNKING PRINCIPLES:
### Chunk Design Rules
1. **Self-Contained**: Each chunk must be understandable without reading the source document
2. **Tagged**: Every chunk has category, priority, source path, and semantic tags
3. **Atomic**: One concept or decision per chunk - no compound knowledge
4. **Traceable**: Every chunk links back to its source artifact and section
5. **Contextual**: Include enough surrounding context for accurate retrieval
6. **Deduplicated**: Avoid redundant chunks across different source artifacts
### Chunk Format
Each chunk follows this standard format:
```markdown
### [CHUNK-ID] Chunk Title
- **Source:** `{relative_path_to_source_file}`
- **Category:** architecture | requirements | implementation | domain | operations | quality
- **Priority:** critical | high | standard | reference
- **Tags:** comma-separated semantic tags for retrieval matching
**Context:** One-line description of when this knowledge is relevant.
**Content:**
The actual knowledge content - specific, actionable, self-contained.
```
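To make the format concrete, here is a hypothetical filled-in chunk; the IDs, paths, and content are invented for illustration only:

```markdown
### [ARCH-001] Use event bus for cross-module communication
- **Source:** `docs/architecture.md`
- **Category:** architecture
- **Priority:** critical
- **Tags:** event-bus, module-boundaries, integration

**Context:** Relevant when wiring any new module into the system.

**Content:**
Modules must not import each other directly; all cross-module calls go
through the shared event bus so modules stay independently deployable.
```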
## INDEXING SEQUENCE:
### 1. Index Critical-Priority Artifacts
Process all artifacts marked as `critical` priority first:
**For each critical artifact:**
- Read the complete source file
- Identify distinct knowledge units (decisions, rules, constraints)
- Create one chunk per knowledge unit
- Apply semantic tags for retrieval matching
- Present chunks to user for validation
**Present results:**
"I've created {{chunk_count}} critical-priority chunks from {{source_count}} sources:
{{list_of_chunk_titles_with_tags}}
These chunks will be prioritized in every retrieval query.
[A] Advanced Elicitation - Explore deeper knowledge connections
[P] Party Mode - Review from multiple implementation perspectives
[C] Continue - Save these chunks and proceed"
### 2. Index High-Priority Artifacts
Process all `high` priority artifacts:
**For each high-priority artifact:**
- Read source file and identify knowledge units
- Create chunks with appropriate tags
- Cross-reference with critical chunks for consistency
- Identify any overlaps and deduplicate
### 3. Index Standard-Priority Artifacts
Process `standard` priority artifacts:
**For each standard artifact:**
- Read source file for domain-specific knowledge
- Create chunks focused on contextual information
- Tag for specific retrieval scenarios
### 4. Index Reference-Priority Artifacts
Process `reference` priority artifacts:
**For each reference artifact:**
- Extract background context and terminology
- Create lighter-weight chunks for supplementary retrieval
- Tag for broad topic matching
### 5. Cross-Reference and Deduplicate
After all categories are indexed:
**Deduplication Analysis:**
- Identify chunks with overlapping content across sources
- Merge or consolidate redundant chunks
- Ensure cross-references between related chunks are tagged
- Present deduplication summary to user
**Relationship Mapping:**
- Identify chunks that frequently co-occur in implementation contexts
- Tag related chunks for retrieval grouping
- Create chunk clusters for common query patterns
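The deduplication pass above can be approximated mechanically before human review. A minimal sketch, assuming chunks carry a `tags` array, flags candidate pairs by Jaccard similarity of their tag sets (this helper is an assumption, not part of the workflow):

```javascript
// Jaccard similarity between two tag lists: |A ∩ B| / |A ∪ B|
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  const intersection = [...setA].filter((t) => setB.has(t)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

// Flag chunk pairs whose tag overlap exceeds a threshold as dedup candidates
function findDuplicateCandidates(chunks, threshold = 0.6) {
  const pairs = [];
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      const score = jaccard(chunks[i].tags, chunks[j].tags);
      if (score >= threshold) pairs.push({ a: chunks[i].id, b: chunks[j].id, score });
    }
  }
  return pairs;
}
```

Flagged pairs still need content comparison; overlapping tags alone do not prove redundancy.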
### 6. Generate Knowledge Index Document
Compile all validated chunks into the knowledge index file:
**Document Structure:**
```markdown
# Knowledge Index for {{project_name}}
_RAG-optimized knowledge base for AI agent retrieval. Each chunk is self-contained and tagged for semantic search._
---
## Index Summary
- **Total Chunks:** {{total_count}}
- **Critical:** {{critical_count}} | **High:** {{high_count}} | **Standard:** {{standard_count}} | **Reference:** {{ref_count}}
- **Sources Indexed:** {{source_count}}
- **Last Synced:** {{date}}
---
## Critical Knowledge
{{critical_chunks}}
## Architecture Knowledge
{{architecture_chunks}}
## Requirements Knowledge
{{requirements_chunks}}
## Implementation Knowledge
{{implementation_chunks}}
## Domain Knowledge
{{domain_chunks}}
## Operations Knowledge
{{operations_chunks}}
## Quality Knowledge
{{quality_chunks}}
```
### 7. Present Indexing Summary
"Knowledge indexing complete for {{project_name}}!
**Chunks Created:**
| Category | Critical | High | Standard | Reference | Total |
|---|---|---|---|---|---|
| Architecture | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Requirements | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Implementation | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Domain | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Operations | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Quality | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
**Deduplication:** Removed {{removed_count}} redundant chunks
**Cross-References:** {{xref_count}} chunk relationships mapped
[C] Continue to optimization"
## SUCCESS METRICS:
✅ All discovered artifacts indexed into self-contained chunks
✅ Each chunk has proper metadata tags and source tracing
✅ No redundant or overlapping chunks remain
✅ Cross-references between related chunks are mapped
✅ A/P/C menu presented and handled correctly for each category
✅ Knowledge index document properly structured
## FAILURE MODES:
❌ Creating chunks that require reading the full source document
❌ Missing semantic tags that prevent accurate retrieval
❌ Not deduplicating overlapping chunks from different sources
❌ Not cross-referencing related knowledge units
❌ Not getting user validation for each category
❌ Creating overly large chunks that reduce retrieval precision
## NEXT STEP:
After completing all categories and user selects [C], load `{project-root}/_bmad/bmm/workflows/4-implementation/genai-knowledge-sync/steps/step-03-optimize.md` to optimize the knowledge base for retrieval quality.
Remember: Do NOT proceed to step-03 until all categories are indexed and user explicitly selects [C]!


@ -0,0 +1,289 @@
# Step 3: Knowledge Base Optimization & Completion
## MANDATORY EXECUTION RULES (READ FIRST):
- 🛑 NEVER generate content without user input
- ✅ ALWAYS treat this as collaborative optimization between technical peers
- 📋 YOU ARE A FACILITATOR, not a content generator
- 💬 FOCUS on optimizing chunks for retrieval quality and accuracy
- 🎯 ENSURE every chunk is retrieval-ready and well-tagged
- ⚠️ ABSOLUTELY NO TIME ESTIMATES - AI development speed has fundamentally changed
- ✅ YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}`
## EXECUTION PROTOCOLS:
- 🎯 Show your analysis before taking any action
- 📝 Review and optimize chunks for retrieval precision
- 📖 Update frontmatter with completion status
- 🚫 NO MORE STEPS - this is the final step
## CONTEXT BOUNDARIES:
- All knowledge chunks from step-2 are indexed
- Cross-references and deduplication are complete
- Focus on retrieval quality optimization and finalization
- Ensure the knowledge index is ready for RAG pipeline integration
## YOUR TASK:
Optimize the knowledge index for retrieval quality, validate chunk completeness, and finalize the knowledge base for AI agent consumption.
## OPTIMIZATION SEQUENCE:
### 1. Retrieval Quality Analysis
Analyze the indexed chunks for retrieval effectiveness:
**Tag Coverage Analysis:**
- Review semantic tags across all chunks
- Identify gaps where common queries would miss relevant chunks
- Suggest additional tags for better retrieval matching
- Present tag coverage report to user
**Chunk Size Analysis:**
- Identify chunks that are too large (reduce retrieval precision)
- Identify chunks that are too small (lack sufficient context)
- Recommend splits or merges for optimal retrieval size
- Target: Each chunk should be 100-500 words for optimal embedding
**Context Sufficiency Check:**
- Verify each chunk is understandable without its source document
- Add missing context where chunks reference undefined terms
- Ensure technical terms are defined or tagged for glossary lookup
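The 100-500 word size check described above is mechanical enough to sketch; the actual splitting and merging remains a judgment call. The function name and report shape here are assumptions:

```javascript
// Flag chunks outside the 100-500 word target for split/merge review
function sizeReport(chunks, min = 100, max = 500) {
  const wordCount = (text) => text.trim().split(/\s+/).filter(Boolean).length;
  return chunks.map((c) => {
    const words = wordCount(c.content);
    let verdict = 'ok';
    if (words < min) verdict = 'too-small: consider merging with a related chunk';
    else if (words > max) verdict = 'too-large: consider splitting by concept';
    return { id: c.id, words, verdict };
  });
}
```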
### 2. Semantic Tag Optimization
Optimize tags for retrieval accuracy:
**Tag Standardization:**
- Normalize similar tags (e.g., "api-design" and "api-patterns" → single standard)
- Create a tag vocabulary for the project
- Apply consistent tag format across all chunks
**Tag Enrichment:**
- Add technology-specific tags (framework names, library names)
- Add pattern-type tags (e.g., "error-handling", "state-management")
- Add lifecycle tags (e.g., "setup", "implementation", "testing", "deployment")
**Present Tag Summary:**
"I've optimized the semantic tags across {{chunk_count}} chunks:
**Tag Vocabulary:** {{unique_tag_count}} standardized tags
**Most Connected Tags:** {{top_tags_by_frequency}}
**Coverage Gaps Fixed:** {{gaps_fixed}}
Would you like to review the tag vocabulary? (y/n)"
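Tag standardization reduces to maintaining an alias table and mapping every tag through it. A minimal sketch, with a hypothetical alias table (real synonym sets would come from the project's own tag vocabulary):

```javascript
// Hypothetical alias table: variant tag → canonical tag
const TAG_ALIASES = {
  'api-patterns': 'api-design',
  'apis': 'api-design',
  'error-handling-patterns': 'error-handling',
};

function normalizeTags(tags) {
  const normalized = tags.map((t) => TAG_ALIASES[t.toLowerCase()] || t.toLowerCase());
  return [...new Set(normalized)]; // drop duplicates introduced by merging aliases
}
```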
### 3. Retrieval Scenario Testing
Validate retrieval quality with common query scenarios:
**Test Queries:**
Simulate these common developer queries against the knowledge index:
1. "How should I structure a new feature?" → Should retrieve: architecture + implementation chunks
2. "What are the testing requirements?" → Should retrieve: quality + implementation chunks
3. "What technology versions are we using?" → Should retrieve: critical implementation chunks
4. "How do I handle errors in this project?" → Should retrieve: implementation + quality chunks
5. "What are the business rules for {{core_feature}}?" → Should retrieve: requirements + domain chunks
**For each query, report:**
- Chunks that would be retrieved (by tag matching)
- Missing chunks that should be retrieved but aren't
- False positive chunks that would be retrieved incorrectly
- Recommended tag adjustments
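Since this step matches queries by tags rather than embeddings, the simulation can be sketched as exact tag-overlap scoring. This helper is an assumption (a real RAG pipeline would score with embeddings, as noted in the retrieval configuration):

```javascript
// Score chunks by how many query tags they match, return the top K hits
function simulateQuery(queryTags, chunks, topK = 5) {
  return chunks
    .map((c) => ({
      id: c.id,
      score: queryTags.filter((t) => c.tags.includes(t)).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Chunks expected by a scenario but absent from the result list are the "missing chunks"; unexpected hits are the false positives to fix with tag adjustments.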
### 4. Generate Retrieval Configuration
Create a retrieval configuration section in the knowledge index:
```markdown
## Retrieval Configuration
### Query Mapping
| Query Pattern | Target Categories | Priority Filter | Expected Chunks |
|---|---|---|---|
| "how to implement *" | implementation, architecture | critical, high | 3-5 |
| "testing requirements for *" | quality, implementation | critical, high | 2-4 |
| "business rules for *" | requirements, domain | all | 2-3 |
| "architecture decision for *" | architecture | all | 1-3 |
| "deployment process for *" | operations | all | 1-2 |
### Embedding Recommendations
- **Model:** Use an embedding model that handles technical content well
- **Chunk Overlap:** 50-100 characters overlap between adjacent chunks from the same source
- **Metadata Filters:** Always filter by category and priority for focused retrieval
- **Top-K:** Retrieve 3-5 chunks per query for optimal context balance
```
### 5. Finalize Knowledge Index
Complete the knowledge index with optimization results:
**Update Frontmatter:**
```yaml
---
project_name: '{{project_name}}'
user_name: '{{user_name}}'
date: '{{date}}'
total_chunks: {{total_count}}
sources_indexed: {{source_count}}
tag_vocabulary_size: {{tag_count}}
retrieval_tested: true
status: 'complete'
---
```
**Append Usage Guidelines:**
```markdown
---
## Usage Guidelines
**For AI Agents (RAG Retrieval):**
- Query this index using semantic search against chunk tags and content
- Always include critical-priority chunks in implementation context
- Filter by category when the task type is known
- Cross-reference related chunks using shared tags
**For Humans (Maintenance):**
- Re-run this workflow when new artifacts are created or significantly updated
- Add new chunks manually using the standard chunk format above
- Review and prune quarterly to remove outdated knowledge
- Update tags when new patterns or technologies are adopted
**For RAG Pipeline Integration:**
- Parse chunks by the `### [CHUNK-ID]` delimiter
- Extract metadata from the bullet-point headers (Source, Category, Priority, Tags)
- Use Tags field for semantic search indexing
- Use Priority field for retrieval ranking
```
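The pipeline-integration rules above can be sketched as a parser: split on the `### [CHUNK-ID]` delimiter and pull metadata from the bullet-point headers. The function name and return shape are assumptions, not part of any shipped pipeline:

```javascript
// Parse a knowledge index into { id, source, category, priority, tags } records
function parseChunks(indexText) {
  const sections = indexText.split(/^### \[/m).slice(1);
  return sections.map((section) => {
    const id = section.slice(0, section.indexOf(']'));
    const meta = {};
    // Metadata bullets look like: - **Category:** architecture
    for (const [, key, value] of section.matchAll(/- \*\*(\w+):\*\* (.+)/g)) {
      meta[key.toLowerCase()] = value.replace(/`/g, '').trim();
    }
    return { id, ...meta };
  });
}
```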
### 6. Present Completion Summary
Based on user skill level, present the completion:
**Expert Mode:**
"Knowledge index complete. {{chunk_count}} chunks across {{source_count}} sources, {{tag_count}} semantic tags. Retrieval-tested and RAG-ready.
File saved to: `{project_knowledge}/knowledge-index.md`"
**Intermediate Mode:**
"Your project knowledge base is indexed and retrieval-ready!
**What we created:**
- {{chunk_count}} self-contained knowledge chunks
- {{source_count}} source artifacts indexed
- {{tag_count}} semantic tags for retrieval matching
- Retrieval configuration for RAG pipeline integration
**How it works:**
AI agents can now search this index to find exactly the project knowledge they need for any implementation task, instead of loading entire documents.
**Next steps:**
- Integrate with your RAG pipeline using the retrieval configuration
- Re-run this workflow when artifacts change significantly
- Review quarterly to keep knowledge current"
**Beginner Mode:**
"Your project knowledge base is ready! 🎉
**What this does:**
Think of this as a smart library catalog for your project. Instead of AI agents reading every document from start to finish, they can now search for exactly the knowledge they need.
**What's included:**
- {{chunk_count}} bite-sized knowledge pieces from your project documents
- Smart tags so agents can find the right knowledge quickly
- Priority labels so the most important knowledge comes first
**How AI agents use it:**
When an agent needs to implement something, it searches this index for relevant knowledge chunks instead of reading entire documents. This makes them faster and more accurate!"
### 7. Completion Validation
Final checks before completion:
**Content Validation:**
✅ All discovered artifacts indexed into chunks
✅ Each chunk has proper metadata and source tracing
✅ Semantic tags are standardized and comprehensive
✅ No redundant chunks remain after deduplication
✅ Retrieval scenarios tested successfully
✅ Retrieval configuration generated
**Format Validation:**
✅ Consistent chunk format throughout
✅ Frontmatter properly updated
✅ Tag vocabulary is standardized
✅ Document is well-structured and scannable
### 8. Completion Message
"✅ **GenAI Knowledge Sync Complete!**
Your retrieval-optimized knowledge index is ready at:
`{project_knowledge}/knowledge-index.md`
**📊 Knowledge Base Summary:**
- {{chunk_count}} indexed knowledge chunks
- {{source_count}} source artifacts cataloged
- {{tag_count}} semantic tags for retrieval
- {{category_count}} knowledge categories covered
- Retrieval-tested with {{test_count}} query scenarios
**🎯 RAG Integration Ready:**
- Self-contained chunks with metadata headers
- Standardized tag vocabulary for semantic search
- Priority-based retrieval ranking
- Query mapping configuration included
**📋 Maintenance:**
1. Re-sync when artifacts change significantly: run this workflow again
2. Add individual chunks manually using the standard format
3. Review quarterly to prune outdated knowledge
4. Update tags when new patterns emerge
Your AI agents can now retrieve precisely the project knowledge they need for any task!"
## SUCCESS METRICS:
✅ Knowledge index fully optimized for retrieval quality
✅ Semantic tags standardized and comprehensive
✅ Retrieval scenarios tested with good coverage
✅ Retrieval configuration generated for RAG pipeline
✅ Usage guidelines included for agents, humans, and pipelines
✅ Frontmatter properly updated with completion status
✅ User provided with clear maintenance guidance
## FAILURE MODES:
❌ Chunks too large or too small for effective retrieval
❌ Semantic tags inconsistent or too sparse
❌ Not testing retrieval scenarios before finalizing
❌ Missing retrieval configuration for pipeline integration
❌ Not providing maintenance and usage guidelines
❌ Frontmatter not properly updated
## WORKFLOW COMPLETE:
This is the final step of the GenAI Knowledge Sync workflow. The user now has a retrieval-optimized knowledge index that enables AI agents to find and use exactly the project knowledge they need for any implementation task, improving both speed and accuracy of AI-assisted development.


@ -0,0 +1,50 @@
---
name: genai-knowledge-sync
description: 'Build and maintain a RAG-ready knowledge base from project artifacts. Use when the user says "build knowledge base", "sync knowledge", or "create RAG context"'
---
# GenAI Knowledge Sync Workflow
**Goal:** Create a structured, chunked knowledge index (`knowledge-index.md`) from project artifacts that is optimized for Retrieval-Augmented Generation (RAG) pipelines and AI agent context loading. This enables AI agents to retrieve the most relevant project knowledge at inference time rather than loading entire documents.
**Your Role:** You are a technical knowledge architect working with a peer to catalog, chunk, and index project artifacts into a retrieval-optimized format. You ensure every knowledge chunk is self-contained, well-tagged, and traceable to its source.
---
## WORKFLOW ARCHITECTURE
This uses **micro-file architecture** for disciplined execution:
- Each step is a self-contained file with embedded rules
- Sequential progression with user control at each step
- Document state tracked in frontmatter
- Focus on lean, retrieval-optimized content generation
- You NEVER proceed to the next step file while the current step file indicates the user must approve and explicitly choose to continue.
---
## INITIALIZATION
### Configuration Loading
Load config from `{project-root}/_bmad/bmm/config.yaml` and resolve:
- `project_name`, `output_folder`, `user_name`
- `communication_language`, `document_output_language`, `user_skill_level`
- `planning_artifacts`, `implementation_artifacts`, `project_knowledge`
- `date` as system-generated current datetime
- ✅ YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}`
### Paths
- `installed_path` = `{project-root}/_bmad/bmm/workflows/4-implementation/genai-knowledge-sync`
- `template_path` = `{installed_path}/knowledge-index-template.md`
- `output_file` = `{project_knowledge}/knowledge-index.md`
---
## EXECUTION
Load and execute `{project-root}/_bmad/bmm/workflows/4-implementation/genai-knowledge-sync/steps/step-01-discover.md` to begin the workflow.
**Note:** Artifact discovery, source cataloging, and chunking strategy selection are handled in step-01-discover.md.


@ -11,7 +11,7 @@ const ui = new UI();
 module.exports = {
   command: 'status',
   description: 'Display BMAD installation status and module versions',
-  options: [],
+  options: [['-v, --verbose', 'Show detailed status including agent and workflow counts']],
   action: async (options) => {
     try {
       // Find the bmad directory
@ -53,6 +53,23 @@ module.exports = {
         bmadDir,
       });

+      // Verbose mode: show agent and workflow counts per module
+      if (options.verbose) {
+        const { glob } = require('glob');
+        for (const mod of modules) {
+          const moduleName = typeof mod === 'string' ? mod : (mod.id || mod.name || '');
+          if (!moduleName) continue;
+          const modDir = path.join(bmadDir, moduleName);
+          if (!(await fs.pathExists(modDir))) continue;
+          const agents = await glob('agents/**/*.agent.yaml', { cwd: modDir });
+          const workflows = await glob('workflows/**/*.{yaml,yml,md}', { cwd: modDir });
+          await prompts.log.info(`Module "${moduleName}": ${agents.length} agent(s), ${workflows.length} workflow(s)`);
+        }
+      }
+
       process.exit(0);
     } catch (error) {
       await prompts.log.error(`Status check failed: ${error.message}`);


@ -7,8 +7,14 @@ const packageJson = require('../../../package.json');
  * Configuration utility class
  */
 class Config {
+  /** @type {Map<string, { data: Object, mtime: number }>} */
+  #cache = new Map();
+
   /**
-   * Load a YAML configuration file
+   * Load a YAML configuration file with in-memory caching.
+   * Cached entries are automatically invalidated when the file's
+   * modification time changes, so callers always receive fresh data
+   * after a file is written.
    * @param {string} configPath - Path to config file
    * @returns {Object} Parsed configuration
    */
@ -17,8 +23,26 @@ class Config {
       throw new Error(`Configuration file not found: ${configPath}`);
     }

-    const content = await fs.readFile(configPath, 'utf8');
-    return yaml.parse(content);
+    const resolved = path.resolve(configPath);
+    const stat = await fs.stat(resolved);
+    const mtime = stat.mtimeMs;
+
+    const cached = this.#cache.get(resolved);
+    if (cached && cached.mtime === mtime) {
+      return cached.data;
+    }
+
+    const content = await fs.readFile(resolved, 'utf8');
+    const data = yaml.parse(content);
+    this.#cache.set(resolved, { data, mtime });
+    return data;
+  }
+
+  /**
+   * Clear the in-memory YAML cache.
+   */
+  clearCache() {
+    this.#cache.clear();
   }

   /**