fix(core): resolve contradictions in edge case hunter task spec

- Show array wrapper [{}] in output-format example to match JSON array contract, and document empty array [] as valid output - Consolidate empty-content handling: step 1 now defers to halt-conditions instead of defining separate "ask and abort" behavior - Zero-findings halt no longer contradicts JSON contract: re-analyze once, then return [] instead of ambiguous "HALT or re-analyze" - Soften "Execute ALL steps" to acknowledge halt-conditions can interrupt
feat(skills): add edge case hunter as parallel review layer in PR review
2026-02-27 22:22:03 -07:00 · 2026-02-27 22:21:57 -07:00
3 changed files with 70 additions and 13 deletions
--- a/.claude/skills/bmad-os-review-pr/SKILL.md
+++ b/.claude/skills/bmad-os-review-pr/SKILL.md
@ -1,6 +1,6 @@
 ---
 name: bmad-os-review-pr
-description: Adversarial PR review tool (Raven's Verdict). Cynical deep review transformed into professional engineering findings. Use when user asks to 'review a PR' and provides a PR url or id.
+description: Dual-layer PR review tool (Raven's Verdict). Runs adversarial cynical review and edge case hunter in parallel, merges and deduplicates findings into professional engineering output. Use when user asks to 'review a PR' and provides a PR url or id.
 ---
 Read `prompts/instructions.md` and execute.
--- a/.claude/skills/bmad-os-review-pr/prompts/instructions.md
+++ b/.claude/skills/bmad-os-review-pr/prompts/instructions.md
@ -93,13 +93,17 @@ gh pr diff {PR_NUMBER} [--repo {REPO}] --name-only | grep -E '\.(png|jpg|jpeg|gi
 Store list of binary files to skip. Note them in final output.
-## Adversarial Review
+## Review Layers
-### 1.1 Run Cynical Review
+**Launch steps 1.1 and 1.2 as parallel subagents** using the Task tool. Both receive the same PR diff and run concurrently. Wait for both to complete before proceeding to step 1.3.
 ### 1.1 Run Cynical Review (subagent)
 Spawn a subagent with the following prompt. Pass the full PR diff as context.
 **INTERNAL PERSONA - Never post this directly:**
-Task: You are a cynical, jaded code reviewer with zero patience for sloppy work. This PR was submitted by a clueless weasel and you expect to find problems. Find at least five issues to fix or improve in it. Number them. Be skeptical of everything. Ultrathink.
+Task: You are a cynical, jaded code reviewer with zero patience for sloppy work. This PR was submitted by a clueless weasel and you expect to find problems. Find at least five issues to fix or improve in it. Number them. Be skeptical of everything.
 Output format:
@ -124,14 +128,64 @@ Likely tag:
 - Add `[likely]` to findings with high confidence, e.g. with direct evidence
 - Sort findings by severity (Critical → Moderate → Minor), not by confidence
 ### 1.2 Run Edge Case Hunter (subagent)
 Spawn a subagent that executes the task defined in `_bmad/core/tasks/review-edge-case-hunter.xml`. Pass the full PR diff as the `content` input. Omit `also_consider` unless the user specified extra focus areas.
 The task returns a JSON array of objects, each with: `location`, `trigger_condition`, `guard_snippet`, `potential_consequence`.
 **Map each JSON finding to the standard finding format:**
 ```markdown
 ### [NUMBER]. [trigger_condition] [likely]
 **Severity:** [INFERRED_EMOJI] [INFERRED_LEVEL]
 **`[location]`** — [trigger_condition]. [potential_consequence].
 **Suggested fix:**
 ```
 [guard_snippet]
 ```
 ```
 Severity inference rules for edge case findings:
 - **Critical** — data loss, security, or crash conditions (null deref, unhandled throw, auth bypass)
 - **Moderate** — logic errors, silent wrong results, race conditions
 - **Minor** — cosmetic edge cases, unlikely boundary conditions
 Add `[likely]` to all edge case findings — they are derived from mechanical path tracing, so confidence is inherently high.
 If the edge case hunter returns zero findings or halts, note it internally and proceed — step 1.1 findings still stand.
 ### 1.3 Merge and Deduplicate
 Combine the findings from step 1.1 (adversarial) and step 1.2 (edge case hunter) into a single list.
 **Deduplication rules:**
 1. Compare each edge case finding against each adversarial finding
 2. Two findings are duplicates if they reference the same file location AND describe the same gap (use description similarity — same function/variable/condition mentioned)
 3. When a duplicate is found, keep the version with more specificity (usually the edge case hunter's, since it includes `guard_snippet`)
 4. Mark the kept finding with the source that produced it
 **After dedup, renumber all findings sequentially and sort by severity (Critical → Moderate → Minor).**
 Tag each finding with its source:
 - `[Adversarial]` — from step 1.1 only
 - `[Edge Case]` — from step 1.2 only
 - `[Both]` — flagged by both layers (deduped)
 ## Tone Transformation
-**Transform the cynical output into cold engineering professionalism.**
+**Transform the merged findings into cold engineering professionalism.**
 **Transformation rules:**
 1. Remove all inflammatory language, insults, assumptions about the author
-2. Keep all technical substance, file references, severity ratings and likely tag
+2. Keep all technical substance, file references, severity ratings, likely tag, and **source tags**
 3. Replace accusatory phrasing with neutral observations:
   - ❌ "The author clearly didn't think about..."
   - ✅ "This implementation may not account for..."
@ -140,6 +194,7 @@ Likely tag:
   - ✅ "This pattern has historically caused issues in production environments"
 5. Add the suggested fixes.
 6. Keep suggestions actionable and specific
 7. Edge case hunter findings need no persona cleanup, but still apply professional formatting consistently
 Output format after transformation:
@ -149,18 +204,20 @@ Output format after transformation:
 **Title:** {PR_TITLE}
 **Author:** @{AUTHOR}
 **Branch:** {HEAD} → {BASE}
 **Review layers:** Adversarial + Edge Case Hunter
 ---
 ### Findings
-[TRANSFORMED FINDINGS HERE]
+[TRANSFORMED FINDINGS HERE — each tagged with source]
 ---
 ### Summary
 **Critical:** {COUNT} | **Moderate:** {COUNT} | **Minor:** {COUNT}
 **Sources:** {ADVERSARIAL_COUNT} adversarial | {EDGE_CASE_COUNT} edge case | {BOTH_COUNT} both
 [BINARY_FILES_NOTE if any]
--- a/src/core/tasks/review-edge-case-hunter.xml
+++ b/src/core/tasks/review-edge-case-hunter.xml
@ -15,16 +15,16 @@ Ignore the rest of the codebase unless the provided content explicitly reference
  </inputs>
  <output-format>Return ONLY a valid JSON array of objects. Each object must contain exactly these four fields and nothing else:
-{
+[{
  "location": "file:line",
  "trigger_condition": "one-line description (max 15 words)",
  "guard_snippet": "minimal code sketch that closes the gap",
  "potential_consequence": "what could actually go wrong (max 15 words)"
-}
+}]
-No extra text, no explanations, no markdown wrapping.</output-format>
+No extra text, no explanations, no markdown wrapping. An empty array [] is valid when no unhandled paths are found.</output-format>
  <llm critical="true">
-    <i>MANDATORY: Execute ALL steps in the flow section IN EXACT ORDER</i>
+    <i>MANDATORY: Execute steps in the flow section IN EXACT ORDER unless a halt-condition triggers</i>
    <i>DO NOT skip steps or change the sequence</i>
    <i>HALT immediately when halt-conditions are met</i>
    <i>Each action xml tag within step xml tag is a REQUIRED action to complete that step</i>
@ -39,7 +39,7 @@ No extra text, no explanations, no markdown wrapping.</output-format>
  <flow>
    <step n="1" title="Receive Content">
      <action>Load the content to review from provided input or context</action>
-      <action>If content to review is empty, ask for clarification and abort task</action>
+      <action>If content to review is empty or unreadable, HALT per halt-conditions</action>
      <action>Identify content type (diff, full file, or function) to determine scope rules</action>
    </step>
@ -57,7 +57,7 @@ No extra text, no explanations, no markdown wrapping.</output-format>
  </flow>
  <halt-conditions>
-    <condition>HALT if zero findings - this is suspicious, re-analyze or ask for guidance</condition>
+    <condition>If zero findings after initial analysis, re-analyze once with increased scrutiny. If still zero, return empty array [].</condition>
    <condition>HALT if content is empty or unreadable</condition>
  </halt-conditions>