From f7846fd5ebd2aa81c7c79c0b3a327c56988d6abf Mon Sep 17 00:00:00 2001 From: Alex Verkhovsky Date: Fri, 6 Mar 2026 19:21:28 -0700 Subject: [PATCH] feat(skills): add edge case hunter as parallel review layer in PR review (#1791) * feat(skills): add edge case hunter as parallel review layer in PR review Wire review-edge-case-hunter.xml into bmad-os-review-pr as a second review layer running in parallel with the adversarial review. Both subagents receive the same PR diff concurrently. Findings are merged, deduplicated, and tagged by source before tone transformation. * fix(core): resolve contradictions in edge case hunter task spec - Show array wrapper [{}] in output-format example to match JSON array contract, and document empty array [] as valid output - Consolidate empty-content handling: step 1 now defers to halt-conditions instead of defining separate "ask and abort" behavior - Zero-findings halt no longer contradicts JSON contract: re-analyze once, then return [] instead of ambiguous "HALT or re-analyze" - Soften "Execute ALL steps" to acknowledge halt-conditions can interrupt * fix(review): address PR #1791 review feedback - Remove "Task tool" reference per maintainer; use generic "subagents" - Fix nested triple-backtick fencing with four-tick outer fence - Widen location format to support multi-line ranges and hunk refs - Add JSON-safety constraint to guard_snippet field - Tighten input loading to "strictly from provided input" - Replace vague "unreadable" with "cannot be decoded as text" - Replace vague "increased scrutiny" with concrete re-analysis checklist - Resolve HALT-immediately vs re-analysis conflict in LLM instructions Co-Authored-By: Claude Opus 4.6 --------- Co-authored-by: Claude Opus 4.6 --- .claude/skills/bmad-os-review-pr/SKILL.md | 2 +- .../bmad-os-review-pr/prompts/instructions.md | 69 +++++++++++++++++-- src/core/tasks/review-edge-case-hunter.xml | 29 +++++--- 3 files changed, 82 insertions(+), 18 deletions(-) diff --git a/.claude/skills/bmad-os-review-pr/SKILL.md b/.claude/skills/bmad-os-review-pr/SKILL.md index 67bb05bd5..8adc6d031 100644 --- a/.claude/skills/bmad-os-review-pr/SKILL.md +++ b/.claude/skills/bmad-os-review-pr/SKILL.md @@ -1,6 +1,6 @@ --- name: bmad-os-review-pr -description: Adversarial PR review tool (Raven's Verdict). Cynical deep review transformed into professional engineering findings. Use when user asks to 'review a PR' and provides a PR url or id. +description: Dual-layer PR review tool (Raven's Verdict). Runs adversarial cynical review and edge case hunter in parallel, merges and deduplicates findings into professional engineering output. Use when user asks to 'review a PR' and provides a PR url or id. --- Read `prompts/instructions.md` and execute. diff --git a/.claude/skills/bmad-os-review-pr/prompts/instructions.md b/.claude/skills/bmad-os-review-pr/prompts/instructions.md index 12d150049..74512128e 100644 --- a/.claude/skills/bmad-os-review-pr/prompts/instructions.md +++ b/.claude/skills/bmad-os-review-pr/prompts/instructions.md @@ -93,13 +93,17 @@ gh pr diff {PR_NUMBER} [--repo {REPO}] --name-only | grep -E '\.(png|jpg|jpeg|gi Store list of binary files to skip. Note them in final output. -## Adversarial Review +## Review Layers -### 1.1 Run Cynical Review +**Launch steps 1.1 and 1.2 as parallel subagents.** Both receive the same PR diff and run concurrently. Wait for both to complete before proceeding to step 1.3. + +### 1.1 Run Cynical Review (subagent) + +Spawn a subagent with the following prompt. Pass the full PR diff as context. **INTERNAL PERSONA - Never post this directly:** -Task: You are a cynical, jaded code reviewer with zero patience for sloppy work. This PR was submitted by a clueless weasel and you expect to find problems. Find at least five issues to fix or improve in it. Number them. Be skeptical of everything. Ultrathink. +Task: You are a cynical, jaded code reviewer with zero patience for sloppy work. This PR was submitted by a clueless weasel and you expect to find problems. Find at least five issues to fix or improve in it. Number them. Be skeptical of everything. Output format: @@ -124,14 +128,64 @@ Likely tag: - Add `[likely]` to findings with high confidence, e.g. with direct evidence - Sort findings by severity (Critical → Moderate → Minor), not by confidence +### 1.2 Run Edge Case Hunter (subagent) + +Spawn a subagent that executes the task defined in `_bmad/core/tasks/review-edge-case-hunter.xml`. Pass the full PR diff as the `content` input. Omit `also_consider` unless the user specified extra focus areas. + +The task returns a JSON array of objects, each with: `location`, `trigger_condition`, `guard_snippet`, `potential_consequence`. + +**Map each JSON finding to the standard finding format:** + +````markdown +### [NUMBER]. [trigger_condition] [likely] + +**Severity:** [INFERRED_EMOJI] [INFERRED_LEVEL] + +**`[location]`** — [trigger_condition]. [potential_consequence]. + +**Suggested fix:** +``` +[guard_snippet] +``` +```` + +Severity inference rules for edge case findings: + +- **Critical** — data loss, security, or crash conditions (null deref, unhandled throw, auth bypass) +- **Moderate** — logic errors, silent wrong results, race conditions +- **Minor** — cosmetic edge cases, unlikely boundary conditions + +Add `[likely]` to all edge case findings — they are derived from mechanical path tracing, so confidence is inherently high. + +If the edge case hunter returns zero findings or halts, note it internally and proceed — step 1.1 findings still stand. + +### 1.3 Merge and Deduplicate + +Combine the findings from step 1.1 (adversarial) and step 1.2 (edge case hunter) into a single list. + +**Deduplication rules:** + +1. Compare each edge case finding against each adversarial finding +2. Two findings are duplicates if they reference the same file location AND describe the same gap (use description similarity — same function/variable/condition mentioned) +3. When a duplicate is found, keep the version with more specificity (usually the edge case hunter's, since it includes `guard_snippet`) +4. Mark the kept finding with the source that produced it + +**After dedup, renumber all findings sequentially and sort by severity (Critical → Moderate → Minor).** + +Tag each finding with its source: + +- `[Adversarial]` — from step 1.1 only +- `[Edge Case]` — from step 1.2 only +- `[Both]` — flagged by both layers (deduped) + ## Tone Transformation -**Transform the cynical output into cold engineering professionalism.** +**Transform the merged findings into cold engineering professionalism.** **Transformation rules:** 1. Remove all inflammatory language, insults, assumptions about the author -2. Keep all technical substance, file references, severity ratings and likely tag +2. Keep all technical substance, file references, severity ratings, likely tag, and **source tags** 3. Replace accusatory phrasing with neutral observations: - ❌ "The author clearly didn't think about..." - ✅ "This implementation may not account for..." @@ -140,6 +194,7 @@ Likely tag: - ✅ "This pattern has historically caused issues in production environments" 5. Add the suggested fixes. 6. Keep suggestions actionable and specific +7. Edge case hunter findings need no persona cleanup, but still apply professional formatting consistently Output format after transformation: @@ -149,18 +204,20 @@ Output format after transformation: **Title:** {PR_TITLE} **Author:** @{AUTHOR} **Branch:** {HEAD} → {BASE} +**Review layers:** Adversarial + Edge Case Hunter --- ### Findings -[TRANSFORMED FINDINGS HERE] +[TRANSFORMED FINDINGS HERE — each tagged with source] --- ### Summary **Critical:** {COUNT} | **Moderate:** {COUNT} | **Minor:** {COUNT} +**Sources:** {ADVERSARIAL_COUNT} adversarial | {EDGE_CASE_COUNT} edge case | {BOTH_COUNT} both [BINARY_FILES_NOTE if any] diff --git a/src/core/tasks/review-edge-case-hunter.xml b/src/core/tasks/review-edge-case-hunter.xml index f312e1b2e..dfe75ce34 100644 --- a/src/core/tasks/review-edge-case-hunter.xml +++ b/src/core/tasks/review-edge-case-hunter.xml @@ -15,18 +15,18 @@ Ignore the rest of the codebase unless the provided content explicitly reference Return ONLY a valid JSON array of objects. Each object must contain exactly these four fields and nothing else: -{ - "location": "file:line", +[{ + "location": "file:start-end (or file:line when single line, or file:hunk when exact line unavailable)", "trigger_condition": "one-line description (max 15 words)", - "guard_snippet": "minimal code sketch that closes the gap", + "guard_snippet": "minimal code sketch that closes the gap (single-line escaped string, no raw newlines or unescaped quotes)", "potential_consequence": "what could actually go wrong (max 15 words)" -} -No extra text, no explanations, no markdown wrapping. +}] +No extra text, no explanations, no markdown wrapping. An empty array [] is valid when no unhandled paths are found. - MANDATORY: Execute ALL steps in the flow section IN EXACT ORDER + MANDATORY: Execute steps in the flow section IN EXACT ORDER DO NOT skip steps or change the sequence - HALT immediately when halt-conditions are met + When a halt-condition triggers, follow its specific instruction exactly Each action xml tag within step xml tag is a REQUIRED action to complete that step Your method is exhaustive path enumeration — mechanically walk every branch, not hunt by intuition @@ -38,8 +38,8 @@ No extra text, no explanations, no markdown wrapping. - Load the content to review from provided input or context - If content to review is empty, ask for clarification and abort task + Load the content to review strictly from provided input + If content is empty, or cannot be decoded as text, return empty array [] and stop Identify content type (diff, full file, or function) to determine scope rules @@ -51,13 +51,20 @@ No extra text, no explanations, no markdown wrapping. Collect only the unhandled paths as findings - discard handled ones silently - + + Recheck every conditional for missing else/default + Recheck every input for null/empty/wrong-type + Recheck loop bounds for off-by-one and empty-collection + Add any newly found unhandled paths to findings; discard confirmed-handled ones + + + Output findings as a JSON array following the output-format specification exactly - HALT if content is empty or unreadable + If content is empty or cannot be decoded as text, return empty array [] and stop