feat: add Raven's Verdict PR review tool

Adversarial code review for GitHub PRs with: - Cynical internal review phase, professional external output - Severity-based findings with [likely] tag for high-confidence issues - LLM disclaimer in output footer - Safety features: sandboxed execution, large PR warnings, preview before post
2025-12-06 21:49:24 -07:00 · 2025-12-06 21:49:24 -07:00 · 70e597aa83
parent 80a90c01d4
commit 70e597aa83
4 changed files with 442 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -54,6 +54,9 @@ flattened-codebase.xml
 #UAT template testing output files
 tools/template-test-generator/test-scenarios/

+# Maintainer tools output
+tools/maintainer/pr-review/output/
+
 # Bundler temporary files and generated bundles
 .bundler-temp/

--- a/docs/sprint-artifacts/tech-spec-pr-review-tool.md
+++ b/docs/sprint-artifacts/tech-spec-pr-review-tool.md
@ -0,0 +1,183 @@
+# Tech-Spec: PR Review Tool (Raven's Verdict)
+
+**Created:** 2025-12-06
+**Status:** Completed
+
+## Overview
+
+### Problem Statement
+
+External contributors submit PRs to the upstream repository without context on code quality expectations. Maintainers need a way to provide deep, thorough code review feedback without spending hours manually reviewing every PR. Automated tools like CodeRabbit handle surface-level checks, but high-compute, human-triggered deep reviews are missing.
+
+### Solution
+
+A portable prompt file that any LLM agent can execute to:
+
+1. Fetch PR diff and full files via `gh` CLI
+2. Run an adversarial code review (cynical, thorough)
+3. Transform the tone from "cynical asshole" to "cold engineering professional"
+4. Post the findings as a comment on the PR
+
+### Scope
+
+**In Scope:**
+
+- Manual trigger via any LLM agent (Claude Code, Cursor, Windsurf, etc.)
+- Review of GitHub PRs using `gh` CLI
+- Adversarial review with severity + confidence ratings
+- Tone transformation before posting
+- Preview and explicit confirmation before posting
+
+**Out of Scope:**
+
+- Automated triggers (webhooks, GitHub Actions)
+- Integration with CodeRabbit or other tools
+- Review of non-GitHub repositories
+- Persistent storage or history tracking
+
+## Context for Development
+
+### Codebase Patterns
+
+- Maintainer tools live in `tools/maintainer/`
+- Prompts are simple markdown files with clear instructions
+- Existing pattern: `review-adversarial.md` (3 lines, direct, effective)
+
+### Files to Reference
+
+- `tools/maintainer/fix-elicitation-wording.md` - Example of agent prompt in maintainer tools
+- `.claude/commands/review-adversarial.md` - Base cynical reviewer prompt to adapt
+
+### Technical Decisions
+
+| Decision       | Choice                                    | Rationale                                       |
+| -------------- | ----------------------------------------- | ----------------------------------------------- |
+| Location       | `tools/maintainer/pr-review/`             | Maintainer tooling, separate from product       |
+| Invocation     | Prompt file + PR URL/number               | Portable across all LLM platforms               |
+| PR data source | `gh` CLI                                  | Already available, handles auth                 |
+| Review input   | Diff + full files                         | Diff for focus, full files for tangents         |
+| Tone transform | Same session, Phase 2 with `task:` prefix | Spawns sub-agent if available, inline otherwise |
+| Output format  | Numbered, freeform, severity + confidence | Scannable, actionable                           |
+
+## Implementation Plan
+
+### Tasks
+
+- [x] Task 1: Create `tools/maintainer/pr-review/` directory structure
+- [x] Task 2: Write `review-prompt.md` - the main prompt file with all phases
+- [x] Task 3: Write `README.md` - usage instructions for maintainers
+- [ ] Task 4: Test with a real PR on the upstream repo
+- [ ] Task 5: Iterate based on output quality
+
+### File Structure
+
+```
+tools/maintainer/pr-review/
+  ├── README.md           # How to use
+  ├── review-prompt.md    # The main prompt file
+  └── output/             # Local backup folder (gitignored)
+```
+
+### Prompt File Structure (`review-prompt.md`)
+
+```
+## Phase 0: Pre-flight Checks
+- Verify PR number/URL provided (if not, STOP and ask)
+- Check PR size via gh pr view --json
+- Confirm repo if different from upstream
+- Note binary files to skip
+
+## Phase 1: Adversarial Review
+- Fetch diff + full files
+- Run cynical review
+- Output numbered findings with severity + confidence
+
+## Phase 2: Tone Transform
+- task: Rewrite findings as cold engineering professional
+- Preserve severity/confidence markers
+- Remove inflammatory language, keep substance
+
+## Phase 3: Post
+- Preview full comment
+- Ask for explicit confirmation
+- Post via gh pr comment
+- Handle auth failure gracefully
+```
+
+### Acceptance Criteria
+
+- [ ] AC 1: Given a PR URL, when agent reads prompt, then it fetches PR data via `gh` without hallucinating PR numbers
+- [ ] AC 2: Given PR data, when review runs, then findings are numbered with severity (🔴🟡🟢) and confidence (High/Medium/Low %)
+- [ ] AC 3: Given cynical output, when tone transform runs, then language is professional but findings retain substance
+- [ ] AC 4: Given transformed output, when user confirms, then comment posts to PR via `gh pr comment`
+- [ ] AC 5: Given missing PR number, when agent starts, then it stops and asks user explicitly
+- [ ] AC 6: Given PR from different repo, when agent detects mismatch, then it asks user to confirm before proceeding
+- [ ] AC 7: Given PR with >50 files or >5000 lines, when pre-flight runs, then agent warns and asks to proceed or focus
+- [ ] AC 8: Given auth failure during post, when error occurs, then review is saved locally and error is displayed loudly
+- [ ] AC 9: Given PR with binary files, when fetching diff, then binaries are skipped with a note
+
+## Additional Context
+
+### Dependencies
+
+- `gh` CLI installed and authenticated
+- Any LLM agent capable of running bash commands
+
+### Sandboxed Execution Rules
+
+The prompt MUST enforce:
+
+- ❌ No inferring PR from conversation history
+- ❌ No looking at git branches, recent commits, or local state
+- ❌ No guessing or assuming PR numbers
+- ✅ Use ONLY explicit PR number/URL from user message
+- ✅ If missing, STOP and ask: "What PR number or URL should I review?"
+
+### Severity Scale
+
+| Level       | Meaning                                                 |
+| ----------- | ------------------------------------------------------- |
+| 🔴 Critical | Security issue, data loss risk, or broken functionality |
+| 🟡 Moderate | Bug, performance issue, or significant code smell       |
+| 🟢 Minor    | Style, naming, minor improvement opportunity            |
+
+### Confidence Scale
+
+| Level           | Meaning                              |
+| --------------- | ------------------------------------ |
+| High (>80%)     | Definitely an issue                  |
+| Medium (40-80%) | Likely an issue, might need context  |
+| Low (<40%)      | Possible issue, could be intentional |
+
+### Example Output Format
+
+```markdown
+## PR Review: #1234
+
+### 1. Unbounded query in user search
+
+**Severity:** 🔴 Critical | **Confidence:** High (>80%)
+
+The search endpoint at `src/api/search.ts:47` doesn't limit results, which could return thousands of rows and cause memory issues.
+
+**Suggestion:** Add `.limit(100)` or implement pagination.
+
+### 2. Missing null check in callback
+
+**Severity:** 🟡 Moderate | **Confidence:** Medium (40-80%)
+
+The callback at `src/handlers/webhook.ts:23` could be undefined if the event type is unregistered.
+
+**Suggestion:** Add defensive check: `if (callback) callback(event)`
+
+---
+
+_Review generated by Raven's Verdict - Deep PR Review Tool_
+```
+
+### Notes
+
+- The "cynical asshole" phase is internal only - never posted
+- Tone transform must happen before any external output
+- When in doubt, ask the user - never assume
+- This is a POC - iterate based on real usage
--- a/tools/maintainer/pr-review/README.md
+++ b/tools/maintainer/pr-review/README.md
@ -0,0 +1,19 @@
+# Raven's Verdict - Deep PR Review Tool
+
+Adversarial code review for GitHub PRs. Works with any LLM agent.
+
+## Prerequisites
+
+- `gh` CLI installed and authenticated (`gh auth status`)
+- Any LLM agent capable of running bash commands
+
+## Usage
+
+```bash
+# Claude Code
+claude "Review PR #123 using tools/maintainer/pr-review/review-prompt.md"
+
+# Other agents: copy review-prompt.md contents to your agent
+```
+
+See `review-prompt.md` for full details on output format, safety features, and how it works.
--- a/tools/maintainer/pr-review/review-prompt.md
+++ b/tools/maintainer/pr-review/review-prompt.md
@ -0,0 +1,237 @@
+# Raven's Verdict - Deep PR Review Tool
+
+A cynical adversarial review, transformed into cold engineering professionalism.
+
+<orientation>
+CRITICAL: Sandboxed Execution Rules
+
+Before proceeding, you MUST verify:
+
+- [ ] PR number or URL was EXPLICITLY provided in the user's message
+- [ ] You are NOT inferring the PR from conversation history
+- [ ] You are NOT looking at git branches, recent commits, or local state
+- [ ] You are NOT guessing or assuming any PR numbers
+
+**If no explicit PR number/URL was provided, STOP immediately and ask:**
+"What PR number or URL should I review?"
+</orientation>
+
+<preflight-checks>
+
+## Preflight Checks
+
+### 0.0 Ensure Clean Checkout
+
+Before anything else, verify the working tree is clean and check out the PR branch.
+
+```bash
+# Check for uncommitted changes
+git status --porcelain
+```
+
+If output is non-empty, STOP and tell user:
+
+> "You have uncommitted changes. Please commit or stash them before running a PR review."
+
+If clean, fetch and checkout the PR branch:
+
+```bash
+# Fetch and checkout PR branch (gh handles the remote fetch)
+gh pr checkout {PR_NUMBER}
+```
+
+If checkout fails, STOP and report the error.
+
+Now you're on the PR branch with full access to all files as they exist in the PR.
+
+### 0.1 Parse PR Input
+
+Extract PR number from user input. Examples of valid formats:
+
+- `123` (just the number)
+- `#123` (with hash)
+- `https://github.com/owner/repo/pull/123` (full URL)
+
+If a URL specifies a different repository than the current one:
+
+```bash
+# Check current repo
+gh repo view --json nameWithOwner -q '.nameWithOwner'
+```
+
+If mismatch detected, ask user:
+
+> "This PR is from `{detected_repo}` but we're in `{current_repo}`. Proceed with reviewing `{detected_repo}#123`? (y/n)"
+
+### 0.2 Check PR Size
+
+```bash
+gh pr view {PR_NUMBER} --json additions,deletions,changedFiles -q '{"additions": .additions, "deletions": .deletions, "files": .changedFiles}'
+```
+
+**Size thresholds:**
+
+| Metric        | Warning Threshold |
+| ------------- | ----------------- |
+| Files changed | > 50              |
+| Lines changed | > 5000            |
+
+If thresholds exceeded, ask user:
+
+> "This PR has {X} files and {Y} line changes. That's large.
+>
+> **[f] Focus** - Pick specific files or directories to review
+> **[p] Proceed** - Review everything (may be slow/expensive)
+> **[a] Abort** - Stop here"
+
+### 0.3 Note Binary Files
+
+```bash
+gh pr diff {PR_NUMBER} --name-only | grep -E '\.(png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot|pdf|zip|tar|gz|bin|exe|dll|so|dylib)$' || echo "No binary files detected"
+```
+
+Store list of binary files to skip. Note them in final output.
+
+</preflight-checks>
+
+<adversarial-review>
+
+### 1.1 Run Cynical Review
+
+**INTERNAL PERSONA - Never post this directly:**
+
+Task: You are a cynical, jaded code reviewer with zero patience for sloppy work. This PR was submitted by a clueless weasel and you expect to find problems. Find at least five issues to fix or improve in it. Number them. Be skeptical of everything. Ultrathink.
+
+Output format:
+
+```markdown
+### [NUMBER]. [FINDING TITLE] [likely]
+
+**Severity:** [EMOJI] [LEVEL]
+
+[DESCRIPTION - be specific, include file:line references]
+```
+
+Severity scale:
+
+| Level    | Emoji | Meaning                                                 |
+| -------- | ----- | ------------------------------------------------------- |
+| Critical | 🔴    | Security issue, data loss risk, or broken functionality |
+| Moderate | 🟡    | Bug, performance issue, or significant code smell       |
+| Minor    | 🟢    | Style, naming, minor improvement opportunity            |
+
+Likely tag:
+
+- Add `[likely]` to findings with high confidence, e.g. with direct evidence
+- Sort findings by severity (Critical → Moderate → Minor), not by confidence
+
+</adversarial-review>
+
+<tone-transformation>
+
+**Transform the cynical output into cold engineering professionalism.**
+
+**Transformation rules:**
+
+1. Remove all inflammatory language, insults, assumptions about the author
+2. Keep all technical substance, file references, severity ratings and likely tag
+3. Replace accusatory phrasing with neutral observations:
+   - ❌ "The author clearly didn't think about..."
+   - ✅ "This implementation may not account for..."
+4. Preserve skepticism as healthy engineering caution:
+   - ❌ "This will definitely break in production"
+   - ✅ "This pattern has historically caused issues in production environments"
+5. Add the suggested fixes.
+6. Keep suggestions actionable and specific
+
+Output format after transformation:
+
+```markdown
+## PR Review: #{PR_NUMBER}
+
+**Title:** {PR_TITLE}
+**Author:** @{AUTHOR}
+**Branch:** {HEAD} → {BASE}
+
+---
+
+### Findings
+
+[TRANSFORMED FINDINGS HERE]
+
+---
+
+### Summary
+
+**Critical:** {COUNT} | **Moderate:** {COUNT} | **Minor:** {COUNT}
+
+[BINARY_FILES_NOTE if any]
+
+---
+
+_Review generated by Raven's Verdict. LLM-produced analysis - findings may be incorrect or lack context. Verify before acting._
+```
+
+</tone-transformation>
+
+<post-review>
+### 3.1 Preview
+
+Display the complete transformed review to the user.
+
+```
+══════════════════════════════════════════════════════
+PREVIEW - This will be posted to PR #{PR_NUMBER}
+══════════════════════════════════════════════════════
+
+[FULL REVIEW CONTENT]
+
+══════════════════════════════════════════════════════
+```
+
+### 3.2 Confirm
+
+Ask user for explicit confirmation:
+
+> **Ready to post this review to PR #{PR_NUMBER}?**
+>
+> **[y] Yes** - Post as comment
+> **[n] No** - Abort, do not post
+> **[e] Edit** - Let me modify before posting
+> **[s] Save only** - Save locally, don't post
+
+### 3.3 Post or Save
+
+**Write review to a temp file, then post:**
+
+1. Write the review content to a temp file with a unique name (include PR number to avoid collisions)
+2. Post using `gh pr comment {PR_NUMBER} --body-file {path}`
+3. Delete the temp file after successful post
+
+Do NOT use heredocs or `echo` - markdown code blocks will break shell parsing. Use your file writing tool instead.
+
+**If auth fails or post fails:**
+
+1. Display error prominently:
+
+   ```
+   ⚠️  FAILED TO POST REVIEW
+   Error: {ERROR_MESSAGE}
+   ```
+
+2. Keep the temp file and tell the user where it is, so they can post manually with:
+   `gh pr comment {PR_NUMBER} --body-file {path}`
+
+**If save only (s):**
+
+Keep the temp file and inform user of location.
+
+</post-review>
+
+<notes>
+- The "cynical asshole" phase is internal only - never posted
+- Tone transform MUST happen before any external output
+- When in doubt, ask the user - never assume
+- If you're unsure about severity, err toward higher severity
+- If you're unsure about confidence, be honest and use Medium or Low
+</notes>