6.9 KiB
6.9 KiB
Tech-Spec: PR Review Tool (Raven's Verdict)
Created: 2025-12-06 Status: Completed
Overview
Problem Statement
External contributors submit PRs to the upstream repository without context on code quality expectations. Maintainers need a way to provide deep, thorough code review feedback without spending hours manually reviewing every PR. Automated tools like CodeRabbit handle surface-level checks, but high-compute, human-triggered deep reviews are missing.
Solution
A portable prompt file that any LLM agent can execute to:
- Fetch PR diff and full files via
ghCLI - Run an adversarial code review (cynical, thorough)
- Transform the tone from "cynical asshole" to "cold engineering professional"
- Post the findings as a comment on the PR
Scope
In Scope:
- Manual trigger via any LLM agent (Claude Code, Cursor, Windsurf, etc.)
- Review of GitHub PRs using
ghCLI - Adversarial review with severity + confidence ratings
- Tone transformation before posting
- Preview and explicit confirmation before posting
Out of Scope:
- Automated triggers (webhooks, GitHub Actions)
- Integration with CodeRabbit or other tools
- Review of non-GitHub repositories
- Persistent storage or history tracking
Context for Development
Codebase Patterns
- Maintainer tools live in
tools/maintainer/ - Prompts are simple markdown files with clear instructions
- Existing pattern:
review-adversarial.md(3 lines, direct, effective)
Files to Reference
tools/maintainer/fix-elicitation-wording.md- Example of agent prompt in maintainer tools.claude/commands/review-adversarial.md- Base cynical reviewer prompt to adapt
Technical Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Location | tools/maintainer/pr-review/ |
Maintainer tooling, separate from product |
| Invocation | Prompt file + PR URL/number | Portable across all LLM platforms |
| PR data source | gh CLI |
Already available, handles auth |
| Review input | Diff + full files | Diff for focus, full files for tangents |
| Tone transform | Same session, Phase 2 with task: prefix |
Spawns sub-agent if available, inline otherwise |
| Output format | Numbered, freeform, severity + confidence | Scannable, actionable |
Implementation Plan
Tasks
- Task 1: Create
tools/maintainer/pr-review/directory structure - Task 2: Write
review-prompt.md- the main prompt file with all phases - Task 3: Write
README.md- usage instructions for maintainers - Task 4: Test with a real PR on the upstream repo
- Task 5: Iterate based on output quality
File Structure
tools/maintainer/pr-review/
├── README.md # How to use
├── review-prompt.md # The main prompt file
└── output/ # Local backup folder (gitignored)
Prompt File Structure (review-prompt.md)
## Phase 0: Pre-flight Checks
- Verify PR number/URL provided (if not, STOP and ask)
- Check PR size via gh pr view --json
- Confirm repo if different from upstream
- Note binary files to skip
## Phase 1: Adversarial Review
- Fetch diff + full files
- Run cynical review
- Output numbered findings with severity + confidence
## Phase 2: Tone Transform
- task: Rewrite findings as cold engineering professional
- Preserve severity/confidence markers
- Remove inflammatory language, keep substance
## Phase 3: Post
- Preview full comment
- Ask for explicit confirmation
- Post via gh pr comment
- Handle auth failure gracefully
Acceptance Criteria
- AC 1: Given a PR URL, when agent reads prompt, then it fetches PR data via
ghwithout hallucinating PR numbers - AC 2: Given PR data, when review runs, then findings are numbered with severity (🔴🟡🟢) and confidence (High/Medium/Low %)
- AC 3: Given cynical output, when tone transform runs, then language is professional but findings retain substance
- AC 4: Given transformed output, when user confirms, then comment posts to PR via
gh pr comment - AC 5: Given missing PR number, when agent starts, then it stops and asks user explicitly
- AC 6: Given PR from different repo, when agent detects mismatch, then it asks user to confirm before proceeding
- AC 7: Given PR with >50 files or >5000 lines, when pre-flight runs, then agent warns and asks to proceed or focus
- AC 8: Given auth failure during post, when error occurs, then review is saved locally and error is displayed loudly
- AC 9: Given PR with binary files, when fetching diff, then binaries are skipped with a note
Additional Context
Dependencies
ghCLI installed and authenticated- Any LLM agent capable of running bash commands
Sandboxed Execution Rules
The prompt MUST enforce:
- ❌ No inferring PR from conversation history
- ❌ No looking at git branches, recent commits, or local state
- ❌ No guessing or assuming PR numbers
- ✅ Use ONLY explicit PR number/URL from user message
- ✅ If missing, STOP and ask: "What PR number or URL should I review?"
Severity Scale
| Level | Meaning |
|---|---|
| 🔴 Critical | Security issue, data loss risk, or broken functionality |
| 🟡 Moderate | Bug, performance issue, or significant code smell |
| 🟢 Minor | Style, naming, minor improvement opportunity |
Confidence Scale
| Level | Meaning |
|---|---|
| High (>80%) | Definitely an issue |
| Medium (40-80%) | Likely an issue, might need context |
| Low (<40%) | Possible issue, could be intentional |
Example Output Format
## PR Review: #1234
### 1. Unbounded query in user search
**Severity:** 🔴 Critical | **Confidence:** High (>80%)
The search endpoint at `src/api/search.ts:47` doesn't limit results, which could return thousands of rows and cause memory issues.
**Suggestion:** Add `.limit(100)` or implement pagination.
### 2. Missing null check in callback
**Severity:** 🟡 Moderate | **Confidence:** Medium (40-80%)
The callback at `src/handlers/webhook.ts:23` could be undefined if the event type is unregistered.
**Suggestion:** Add defensive check: `if (callback) callback(event)`
---
_Review generated by Raven's Verdict - Deep PR Review Tool_
Notes
- The "cynical asshole" phase is internal only - never posted
- Tone transform must happen before any external output
- When in doubt, ask the user - never assume
- This is a POC - iterate based on real usage