6.9 KiB

Raw Blame History

Tech-Spec: PR Review Tool (Raven's Verdict)

Created: 2025-12-06 Status: Completed

Overview

Problem Statement

External contributors submit PRs to the upstream repository without context on code quality expectations. Maintainers need a way to provide deep, thorough code review feedback without spending hours manually reviewing every PR. Automated tools like CodeRabbit handle surface-level checks, but high-compute, human-triggered deep reviews are missing.

Solution

A portable prompt file that any LLM agent can execute to:

Fetch PR diff and full files via gh CLI
Run an adversarial code review (cynical, thorough)
Transform the tone from "cynical asshole" to "cold engineering professional"
Post the findings as a comment on the PR

Scope

In Scope:

Manual trigger via any LLM agent (Claude Code, Cursor, Windsurf, etc.)
Review of GitHub PRs using gh CLI
Adversarial review with severity + confidence ratings
Tone transformation before posting
Preview and explicit confirmation before posting

Out of Scope:

Automated triggers (webhooks, GitHub Actions)
Integration with CodeRabbit or other tools
Review of non-GitHub repositories
Persistent storage or history tracking

Context for Development

Codebase Patterns

Maintainer tools live in tools/maintainer/
Prompts are simple markdown files with clear instructions
Existing pattern: review-adversarial.md (3 lines, direct, effective)

Files to Reference

tools/maintainer/fix-elicitation-wording.md - Example of agent prompt in maintainer tools
.claude/commands/review-adversarial.md - Base cynical reviewer prompt to adapt

Technical Decisions

Decision	Choice	Rationale
Location	`tools/maintainer/pr-review/`	Maintainer tooling, separate from product
Invocation	Prompt file + PR URL/number	Portable across all LLM platforms
PR data source	`gh` CLI	Already available, handles auth
Review input	Diff + full files	Diff for focus, full files for tangents
Tone transform	Same session, Phase 2 with `task:` prefix	Spawns sub-agent if available, inline otherwise
Output format	Numbered, freeform, severity + confidence	Scannable, actionable

Implementation Plan

Tasks

Task 1: Create tools/maintainer/pr-review/ directory structure
Task 2: Write review-prompt.md - the main prompt file with all phases
Task 3: Write README.md - usage instructions for maintainers
Task 4: Test with a real PR on the upstream repo
Task 5: Iterate based on output quality

File Structure

tools/maintainer/pr-review/
  ├── README.md           # How to use
  ├── review-prompt.md    # The main prompt file
  └── output/             # Local backup folder (gitignored)

Prompt File Structure (`review-prompt.md`)

## Phase 0: Pre-flight Checks
- Verify PR number/URL provided (if not, STOP and ask)
- Check PR size via gh pr view --json
- Confirm repo if different from upstream
- Note binary files to skip

## Phase 1: Adversarial Review
- Fetch diff + full files
- Run cynical review
- Output numbered findings with severity + confidence

## Phase 2: Tone Transform
- task: Rewrite findings as cold engineering professional
- Preserve severity/confidence markers
- Remove inflammatory language, keep substance

## Phase 3: Post
- Preview full comment
- Ask for explicit confirmation
- Post via gh pr comment
- Handle auth failure gracefully

Acceptance Criteria

AC 1: Given a PR URL, when agent reads prompt, then it fetches PR data via gh without hallucinating PR numbers
AC 2: Given PR data, when review runs, then findings are numbered with severity (🔴🟡🟢) and confidence (High/Medium/Low %)
AC 3: Given cynical output, when tone transform runs, then language is professional but findings retain substance
AC 4: Given transformed output, when user confirms, then comment posts to PR via gh pr comment
AC 5: Given missing PR number, when agent starts, then it stops and asks user explicitly
AC 6: Given PR from different repo, when agent detects mismatch, then it asks user to confirm before proceeding
AC 7: Given PR with >50 files or >5000 lines, when pre-flight runs, then agent warns and asks to proceed or focus
AC 8: Given auth failure during post, when error occurs, then review is saved locally and error is displayed loudly
AC 9: Given PR with binary files, when fetching diff, then binaries are skipped with a note

Additional Context

Dependencies

gh CLI installed and authenticated
Any LLM agent capable of running bash commands

Sandboxed Execution Rules

The prompt MUST enforce:

❌ No inferring PR from conversation history
❌ No looking at git branches, recent commits, or local state
❌ No guessing or assuming PR numbers
✅ Use ONLY explicit PR number/URL from user message
✅ If missing, STOP and ask: "What PR number or URL should I review?"

Severity Scale

Level	Meaning
🔴 Critical	Security issue, data loss risk, or broken functionality
🟡 Moderate	Bug, performance issue, or significant code smell
🟢 Minor	Style, naming, minor improvement opportunity

Confidence Scale

Level	Meaning
High (>80%)	Definitely an issue
Medium (40-80%)	Likely an issue, might need context
Low (<40%)	Possible issue, could be intentional

Example Output Format

## PR Review: #1234

### 1. Unbounded query in user search

**Severity:** 🔴 Critical | **Confidence:** High (>80%)

The search endpoint at `src/api/search.ts:47` doesn't limit results, which could return thousands of rows and cause memory issues.

**Suggestion:** Add `.limit(100)` or implement pagination.

### 2. Missing null check in callback

**Severity:** 🟡 Moderate | **Confidence:** Medium (40-80%)

The callback at `src/handlers/webhook.ts:23` could be undefined if the event type is unregistered.

**Suggestion:** Add defensive check: `if (callback) callback(event)`

---

_Review generated by Raven's Verdict - Deep PR Review Tool_

Notes

The "cynical asshole" phase is internal only - never posted
Tone transform must happen before any external output
When in doubt, ask the user - never assume
This is a POC - iterate based on real usage

6.9 KiB Raw Blame History