feat: add Raven's Verdict PR review tool
Adversarial code review for GitHub PRs with: - Cynical internal review phase, professional external output - Severity-based findings with [likely] tag for high-confidence issues - LLM disclaimer in output footer - Safety features: sandboxed execution, large PR warnings, preview before post
This commit is contained in:
parent
80a90c01d4
commit
70e597aa83
|
|
@ -54,6 +54,9 @@ flattened-codebase.xml
|
|||
#UAT template testing output files
|
||||
tools/template-test-generator/test-scenarios/
|
||||
|
||||
# Maintainer tools output
|
||||
tools/maintainer/pr-review/output/
|
||||
|
||||
# Bundler temporary files and generated bundles
|
||||
.bundler-temp/
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,183 @@
|
|||
# Tech-Spec: PR Review Tool (Raven's Verdict)
|
||||
|
||||
**Created:** 2025-12-06
|
||||
**Status:** Completed
|
||||
|
||||
## Overview
|
||||
|
||||
### Problem Statement
|
||||
|
||||
External contributors submit PRs to the upstream repository without context on code quality expectations. Maintainers need a way to provide deep, thorough code review feedback without spending hours manually reviewing every PR. Automated tools like CodeRabbit handle surface-level checks, but high-compute, human-triggered deep reviews are missing.
|
||||
|
||||
### Solution
|
||||
|
||||
A portable prompt file that any LLM agent can execute to:
|
||||
|
||||
1. Fetch PR diff and full files via `gh` CLI
|
||||
2. Run an adversarial code review (cynical, thorough)
|
||||
3. Transform the tone from "cynical asshole" to "cold engineering professional"
|
||||
4. Post the findings as a comment on the PR
|
||||
|
||||
### Scope
|
||||
|
||||
**In Scope:**
|
||||
|
||||
- Manual trigger via any LLM agent (Claude Code, Cursor, Windsurf, etc.)
|
||||
- Review of GitHub PRs using `gh` CLI
|
||||
- Adversarial review with severity + confidence ratings
|
||||
- Tone transformation before posting
|
||||
- Preview and explicit confirmation before posting
|
||||
|
||||
**Out of Scope:**
|
||||
|
||||
- Automated triggers (webhooks, GitHub Actions)
|
||||
- Integration with CodeRabbit or other tools
|
||||
- Review of non-GitHub repositories
|
||||
- Persistent storage or history tracking
|
||||
|
||||
## Context for Development
|
||||
|
||||
### Codebase Patterns
|
||||
|
||||
- Maintainer tools live in `tools/maintainer/`
|
||||
- Prompts are simple markdown files with clear instructions
|
||||
- Existing pattern: `review-adversarial.md` (3 lines, direct, effective)
|
||||
|
||||
### Files to Reference
|
||||
|
||||
- `tools/maintainer/fix-elicitation-wording.md` - Example of agent prompt in maintainer tools
|
||||
- `.claude/commands/review-adversarial.md` - Base cynical reviewer prompt to adapt
|
||||
|
||||
### Technical Decisions
|
||||
|
||||
| Decision | Choice | Rationale |
|
||||
| -------------- | ----------------------------------------- | ----------------------------------------------- |
|
||||
| Location | `tools/maintainer/pr-review/` | Maintainer tooling, separate from product |
|
||||
| Invocation | Prompt file + PR URL/number | Portable across all LLM platforms |
|
||||
| PR data source | `gh` CLI | Already available, handles auth |
|
||||
| Review input | Diff + full files | Diff for focus, full files for tangents |
|
||||
| Tone transform | Same session, Phase 2 with `task:` prefix | Spawns sub-agent if available, inline otherwise |
|
||||
| Output format | Numbered, freeform, severity + confidence | Scannable, actionable |
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Tasks
|
||||
|
||||
- [x] Task 1: Create `tools/maintainer/pr-review/` directory structure
|
||||
- [x] Task 2: Write `review-prompt.md` - the main prompt file with all phases
|
||||
- [x] Task 3: Write `README.md` - usage instructions for maintainers
|
||||
- [ ] Task 4: Test with a real PR on the upstream repo
|
||||
- [ ] Task 5: Iterate based on output quality
|
||||
|
||||
### File Structure
|
||||
|
||||
```
|
||||
tools/maintainer/pr-review/
|
||||
├── README.md # How to use
|
||||
├── review-prompt.md # The main prompt file
|
||||
└── output/ # Local backup folder (gitignored)
|
||||
```
|
||||
|
||||
### Prompt File Structure (`review-prompt.md`)
|
||||
|
||||
```
|
||||
## Phase 0: Pre-flight Checks
|
||||
- Verify PR number/URL provided (if not, STOP and ask)
|
||||
- Check PR size via gh pr view --json
|
||||
- Confirm repo if different from upstream
|
||||
- Note binary files to skip
|
||||
|
||||
## Phase 1: Adversarial Review
|
||||
- Fetch diff + full files
|
||||
- Run cynical review
|
||||
- Output numbered findings with severity + confidence
|
||||
|
||||
## Phase 2: Tone Transform
|
||||
- task: Rewrite findings as cold engineering professional
|
||||
- Preserve severity/confidence markers
|
||||
- Remove inflammatory language, keep substance
|
||||
|
||||
## Phase 3: Post
|
||||
- Preview full comment
|
||||
- Ask for explicit confirmation
|
||||
- Post via gh pr comment
|
||||
- Handle auth failure gracefully
|
||||
```
|
||||
|
||||
### Acceptance Criteria
|
||||
|
||||
- [ ] AC 1: Given a PR URL, when agent reads prompt, then it fetches PR data via `gh` without hallucinating PR numbers
|
||||
- [ ] AC 2: Given PR data, when review runs, then findings are numbered with severity (🔴🟡🟢) and confidence (High/Medium/Low %)
|
||||
- [ ] AC 3: Given cynical output, when tone transform runs, then language is professional but findings retain substance
|
||||
- [ ] AC 4: Given transformed output, when user confirms, then comment posts to PR via `gh pr comment`
|
||||
- [ ] AC 5: Given missing PR number, when agent starts, then it stops and asks user explicitly
|
||||
- [ ] AC 6: Given PR from different repo, when agent detects mismatch, then it asks user to confirm before proceeding
|
||||
- [ ] AC 7: Given PR with >50 files or >5000 lines, when pre-flight runs, then agent warns and asks to proceed or focus
|
||||
- [ ] AC 8: Given auth failure during post, when error occurs, then review is saved locally and error is displayed loudly
|
||||
- [ ] AC 9: Given PR with binary files, when fetching diff, then binaries are skipped with a note
|
||||
|
||||
## Additional Context
|
||||
|
||||
### Dependencies
|
||||
|
||||
- `gh` CLI installed and authenticated
|
||||
- Any LLM agent capable of running bash commands
|
||||
|
||||
### Sandboxed Execution Rules
|
||||
|
||||
The prompt MUST enforce:
|
||||
|
||||
- ❌ No inferring PR from conversation history
|
||||
- ❌ No looking at git branches, recent commits, or local state
|
||||
- ❌ No guessing or assuming PR numbers
|
||||
- ✅ Use ONLY explicit PR number/URL from user message
|
||||
- ✅ If missing, STOP and ask: "What PR number or URL should I review?"
|
||||
|
||||
### Severity Scale
|
||||
|
||||
| Level | Meaning |
|
||||
| ----------- | ------------------------------------------------------- |
|
||||
| 🔴 Critical | Security issue, data loss risk, or broken functionality |
|
||||
| 🟡 Moderate | Bug, performance issue, or significant code smell |
|
||||
| 🟢 Minor | Style, naming, minor improvement opportunity |
|
||||
|
||||
### Confidence Scale
|
||||
|
||||
| Level | Meaning |
|
||||
| --------------- | ------------------------------------ |
|
||||
| High (>80%) | Definitely an issue |
|
||||
| Medium (40-80%) | Likely an issue, might need context |
|
||||
| Low (<40%) | Possible issue, could be intentional |
|
||||
|
||||
### Example Output Format
|
||||
|
||||
```markdown
|
||||
## PR Review: #1234
|
||||
|
||||
### 1. Unbounded query in user search
|
||||
|
||||
**Severity:** 🔴 Critical | **Confidence:** High (>80%)
|
||||
|
||||
The search endpoint at `src/api/search.ts:47` doesn't limit results, which could return thousands of rows and cause memory issues.
|
||||
|
||||
**Suggestion:** Add `.limit(100)` or implement pagination.
|
||||
|
||||
### 2. Missing null check in callback
|
||||
|
||||
**Severity:** 🟡 Moderate | **Confidence:** Medium (40-80%)
|
||||
|
||||
The callback at `src/handlers/webhook.ts:23` could be undefined if the event type is unregistered.
|
||||
|
||||
**Suggestion:** Add defensive check: `if (callback) callback(event)`
|
||||
|
||||
---
|
||||
|
||||
_Review generated by Raven's Verdict - Deep PR Review Tool_
|
||||
```
|
||||
|
||||
### Notes
|
||||
|
||||
- The "cynical asshole" phase is internal only - never posted
|
||||
- Tone transform must happen before any external output
|
||||
- When in doubt, ask the user - never assume
|
||||
- This is a POC - iterate based on real usage
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
# Raven's Verdict - Deep PR Review Tool
|
||||
|
||||
Adversarial code review for GitHub PRs. Works with any LLM agent.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- `gh` CLI installed and authenticated (`gh auth status`)
|
||||
- Any LLM agent capable of running bash commands
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
# Claude Code
|
||||
claude "Review PR #123 using tools/maintainer/pr-review/review-prompt.md"
|
||||
|
||||
# Other agents: copy review-prompt.md contents to your agent
|
||||
```
|
||||
|
||||
See `review-prompt.md` for full details on output format, safety features, and how it works.
|
||||
|
|
@ -0,0 +1,237 @@
|
|||
# Raven's Verdict - Deep PR Review Tool
|
||||
|
||||
A cynical adversarial review, transformed into cold engineering professionalism.
|
||||
|
||||
<orientation>
|
||||
CRITICAL: Sandboxed Execution Rules
|
||||
|
||||
Before proceeding, you MUST verify:
|
||||
|
||||
- [ ] PR number or URL was EXPLICITLY provided in the user's message
|
||||
- [ ] You are NOT inferring the PR from conversation history
|
||||
- [ ] You are NOT looking at git branches, recent commits, or local state
|
||||
- [ ] You are NOT guessing or assuming any PR numbers
|
||||
|
||||
**If no explicit PR number/URL was provided, STOP immediately and ask:**
|
||||
"What PR number or URL should I review?"
|
||||
</orientation>
|
||||
|
||||
<preflight-checks>
|
||||
|
||||
## Preflight Checks
|
||||
|
||||
### 0.0 Ensure Clean Checkout
|
||||
|
||||
Before anything else, verify the working tree is clean and check out the PR branch.
|
||||
|
||||
```bash
|
||||
# Check for uncommitted changes
|
||||
git status --porcelain
|
||||
```
|
||||
|
||||
If output is non-empty, STOP and tell user:
|
||||
|
||||
> "You have uncommitted changes. Please commit or stash them before running a PR review."
|
||||
|
||||
If clean, fetch and checkout the PR branch:
|
||||
|
||||
```bash
|
||||
# Fetch and checkout PR branch (gh handles the remote fetch)
|
||||
gh pr checkout {PR_NUMBER}
|
||||
```
|
||||
|
||||
If checkout fails, STOP and report the error.
|
||||
|
||||
Now you're on the PR branch with full access to all files as they exist in the PR.
|
||||
|
||||
### 0.1 Parse PR Input
|
||||
|
||||
Extract PR number from user input. Examples of valid formats:
|
||||
|
||||
- `123` (just the number)
|
||||
- `#123` (with hash)
|
||||
- `https://github.com/owner/repo/pull/123` (full URL)
|
||||
|
||||
If a URL specifies a different repository than the current one:
|
||||
|
||||
```bash
|
||||
# Check current repo
|
||||
gh repo view --json nameWithOwner -q '.nameWithOwner'
|
||||
```
|
||||
|
||||
If mismatch detected, ask user:
|
||||
|
||||
> "This PR is from `{detected_repo}` but we're in `{current_repo}`. Proceed with reviewing `{detected_repo}#123`? (y/n)"
|
||||
|
||||
### 0.2 Check PR Size
|
||||
|
||||
```bash
|
||||
gh pr view {PR_NUMBER} --json additions,deletions,changedFiles -q '{"additions": .additions, "deletions": .deletions, "files": .changedFiles}'
|
||||
```
|
||||
|
||||
**Size thresholds:**
|
||||
|
||||
| Metric | Warning Threshold |
|
||||
| ------------- | ----------------- |
|
||||
| Files changed | > 50 |
|
||||
| Lines changed | > 5000 |
|
||||
|
||||
If thresholds exceeded, ask user:
|
||||
|
||||
> "This PR has {X} files and {Y} line changes. That's large.
|
||||
>
|
||||
> **[f] Focus** - Pick specific files or directories to review
|
||||
> **[p] Proceed** - Review everything (may be slow/expensive)
|
||||
> **[a] Abort** - Stop here"
|
||||
|
||||
### 0.3 Note Binary Files
|
||||
|
||||
```bash
|
||||
gh pr diff {PR_NUMBER} --name-only | grep -E '\.(png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot|pdf|zip|tar|gz|bin|exe|dll|so|dylib)$' || echo "No binary files detected"
|
||||
```
|
||||
|
||||
Store list of binary files to skip. Note them in final output.
|
||||
|
||||
</preflight-checks>
|
||||
|
||||
<adversarial-review>
|
||||
|
||||
### 1.1 Run Cynical Review
|
||||
|
||||
**INTERNAL PERSONA - Never post this directly:**
|
||||
|
||||
Task: You are a cynical, jaded code reviewer with zero patience for sloppy work. This PR was submitted by a clueless weasel and you expect to find problems. Find at least five issues to fix or improve in it. Number them. Be skeptical of everything. Ultrathink.
|
||||
|
||||
Output format:
|
||||
|
||||
```markdown
|
||||
### [NUMBER]. [FINDING TITLE] [likely]
|
||||
|
||||
**Severity:** [EMOJI] [LEVEL]
|
||||
|
||||
[DESCRIPTION - be specific, include file:line references]
|
||||
```
|
||||
|
||||
Severity scale:
|
||||
|
||||
| Level | Emoji | Meaning |
|
||||
| -------- | ----- | ------------------------------------------------------- |
|
||||
| Critical | 🔴 | Security issue, data loss risk, or broken functionality |
|
||||
| Moderate | 🟡 | Bug, performance issue, or significant code smell |
|
||||
| Minor | 🟢 | Style, naming, minor improvement opportunity |
|
||||
|
||||
Likely tag:
|
||||
|
||||
- Add `[likely]` to findings with high confidence, e.g. with direct evidence
|
||||
- Sort findings by severity (Critical → Moderate → Minor), not by confidence
|
||||
|
||||
</adversarial-review>
|
||||
|
||||
<tone-transformation>
|
||||
|
||||
**Transform the cynical output into cold engineering professionalism.**
|
||||
|
||||
**Transformation rules:**
|
||||
|
||||
1. Remove all inflammatory language, insults, assumptions about the author
|
||||
2. Keep all technical substance, file references, severity ratings and likely tag
|
||||
3. Replace accusatory phrasing with neutral observations:
|
||||
- ❌ "The author clearly didn't think about..."
|
||||
- ✅ "This implementation may not account for..."
|
||||
4. Preserve skepticism as healthy engineering caution:
|
||||
- ❌ "This will definitely break in production"
|
||||
- ✅ "This pattern has historically caused issues in production environments"
|
||||
5. Add the suggested fixes.
|
||||
6. Keep suggestions actionable and specific
|
||||
|
||||
Output format after transformation:
|
||||
|
||||
```markdown
|
||||
## PR Review: #{PR_NUMBER}
|
||||
|
||||
**Title:** {PR_TITLE}
|
||||
**Author:** @{AUTHOR}
|
||||
**Branch:** {HEAD} → {BASE}
|
||||
|
||||
---
|
||||
|
||||
### Findings
|
||||
|
||||
[TRANSFORMED FINDINGS HERE]
|
||||
|
||||
---
|
||||
|
||||
### Summary
|
||||
|
||||
**Critical:** {COUNT} | **Moderate:** {COUNT} | **Minor:** {COUNT}
|
||||
|
||||
[BINARY_FILES_NOTE if any]
|
||||
|
||||
---
|
||||
|
||||
_Review generated by Raven's Verdict. LLM-produced analysis - findings may be incorrect or lack context. Verify before acting._
|
||||
```
|
||||
|
||||
</tone-transformation>
|
||||
|
||||
<post-review>
|
||||
### 3.1 Preview
|
||||
|
||||
Display the complete transformed review to the user.
|
||||
|
||||
```
|
||||
══════════════════════════════════════════════════════
|
||||
PREVIEW - This will be posted to PR #{PR_NUMBER}
|
||||
══════════════════════════════════════════════════════
|
||||
|
||||
[FULL REVIEW CONTENT]
|
||||
|
||||
══════════════════════════════════════════════════════
|
||||
```
|
||||
|
||||
### 3.2 Confirm
|
||||
|
||||
Ask user for explicit confirmation:
|
||||
|
||||
> **Ready to post this review to PR #{PR_NUMBER}?**
|
||||
>
|
||||
> **[y] Yes** - Post as comment
|
||||
> **[n] No** - Abort, do not post
|
||||
> **[e] Edit** - Let me modify before posting
|
||||
> **[s] Save only** - Save locally, don't post
|
||||
|
||||
### 3.3 Post or Save
|
||||
|
||||
**Write review to a temp file, then post:**
|
||||
|
||||
1. Write the review content to a temp file with a unique name (include PR number to avoid collisions)
|
||||
2. Post using `gh pr comment {PR_NUMBER} --body-file {path}`
|
||||
3. Delete the temp file after successful post
|
||||
|
||||
Do NOT use heredocs or `echo` - markdown code blocks will break shell parsing. Use your file writing tool instead.
|
||||
|
||||
**If auth fails or post fails:**
|
||||
|
||||
1. Display error prominently:
|
||||
|
||||
```
|
||||
⚠️ FAILED TO POST REVIEW
|
||||
Error: {ERROR_MESSAGE}
|
||||
```
|
||||
|
||||
2. Keep the temp file and tell the user where it is, so they can post manually with:
|
||||
`gh pr comment {PR_NUMBER} --body-file {path}`
|
||||
|
||||
**If save only (s):**
|
||||
|
||||
Keep the temp file and inform user of location.
|
||||
|
||||
</post-review>
|
||||
|
||||
<notes>
|
||||
- The "cynical asshole" phase is internal only - never posted
|
||||
- Tone transform MUST happen before any external output
|
||||
- When in doubt, ask the user - never assume
|
||||
- If you're unsure about severity, err toward higher severity
|
||||
- If you're unsure about confidence, be honest and use Medium or Low
|
||||
</notes>
|
||||
Loading…
Reference in New Issue