diff --git a/.gitignore b/.gitignore index 8a9137a1..30404358 100644 --- a/.gitignore +++ b/.gitignore @@ -54,6 +54,9 @@ flattened-codebase.xml #UAT template testing output files tools/template-test-generator/test-scenarios/ +# Maintainer tools output +tools/maintainer/pr-review/output/ + # Bundler temporary files and generated bundles .bundler-temp/ diff --git a/docs/sprint-artifacts/tech-spec-pr-review-tool.md b/docs/sprint-artifacts/tech-spec-pr-review-tool.md new file mode 100644 index 00000000..4f3470ec --- /dev/null +++ b/docs/sprint-artifacts/tech-spec-pr-review-tool.md @@ -0,0 +1,183 @@ +# Tech-Spec: PR Review Tool (Raven's Verdict) + +**Created:** 2025-12-06 +**Status:** Completed + +## Overview + +### Problem Statement + +External contributors submit PRs to the upstream repository without context on code quality expectations. Maintainers need a way to provide deep, thorough code review feedback without spending hours manually reviewing every PR. Automated tools like CodeRabbit handle surface-level checks, but high-compute, human-triggered deep reviews are missing. + +### Solution + +A portable prompt file that any LLM agent can execute to: + +1. Fetch PR diff and full files via `gh` CLI +2. Run an adversarial code review (cynical, thorough) +3. Transform the tone from "cynical asshole" to "cold engineering professional" +4. Post the findings as a comment on the PR + +### Scope + +**In Scope:** + +- Manual trigger via any LLM agent (Claude Code, Cursor, Windsurf, etc.) +- Review of GitHub PRs using `gh` CLI +- Adversarial review with severity + confidence ratings +- Tone transformation before posting +- Preview and explicit confirmation before posting + +**Out of Scope:** + +- Automated triggers (webhooks, GitHub Actions) +- Integration with CodeRabbit or other tools +- Review of non-GitHub repositories +- Persistent storage or history tracking + +## Context for Development + +### Codebase Patterns + +- Maintainer tools live in `tools/maintainer/` +- Prompts are simple markdown files with clear instructions +- Existing pattern: `review-adversarial.md` (3 lines, direct, effective) + +### Files to Reference + +- `tools/maintainer/fix-elicitation-wording.md` - Example of agent prompt in maintainer tools +- `.claude/commands/review-adversarial.md` - Base cynical reviewer prompt to adapt + +### Technical Decisions + +| Decision | Choice | Rationale | +| -------------- | ----------------------------------------- | ----------------------------------------------- | +| Location | `tools/maintainer/pr-review/` | Maintainer tooling, separate from product | +| Invocation | Prompt file + PR URL/number | Portable across all LLM platforms | +| PR data source | `gh` CLI | Already available, handles auth | +| Review input | Diff + full files | Diff for focus, full files for tangents | +| Tone transform | Same session, Phase 2 with `task:` prefix | Spawns sub-agent if available, inline otherwise | +| Output format | Numbered, freeform, severity + confidence | Scannable, actionable | + +## Implementation Plan + +### Tasks + +- [x] Task 1: Create `tools/maintainer/pr-review/` directory structure +- [x] Task 2: Write `review-prompt.md` - the main prompt file with all phases +- [x] Task 3: Write `README.md` - usage instructions for maintainers +- [ ] Task 4: Test with a real PR on the upstream repo +- [ ] Task 5: Iterate based on output quality + +### File Structure + +``` +tools/maintainer/pr-review/ + ├── README.md # How to use + ├── review-prompt.md # The main prompt file + └── output/ # Local backup folder (gitignored) +``` + +### Prompt File Structure (`review-prompt.md`) + +``` +## Phase 0: Pre-flight Checks +- Verify PR number/URL provided (if not, STOP and ask) +- Check PR size via gh pr view --json +- Confirm repo if different from upstream +- Note binary files to skip + +## Phase 1: Adversarial Review +- Fetch diff + full files +- Run cynical review +- Output numbered findings with severity + confidence + +## Phase 2: Tone Transform +- task: Rewrite findings as cold engineering professional +- Preserve severity/confidence markers +- Remove inflammatory language, keep substance + +## Phase 3: Post +- Preview full comment +- Ask for explicit confirmation +- Post via gh pr comment +- Handle auth failure gracefully +``` + +### Acceptance Criteria + +- [ ] AC 1: Given a PR URL, when agent reads prompt, then it fetches PR data via `gh` without hallucinating PR numbers +- [ ] AC 2: Given PR data, when review runs, then findings are numbered with severity (🔴🟡🟢) and confidence (High/Medium/Low %) +- [ ] AC 3: Given cynical output, when tone transform runs, then language is professional but findings retain substance +- [ ] AC 4: Given transformed output, when user confirms, then comment posts to PR via `gh pr comment` +- [ ] AC 5: Given missing PR number, when agent starts, then it stops and asks user explicitly +- [ ] AC 6: Given PR from different repo, when agent detects mismatch, then it asks user to confirm before proceeding +- [ ] AC 7: Given PR with >50 files or >5000 lines, when pre-flight runs, then agent warns and asks to proceed or focus +- [ ] AC 8: Given auth failure during post, when error occurs, then review is saved locally and error is displayed loudly +- [ ] AC 9: Given PR with binary files, when fetching diff, then binaries are skipped with a note + +## Additional Context + +### Dependencies + +- `gh` CLI installed and authenticated +- Any LLM agent capable of running bash commands + +### Sandboxed Execution Rules + +The prompt MUST enforce: + +- ❌ No inferring PR from conversation history +- ❌ No looking at git branches, recent commits, or local state +- ❌ No guessing or assuming PR numbers +- ✅ Use ONLY explicit PR number/URL from user message +- ✅ If missing, STOP and ask: "What PR number or URL should I review?" + +### Severity Scale + +| Level | Meaning | +| ----------- | ------------------------------------------------------- | +| 🔴 Critical | Security issue, data loss risk, or broken functionality | +| 🟡 Moderate | Bug, performance issue, or significant code smell | +| 🟢 Minor | Style, naming, minor improvement opportunity | + +### Confidence Scale + +| Level | Meaning | +| --------------- | ------------------------------------ | +| High (>80%) | Definitely an issue | +| Medium (40-80%) | Likely an issue, might need context | +| Low (<40%) | Possible issue, could be intentional | + +### Example Output Format + +```markdown +## PR Review: #1234 + +### 1. Unbounded query in user search + +**Severity:** 🔴 Critical | **Confidence:** High (>80%) + +The search endpoint at `src/api/search.ts:47` doesn't limit results, which could return thousands of rows and cause memory issues. + +**Suggestion:** Add `.limit(100)` or implement pagination. + +### 2. Missing null check in callback + +**Severity:** 🟡 Moderate | **Confidence:** Medium (40-80%) + +The callback at `src/handlers/webhook.ts:23` could be undefined if the event type is unregistered. + +**Suggestion:** Add defensive check: `if (callback) callback(event)` + +--- + +_Review generated by Raven's Verdict - Deep PR Review Tool_ +``` + +### Notes + +- The "cynical asshole" phase is internal only - never posted +- Tone transform must happen before any external output +- When in doubt, ask the user - never assume +- This is a POC - iterate based on real usage diff --git a/tools/maintainer/pr-review/README.md b/tools/maintainer/pr-review/README.md new file mode 100644 index 00000000..1e20b484 --- /dev/null +++ b/tools/maintainer/pr-review/README.md @@ -0,0 +1,19 @@ +# Raven's Verdict - Deep PR Review Tool + +Adversarial code review for GitHub PRs. Works with any LLM agent. + +## Prerequisites + +- `gh` CLI installed and authenticated (`gh auth status`) +- Any LLM agent capable of running bash commands + +## Usage + +```bash +# Claude Code +claude "Review PR #123 using tools/maintainer/pr-review/review-prompt.md" + +# Other agents: copy review-prompt.md contents to your agent +``` + +See `review-prompt.md` for full details on output format, safety features, and how it works. diff --git a/tools/maintainer/pr-review/review-prompt.md b/tools/maintainer/pr-review/review-prompt.md new file mode 100644 index 00000000..c00f779d --- /dev/null +++ b/tools/maintainer/pr-review/review-prompt.md @@ -0,0 +1,237 @@ +# Raven's Verdict - Deep PR Review Tool + +A cynical adversarial review, transformed into cold engineering professionalism. + + +CRITICAL: Sandboxed Execution Rules + +Before proceeding, you MUST verify: + +- [ ] PR number or URL was EXPLICITLY provided in the user's message +- [ ] You are NOT inferring the PR from conversation history +- [ ] You are NOT looking at git branches, recent commits, or local state +- [ ] You are NOT guessing or assuming any PR numbers + +**If no explicit PR number/URL was provided, STOP immediately and ask:** +"What PR number or URL should I review?" + + + + +## Preflight Checks + +### 0.0 Ensure Clean Checkout + +Before anything else, verify the working tree is clean and check out the PR branch. + +```bash +# Check for uncommitted changes +git status --porcelain +``` + +If output is non-empty, STOP and tell user: + +> "You have uncommitted changes. Please commit or stash them before running a PR review." + +If clean, fetch and checkout the PR branch: + +```bash +# Fetch and checkout PR branch (gh handles the remote fetch) +gh pr checkout {PR_NUMBER} +``` + +If checkout fails, STOP and report the error. + +Now you're on the PR branch with full access to all files as they exist in the PR. + +### 0.1 Parse PR Input + +Extract PR number from user input. Examples of valid formats: + +- `123` (just the number) +- `#123` (with hash) +- `https://github.com/owner/repo/pull/123` (full URL) + +If a URL specifies a different repository than the current one: + +```bash +# Check current repo +gh repo view --json nameWithOwner -q '.nameWithOwner' +``` + +If mismatch detected, ask user: + +> "This PR is from `{detected_repo}` but we're in `{current_repo}`. Proceed with reviewing `{detected_repo}#123`? (y/n)" + +### 0.2 Check PR Size + +```bash +gh pr view {PR_NUMBER} --json additions,deletions,changedFiles -q '{"additions": .additions, "deletions": .deletions, "files": .changedFiles}' +``` + +**Size thresholds:** + +| Metric | Warning Threshold | +| ------------- | ----------------- | +| Files changed | > 50 | +| Lines changed | > 5000 | + +If thresholds exceeded, ask user: + +> "This PR has {X} files and {Y} line changes. That's large. +> +> **[f] Focus** - Pick specific files or directories to review +> **[p] Proceed** - Review everything (may be slow/expensive) +> **[a] Abort** - Stop here" + +### 0.3 Note Binary Files + +```bash +gh pr diff {PR_NUMBER} --name-only | grep -E '\.(png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot|pdf|zip|tar|gz|bin|exe|dll|so|dylib)$' || echo "No binary files detected" +``` + +Store list of binary files to skip. Note them in final output. + + + + + +### 1.1 Run Cynical Review + +**INTERNAL PERSONA - Never post this directly:** + +Task: You are a cynical, jaded code reviewer with zero patience for sloppy work. This PR was submitted by a clueless weasel and you expect to find problems. Find at least five issues to fix or improve in it. Number them. Be skeptical of everything. Ultrathink. + +Output format: + +```markdown +### [NUMBER]. [FINDING TITLE] [likely] + +**Severity:** [EMOJI] [LEVEL] + +[DESCRIPTION - be specific, include file:line references] +``` + +Severity scale: + +| Level | Emoji | Meaning | +| -------- | ----- | ------------------------------------------------------- | +| Critical | 🔴 | Security issue, data loss risk, or broken functionality | +| Moderate | 🟡 | Bug, performance issue, or significant code smell | +| Minor | 🟢 | Style, naming, minor improvement opportunity | + +Likely tag: + +- Add `[likely]` to findings with high confidence, e.g. with direct evidence +- Sort findings by severity (Critical → Moderate → Minor), not by confidence + + + + + +**Transform the cynical output into cold engineering professionalism.** + +**Transformation rules:** + +1. Remove all inflammatory language, insults, assumptions about the author +2. Keep all technical substance, file references, severity ratings and likely tag +3. Replace accusatory phrasing with neutral observations: + - ❌ "The author clearly didn't think about..." + - ✅ "This implementation may not account for..." +4. Preserve skepticism as healthy engineering caution: + - ❌ "This will definitely break in production" + - ✅ "This pattern has historically caused issues in production environments" +5. Add the suggested fixes. +6. Keep suggestions actionable and specific + +Output format after transformation: + +```markdown +## PR Review: #{PR_NUMBER} + +**Title:** {PR_TITLE} +**Author:** @{AUTHOR} +**Branch:** {HEAD} → {BASE} + +--- + +### Findings + +[TRANSFORMED FINDINGS HERE] + +--- + +### Summary + +**Critical:** {COUNT} | **Moderate:** {COUNT} | **Minor:** {COUNT} + +[BINARY_FILES_NOTE if any] + +--- + +_Review generated by Raven's Verdict. LLM-produced analysis - findings may be incorrect or lack context. Verify before acting._ +``` + + + + +### 3.1 Preview + +Display the complete transformed review to the user. + +``` +══════════════════════════════════════════════════════ +PREVIEW - This will be posted to PR #{PR_NUMBER} +══════════════════════════════════════════════════════ + +[FULL REVIEW CONTENT] + +══════════════════════════════════════════════════════ +``` + +### 3.2 Confirm + +Ask user for explicit confirmation: + +> **Ready to post this review to PR #{PR_NUMBER}?** +> +> **[y] Yes** - Post as comment +> **[n] No** - Abort, do not post +> **[e] Edit** - Let me modify before posting +> **[s] Save only** - Save locally, don't post + +### 3.3 Post or Save + +**Write review to a temp file, then post:** + +1. Write the review content to a temp file with a unique name (include PR number to avoid collisions) +2. Post using `gh pr comment {PR_NUMBER} --body-file {path}` +3. Delete the temp file after successful post + +Do NOT use heredocs or `echo` - markdown code blocks will break shell parsing. Use your file writing tool instead. + +**If auth fails or post fails:** + +1. Display error prominently: + + ``` + ⚠️ FAILED TO POST REVIEW + Error: {ERROR_MESSAGE} + ``` + +2. Keep the temp file and tell the user where it is, so they can post manually with: + `gh pr comment {PR_NUMBER} --body-file {path}` + +**If save only (s):** + +Keep the temp file and inform user of location. + + + + +- The "cynical asshole" phase is internal only - never posted +- Tone transform MUST happen before any external output +- When in doubt, ask the user - never assume +- If you're unsure about severity, err toward higher severity +- If you're unsure about confidence, be honest and use Medium or Low +