Agent PR Benchmark

Status: in progress · Last updated: 2026-06-10

Generic AI review claims are noise. CodeVetter is building a public, hand-labeled set of real agent-generated diffs so catch-rate numbers mean something.

Methodology

Contribute

Have an agent PR with a known bug the reviewer missed? Open an issue with a link to the diff. No proprietary code is required; sanitized excerpts are welcome.

Try CodeVetter

Download for macOS · GitHub