---
license: mit
---

# Precise Debugging Benchmarking (PDB)

📄 [Paper](https://arxiv.org/abs/2604.17338) · 💻 [Code](https://github.com/Bill1235813/PDB) · 🌐 [Project page](https://precise-debugging-benchmark.github.io/) · 🏆 [Leaderboard](https://precise-debugging-benchmark.github.io/leaderboard.html)

**PDB** is an automatic pipeline that turns any coding dataset into a *debugging* benchmark with fine-grained metrics. Beyond binary unit-test scores, PDB evaluates a debugger with **edit-level precision** (did the model touch only the lines it had to?) and **bug-level recall** (did it fix every fault?). This rewards targeted fixes and penalizes the regeneration behavior frontier LLMs often fall back on.

> Frontier models like GPT-5.1-Codex and DeepSeek-V3.2-Thinking top unit-test
> leaderboards (>76%) but score at or below 45% on precision: they pass tests
> by rewriting, not repairing. PDB makes that gap measurable.

## Released datasets

| Dataset | Size | Bug granularity | Notes |
|---|---|---|---|
| [PDB-Single](https://huggingface.co/datasets/Precise-Debugging-Benchmarking/PDB-Single) | 7,589 | single line | full initial pool before easy-case filtering |
| [PDB-Single-Hard](https://huggingface.co/datasets/Precise-Debugging-Benchmarking/PDB-Single-Hard) | 5,751 | single line | hard subset: tasks not easily solved by 7+ of 9 reference models |
| [PDB-Multi](https://huggingface.co/datasets/Precise-Debugging-Benchmarking/PDB-Multi) | 256 | 2–4 line blocks | multi-line extension on programs with ≥35 LOC; atomicity-filtered |

All three datasets are built from [BigCodeBench](https://huggingface.co/datasets/bigcode/bigcodebench) and [LiveCodeBench](https://huggingface.co/datasets/livecodebench/execution) via the PDB pipeline, and are evaluated with precision, recall, and unit-test pass rate (see the example usage sketch at the end of this card).

## Citation

```
@inproceedings{zhu2026pdb,
  title     = {Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?},
  author    = {Zhu, Wang Bill and Chai, Miaosen and Wang, Shangshang and Liu, Yejia and Bian, Song and Dong, Honghua and Neiswanger, Willie and Jia, Robin},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2026},
  year      = {2026},
}
```

## Contact

Questions / submissions: wangzhu@usc.edu, miaosenc@usc.edu.
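
## Example usage

A minimal sketch of loading a PDB task with the standard `datasets` library and scoring an edit with line-level precision and recall. The split name, field names, and the scoring function here are illustrative assumptions, not the dataset's actual schema or the official metric implementation; see the [code repository](https://github.com/Bill1235813/PDB) for the real evaluation harness.

```python
# Minimal sketch: load PDB-Single and compute edit-level precision /
# bug-level recall from sets of edited line numbers.
# NOTE: the split name and record schema are assumptions for illustration;
# the official evaluation harness lives in the PDB code repository.
from datasets import load_dataset

ds = load_dataset("Precise-Debugging-Benchmarking/PDB-Single", split="train")
print(ds[0])  # inspect one task to see the actual fields


def edit_scores(gold_buggy_lines: set[int], model_edited_lines: set[int]) -> tuple[float, float]:
    """Edit-level precision: fraction of edited lines that were actually buggy.
    Bug-level recall (line-level proxy): fraction of buggy lines the model edited."""
    if not model_edited_lines:
        return 0.0, 0.0
    hits = gold_buggy_lines & model_edited_lines
    precision = len(hits) / len(model_edited_lines)
    recall = len(hits) / len(gold_buggy_lines) if gold_buggy_lines else 1.0
    return precision, recall


# Toy example: the gold bug is on line 17, but the model rewrote lines 10-20.
p, r = edit_scores({17}, set(range(10, 21)))
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.09, recall=1.00
```

The toy example shows why the metrics matter: a model that regenerates a whole block can still pass the unit tests and reach perfect recall, but its precision collapses because it touched many lines it did not have to.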