---
title: README
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
---

# EvalPlus: Rigorous Evaluation of LLMs for Code Generation

## About

EvalPlus evaluates LLM-generated code along two dimensions:

* Code Correctness: HumanEval+ and MBPP+
* Code Efficiency: EvalPerf

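As a rough illustration of what a correctness check involves, the sketch below compares a candidate solution against a reference solution on a set of inputs, first on a small test set and then on one extended with an edge case. It is a minimal sketch, not EvalPlus's actual harness: the function `passes_all`, the toy task, and both input sets are hypothetical names and data invented for this example.

```python
from typing import Any, Callable, Iterable, Tuple


def passes_all(
    candidate: Callable[..., Any],
    reference: Callable[..., Any],
    inputs: Iterable[Tuple[Any, ...]],
) -> bool:
    """Return True if candidate matches reference on every input tuple."""
    for args in inputs:
        try:
            if candidate(*args) != reference(*args):
                return False
        except Exception:
            # Any exception raised on an input counts as a failure.
            return False
    return True


# Hypothetical task: return the largest element of a non-empty list.
def reference_max(xs):
    return max(xs)


# A plausible but buggy "generated" solution: it assumes non-negative values.
def candidate_max(xs):
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best


# Hypothetical test inputs: a small base set, and the same set plus an edge case.
base_inputs = [([1, 2, 3],), ([5, 4],)]
extra_inputs = base_inputs + [([-3, -1, -2],)]

print(passes_all(candidate_max, reference_max, base_inputs))
print(passes_all(candidate_max, reference_max, extra_inputs))
```

The buggy candidate passes the small base set but fails once the all-negative input is added, which is the kind of gap that a larger test suite is meant to expose.
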
## Resources

* **GitHub Repo**: [evalplus/evalplus](https://github.com/evalplus/evalplus)
* **Leaderboard**: [evalplus.github.io](https://evalplus.github.io)
* **Papers**: [EvalPlus@NeurIPS'23](https://arxiv.org/abs/2305.01210), [EvalPerf@COLM'24](https://arxiv.org/abs/2408.06450)
* **Python Package**: [PyPI](https://pypi.org/project/evalplus/)

## Citations

```bibtex
@inproceedings{evalplus,
  title = {Is Your Code Generated by Chat{GPT} Really Correct? Rigorous Evaluation of Large Language Models for Code Generation},
  author = {Liu, Jiawei and Xia, Chunqiu Steven and Wang, Yuyao and Zhang, Lingming},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
  year = {2023},
  url = {https://openreview.net/forum?id=1qvx610Cu7},
}

@inproceedings{evalperf,
  title = {Evaluating Language Models for Efficient Code Generation},
  author = {Liu, Jiawei and Xie, Songrun and Wang, Junhao and Wei, Yuxiang and Ding, Yifeng and Zhang, Lingming},
  booktitle = {First Conference on Language Modeling},
  year = {2024},
  url = {https://openreview.net/forum?id=IBCBMeAhmC},
}
```