jszheng21/RACE

RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency.

Score: 27 / 100 (Experimental)

RACE helps evaluate how well large language models (LLMs) generate code. It takes code produced by an LLM and assesses it across four dimensions: readability, maintainability, correctness, and efficiency. The output is a detailed report on the code's quality, which AI researchers and developers can use to compare and improve code generation models.

No commits in the last 6 months.

Use this if you are developing or comparing large language models and need a comprehensive way to benchmark their ability to produce high-quality, practical code.

Not ideal if you are an end-user simply looking to use an LLM for code generation without needing to benchmark its underlying performance characteristics.

Tags: AI model evaluation, code generation, LLM benchmarking, software engineering, AI research
Stale (6 months) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 6 / 25

How are scores calculated?
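
The methodology is not documented on this page, but the overall score matches the sum of the four category scores, each capped at 25: 0 + 5 + 16 + 6 = 27 out of 100.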

Stars: 12
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Oct 12, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/jszheng21/RACE"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
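
For programmatic use, here is a minimal Python sketch. It assumes only what the curl command above shows: an HTTPS GET endpoint that returns JSON. The response fields are not documented on this page, so the sketch prints the raw payload rather than assuming a schema.

import json
import urllib.request

# Endpoint taken from the curl example above; the path encodes the repo slug.
URL = "https://pt-edge.onrender.com/api/v1/quality/ai-coding/jszheng21/RACE"

def fetch_quality_report(url: str = URL) -> dict:
    """Fetch the quality report as parsed JSON.

    Assumes the endpoint returns a JSON body, as the curl example
    suggests; field names are not documented here.
    """
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    report = fetch_quality_report()
    # Pretty-print the raw payload instead of assuming specific keys.
    print(json.dumps(report, indent=2))

The standard-library urllib client keeps the sketch dependency-free; swapping in requests or httpx would work the same way if those are already in your environment.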