domaineval/DomainEval
DomainEval is an automatically constructed benchmark for multi-domain code generation. It comprises 2k+ subjects (each a task description, reference code, and tests) spanning six domains: Computation, Basic, Network, Cryptography, Visualization, and System.
The project helps evaluate how well AI models generate code across varied programming tasks, from basic computation to network operations, cryptography, and visualization. It takes a code-generation model's output and evaluates its correctness against a diverse set of real-world coding problems. Software engineers and researchers who build or use large language models for code generation can use it to benchmark performance.
No commits in the last 6 months.
Use this if you need to rigorously test and compare the code generation capabilities of different AI models across a wide range of programming domains.
Not ideal if you are looking for a tool to generate code directly for your own applications, as this is solely for benchmarking existing models.
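To make the benchmark's structure concrete: each subject pairs a task description with reference code and tests, tagged with one of the six domains. The sketch below is purely illustrative — the field names and the `Subject` class are hypothetical, not DomainEval's actual schema.

```python
from dataclasses import dataclass

# The six domains listed in the benchmark description.
DOMAINS = {"Computation", "Basic", "Network", "Cryptography", "Visualization", "System"}

@dataclass
class Subject:
    # Hypothetical shape of one benchmark subject; names are illustrative only.
    description: str      # natural-language task description
    reference_code: str   # ground-truth implementation
    tests: str            # test code used to check a model's generation
    domain: str           # one of DOMAINS

    def __post_init__(self) -> None:
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")

sample = Subject(
    description="Compute the SHA-256 digest of a byte string.",
    reference_code="import hashlib\n\ndef digest(data):\n    return hashlib.sha256(data).hexdigest()",
    tests="assert digest(b'abc').startswith('ba7816bf')",
    domain="Cryptography",
)
print(sample.domain)  # → Cryptography
```

Evaluation then amounts to running each subject's tests against model-generated code in place of the reference implementation.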
Stars
14
Forks
3
Language
Python
License
—
Category
AI coding
Last pushed
Dec 12, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/domaineval/DomainEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
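The same endpoint can be queried from Python instead of curl. A minimal sketch using only the standard library — the response is assumed to be JSON, and its field names are not documented here, so only URL construction and a generic fetch are shown:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ai-coding"

def repo_quality_url(owner: str, repo: str) -> str:
    # Build the endpoint URL for a given repository.
    return f"{API_BASE}/{owner}/{repo}"

def fetch_repo_quality(owner: str, repo: str) -> dict:
    # Perform the request; the body is assumed to be a JSON object.
    with urllib.request.urlopen(repo_quality_url(owner, repo)) as resp:
        return json.load(resp)

print(repo_quality_url("domaineval", "DomainEval"))
# → https://pt-edge.onrender.com/api/v1/quality/ai-coding/domaineval/DomainEval
```

Without an API key this is rate-limited to 100 requests/day, so cache responses client-side if you poll many repositories.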
Higher-rated alternatives
k4black/codebleu
Pip compatible CodeBLEU metric implementation available for linux/macos/win
LiveCodeBench/LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of...
EdinburghNLP/code-docstring-corpus
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and...
hendrycks/apps
APPS: Automated Programming Progress Standard (NeurIPS 2021)
solis-team/Hydra
[FSE 2026] Do Not Treat Code as Natural Language: Implications for Repository-Level Code...