domaineval/DomainEval

DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation. It consists of 2k+ subjects (each with a description, reference code, and tests) covering six domains: Computation, Basic, Network, Cryptography, Visualization, and System.

Overall score: 27 / 100 (Experimental)

This project evaluates how well AI models generate code across diverse programming tasks, from basic computation to network operations, cryptography, and visualization. It takes a code-generation model's output and scores its accuracy against a diverse set of real-world coding problems. Software engineers and researchers developing or using large language models for code generation can use it to benchmark performance.
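For intuition, here is a purely illustrative sketch of this kind of test-based evaluation in Python. The subject description, candidate code, and test cases below are invented for the example and do not reflect DomainEval's actual data format or evaluation harness.

import unittest

# Hypothetical subject (illustrative only): a natural-language description,
# model-generated candidate code, and unit tests that decide pass/fail.
description = "Return the sum of a list of integers."
candidate_code = """
def solution(values):
    return sum(values)
"""

# Execute the candidate in an isolated namespace and pull out the entry point.
namespace = {}
exec(candidate_code, namespace)
solution = namespace["solution"]

class SubjectTests(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(solution([1, 2, 3]), 6)

    def test_empty(self):
        self.assertEqual(solution([]), 0)

# Run the subject's tests against the candidate and report the outcome.
result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(SubjectTests)
)
print("candidate passed" if result.wasSuccessful() else "candidate failed")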

No commits in the last 6 months.

Use this if you need to rigorously test and compare the code generation capabilities of different AI models across a wide range of programming domains.

Not ideal if you are looking for a tool to generate code directly for your own applications, as this is solely for benchmarking existing models.

code-generation LLM-benchmarking software-engineering AI-model-evaluation programming-task-assessment
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 14 / 25


Stars: 14

Forks: 3

Language: Python

License: None

Last pushed: Dec 12, 2024

Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/domaineval/DomainEval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
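The same data can also be fetched programmatically. A minimal Python sketch, assuming the endpoint returns JSON (the exact response fields are not documented here):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/ai-coding/domaineval/DomainEval"
response = requests.get(url, timeout=10)
response.raise_for_status()  # surface HTTP errors instead of parsing bad output
print(response.json())       # inspect the returned quality data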