HWH-2000/DynaCode

[ACL'2025 Findings] DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation

Overall score: 14 / 100 (Experimental)

This project helps researchers and developers evaluate how well large language models (LLMs) generate code. You provide an LLM and a set of coding problems, and it outputs a Pass@1 score: the fraction of problems for which the model's first generated solution passes all test cases. It's designed for AI researchers, machine learning engineers, and data scientists who want to improve or compare LLMs on code generation tasks.
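Pass@1 is commonly computed with the unbiased Pass@k estimator from the HumanEval paper; whether DynaCode uses exactly this formula is an assumption, and the (n, c) values below are hypothetical. A minimal sketch in Python:

import math

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased Pass@k estimator (Chen et al., 2021):
    # n = samples generated for a problem, c = samples passing all tests.
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Benchmark-level Pass@1 is the mean of per-problem estimates.
# The (n, c) pairs below are hypothetical, not DynaCode output.
results = [(1, 1), (1, 0), (1, 1)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"Pass@1 = {score:.2f}")  # 0.67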

No commits in the last 6 months.

Use this if you need to rigorously test and understand the strengths and weaknesses of different LLMs when they generate code, especially concerning code complexity and nested logic.

Not ideal if you are looking for a tool to help you write code yourself, or to evaluate the performance of human programmers.

Tags: LLM evaluation, code generation, benchmarking, AI model performance, computational linguistics, AI research
No License · Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 7 / 25
Community 0 / 25

How are scores calculated?
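The overall score appears to be the sum of the four category scores, each out of 25; here that gives 2 + 5 + 7 + 0 = 14 out of 100.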

Stars: 10

Forks:

Language: Python

License: none

Last pushed: Jul 10, 2025

Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/HWH-2000/DynaCode"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
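A rough Python equivalent of the curl call above; it assumes the endpoint returns JSON, and since the response schema isn't documented on this page, the sketch just prints whatever comes back:

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/ai-coding/HWH-2000/DynaCode"

# No API key needed for up to 100 requests/day.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))  # inspect the raw payload; field names not documented here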