HWH-2000/DynaCode
[ACL'2025 Findings] DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation
This project helps researchers and developers evaluate how well large language models (LLMs) generate code. You provide an LLM and a set of coding problems, and it reports a Pass@1 score indicating how often the model's generated solutions are correct and robust. It's designed for AI researchers, machine learning engineers, and data scientists who are improving or comparing LLMs on code generation tasks.
No commits in the last 6 months.
Use this if you need to rigorously test and understand the strengths and weaknesses of different LLMs when they generate code, especially concerning code complexity and nested logic.
Not ideal if you are looking for a tool to help you write code yourself, or to evaluate the performance of human programmers.
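Pass@1 is the standard execution-based metric popularized by the HumanEval/Codex line of work: a problem counts as solved if a sampled generation passes its unit tests. The sketch below shows the usual unbiased pass@k estimator, with k=1 reducing to the plain success rate; it is a general illustration of the metric, not DynaCode's exact evaluation harness.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k samples drawn from n generations is correct, given
    c correct generations."""
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a running product for stability.
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# Example: 3 of 10 generations for a problem pass its unit tests.
# For k=1 this reduces to the plain success rate c / n.
print(pass_at_k(n=10, c=3, k=1))  # 0.3
```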
Stars: 10
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jul 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/HWH-2000/DynaCode"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
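The same endpoint can be queried from Python instead of curl. A minimal sketch, assuming the endpoint returns JSON (only the URL comes from the listing above; the response schema is not documented here):

```python
import requests

# Endpoint for this repository's quality data, as shown in the curl example.
url = "https://pt-edge.onrender.com/api/v1/quality/ai-coding/HWH-2000/DynaCode"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

data = resp.json()  # assumed JSON payload; inspect it to see available fields
print(data)
```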
Higher-rated alternatives
k4black/codebleu
Pip-installable CodeBLEU metric implementation for Linux/macOS/Windows
LiveCodeBench/LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of...
EdinburghNLP/code-docstring-corpus
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and...
hendrycks/apps
APPS: Automated Programming Progress Standard (NeurIPS 2021)
solis-team/Hydra
[FSE 2026] Do Not Treat Code as Natural Language: Implications for Repository-Level Code...