lt-asset/REPOCOD

For our ACL 2025 paper "Can Language Models Replace Programmers? RepoCod Says ‘Not Yet’" by Shanchao Liang, Yiran Hu, Nan Jiang, and Lin Tan.

Score: 35 / 100 (Emerging)

REPOCOD is a specialized benchmark designed to assess how well large language models can generate code for real-world software projects. Given a code generation model, it produces performance scores that reflect the model's ability to handle complex, multi-file programming tasks. It is intended for AI researchers and developers who are building or evaluating advanced code-generating systems.
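
To get a feel for the tasks themselves, here is a minimal sketch that loads the benchmark data with the Hugging Face datasets library. The dataset id "lt-asset/REPOCOD" and the "test" split are assumptions based on the repository name; check the repository's README for the exact location and schema.

# Minimal sketch: inspect the REPOCOD benchmark data. The dataset id and
# split name are assumptions; the actual schema may differ.
from datasets import load_dataset

dataset = load_dataset("lt-asset/REPOCOD", split="test")

# Print the available fields and one example to see the task format.
print(dataset.column_names)
print(dataset[0])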

No commits in the last 6 months.

Use this if you are developing or evaluating large language models intended for complex, real-world software development tasks that require understanding across multiple code files.

Not ideal if you are looking for a simple benchmark for basic, single-file code generation problems, or if you are not working on advanced LLM development.

Tags: AI research, code generation, software engineering, language model evaluation, LLM development
Badges: Stale (6m), No Package, No Dependents
Maintenance: 2/25
Adoption: 7/25
Maturity: 16/25
Community: 10/25

Stars: 26
Forks: 3
Language: Python
License: BSD-3-Clause
Last pushed: Aug 27, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lt-asset/REPOCOD"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
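
The same data can also be fetched programmatically. Below is a minimal Python sketch using the requests library; the structure of the JSON response is not documented here, so the example simply prints the full payload rather than assuming field names, and the mechanism for supplying an API key for the higher rate limit is omitted for the same reason.

# Minimal sketch: fetch the quality data for lt-asset/REPOCOD from the
# endpoint shown above and print the raw JSON payload.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lt-asset/REPOCOD"

response = requests.get(URL, timeout=30)
response.raise_for_status()  # surface HTTP errors (e.g., rate limiting)

print(response.json())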