lt-asset/REPOCOD
For our ACL 2025 paper: Can Language Models Replace Programmers? RepoCod Says 'Not Yet' - by Shanchao Liang, Yiran Hu, Nan Jiang, and Lin Tan
REPOCOD is a benchmark designed to assess how well large language models generate code for real-world software projects. Given a code generation model, it scores the model's ability to complete complex, repository-level tasks whose solutions depend on context spread across multiple files. It is aimed at AI researchers and developers who build or evaluate advanced code-generating models; a sketch of what loading the benchmark might look like follows the guidance below.
No commits in the last 6 months.
Use this if you are developing or evaluating large language models intended for complex, real-world software development tasks that require understanding across multiple code files.
Not ideal if you are looking for a simple benchmark for basic, single-file code generation problems, or if you are not working on advanced LLM development.
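A minimal Python sketch of inspecting REPOCOD tasks, assuming the benchmark is also distributed as a Hugging Face dataset named "lt-asset/REPOCOD" with a "test" split; that dataset id, the split, and the field names below are assumptions for illustration, not details confirmed on this page.

# Load REPOCOD-style tasks and inspect what a model is asked to complete.
from datasets import load_dataset

dataset = load_dataset("lt-asset/REPOCOD", split="test")  # assumed dataset id and split

sample = dataset[0]
# Field names here are hypothetical; real tasks pair a target function
# with repository-level context the model must understand to fill it in.
for field in ("repository", "function_name", "prompt"):
    print(field, "->", str(sample.get(field, "<missing>"))[:100])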
Stars: 26
Forks: 3
Language: Python
License: BSD-3-Clause
Last pushed: Aug 27, 2025
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lt-asset/REPOCOD"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
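The same endpoint can be queried from Python using only the standard library; the URL comes from this page, and since the response schema is not documented here, the script simply pretty-prints whatever JSON the API returns.

# Fetch the quality data for lt-asset/REPOCOD and pretty-print the JSON.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lt-asset/REPOCOD"

with urllib.request.urlopen(URL, timeout=10) as response:
    data = json.load(response)

# No key is needed for up to 100 requests/day; a free key raises the
# limit to 1,000/day, per the note above.
print(json.dumps(data, indent=2))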
Higher-rated alternatives
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal...
pat-jj/DeepRetrieval
[COLM’25] DeepRetrieval — 🔥 Training Search Agent by RLVR with Retrieval Outcome
lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
x66ccff/liveideabench
[Nature Communications] 🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea...
ise-uiuc/magicoder
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct