Humanity-s-Last-Code-Exam/HLCE

(EMNLP 2025 Findings) Source and evaluation scripts for Humanity's Last Code Exam

Score: 28 / 100 (Experimental)

This project helps researchers and developers evaluate the advanced code-generation capabilities of large language models (LLMs). It takes LLM-generated solutions to extremely difficult programming competition problems and reports their correctness and performance. It is aimed at AI researchers, LLM developers, and academic institutions working on cutting-edge language models for complex coding tasks.

No commits in the last 6 months.

Use this if you need to rigorously test and benchmark advanced LLMs against the most challenging competitive programming problems from contests like ICPC World Finals and IOI.

Not ideal if you are looking to evaluate LLMs on everyday coding tasks or standard, less complex programming benchmarks.

LLM evaluation, code generation, benchmarking, AI research, competitive programming, language model development
No License · Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 7 / 25
Community 10 / 25


Stars: 95
Forks: 7
Language: Python
License: None
Last pushed: Aug 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Humanity-s-Last-Code-Exam/HLCE"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
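If you prefer scripting over curl, here is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON; the response fields are not documented here, so the script simply pretty-prints whatever comes back.

# Fetch the quality data for this repo from the public API (no key needed
# within the 100 requests/day limit) and pretty-print the JSON response.
import json
from urllib.request import urlopen

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Humanity-s-Last-Code-Exam/HLCE"

with urlopen(URL, timeout=30) as resp:
    data = json.load(resp)  # assumes the body is JSON

print(json.dumps(data, indent=2))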