Humanity-s-Last-Code-Exam/HLCE
(EMNLP 2025 Findings) Source code and evaluation scripts for Humanity's Last Code Exam
This project helps researchers and developers evaluate the advanced code-generation capabilities of large language models (LLMs). It takes LLM-generated solutions to extremely difficult programming-competition problems and reports their correctness and performance. It is used by AI researchers, LLM developers, and academic institutions working on cutting-edge language models for complex coding tasks.
No commits in the last 6 months.
Use this if you need to rigorously test and benchmark advanced LLMs against the most challenging competitive programming problems from contests like ICPC World Finals and IOI.
Not ideal if you are looking to evaluate LLMs on everyday coding tasks or standard, less complex programming benchmarks.
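For context, the sketch below illustrates the general idea behind this kind of evaluation: run a generated solution against held-out test cases under a time limit and compare its output to the expected output. This is only an illustration under assumed conventions, not the repository's actual scripts; the file name, function, and test-case format are all hypothetical.

# Hypothetical illustration of judging one generated solution against test cases;
# not HLCE's actual evaluation code, whose interface is not described on this page.
import subprocess

def judge(solution_path: str, test_cases: list[tuple[str, str]],
          time_limit: float = 2.0) -> bool:
    """Return True if the solution's output matches every expected output."""
    for stdin_text, expected in test_cases:
        try:
            run = subprocess.run(
                ["python", solution_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit,   # enforce a per-test time limit
            )
        except subprocess.TimeoutExpired:
            return False              # time limit exceeded
        if run.returncode != 0:
            return False              # runtime error
        if run.stdout.strip() != expected.strip():
            return False              # wrong answer
    return True

# Example: one tiny test case for a hypothetical solution.py that echoes a sum.
print(judge("solution.py", [("1 2\n", "3")]))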
Stars: 95
Forks: 7
Language: Python
License: —
Category: —
Last pushed: Aug 21, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Humanity-s-Last-Code-Exam/HLCE"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
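If you prefer to fetch the same data from Python rather than curl, here is a minimal sketch using only the standard library. The endpoint is copied from the curl example above; the response schema is not documented here, so the field names below are assumptions and are looked up defensively.

import json
import urllib.request

# Endpoint from the curl example above; the open tier needs no API key.
URL = (
    "https://pt-edge.onrender.com/api/v1/quality/llm-tools/"
    "Humanity-s-Last-Code-Exam/HLCE"
)

with urllib.request.urlopen(URL, timeout=30) as resp:
    data = json.load(resp)

# Field names are assumptions, since the response schema is not shown here.
if isinstance(data, dict):
    for field in ("stars", "forks", "language", "last_pushed"):
        print(f"{field}: {data.get(field, 'n/a')}")
else:
    print(data)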
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents