cornell-zhang/heurigym
Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization (ICLR'26)
This project evaluates how effectively large language models (LLMs) can generate and iteratively refine heuristics for complex real-world optimization problems. It poses combinatorial optimization tasks, such as airline crew pairing and protein sequence design, and scores the quality of the heuristics each LLM produces. Researchers and practitioners applying LLMs to hard optimization tasks can use it to benchmark and compare different models and agent approaches.
Use this if you need a rigorous, objective way to benchmark different LLM agents' ability to solve practical, open-ended combinatorial optimization problems through code-driven interaction.
Not ideal if you are looking for an off-the-shelf solver for a specific optimization problem, or if your tasks involve simple, closed-form challenges.
Stars: 64
Forks: 6
Language: Python
License: Apache-2.0
Last pushed: Mar 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/cornell-zhang/heurigym"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
Higher-rated alternatives
sierra-research/tau2-bench - τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld - [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench - [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench - A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode - A benchmark that challenges language models to code solutions for scientific problems