nl4opt/ORQA

[AAAI 2025] ORQA is a QA benchmark designed to assess the reasoning capabilities of LLMs in the specialized technical domain of Operations Research. The benchmark evaluates whether LLMs can emulate the knowledge and reasoning skills of OR experts when presented with complex optimization modeling tasks.

23 / 100 (Experimental)

This benchmark helps evaluate how well large language models (LLMs) understand and apply complex optimization concepts from Operations Research. It takes real-world optimization problem descriptions and related questions as input, then assesses whether an LLM can correctly identify model components and reasoning steps. Anyone developing or deploying LLMs for technical problem-solving, particularly in supply chain, logistics, or resource allocation, can use it to gauge a model's domain expertise.

No commits in the last 6 months.

Use this if you need to objectively measure a large language model's ability to reason through and solve problems in the specialized domain of Operations Research.

Not ideal if you are looking for a tool to solve an Operations Research problem directly, as this is a benchmark for evaluating LLMs, not an OR solver.

Operations Research · Optimization Modeling · LLM Evaluation · AI Reasoning Assessment · Complex Problem Solving
No License · Stale (6m) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 8 / 25
Maturity: 8 / 25
Community: 5 / 25
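The overall score appears to be the simple sum of the four category subscores, a quick sanity check (assuming plain summation, which the page does not state explicitly):

```python
# Category subscores as shown on the card (each out of 25).
subscores = {"Maintenance": 2, "Adoption": 8, "Maturity": 8, "Community": 5}

# Assuming the overall score is a plain sum, it matches the 23 / 100 shown above.
overall = sum(subscores.values())
print(overall)  # → 23
```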


Stars: 45
Forks: 2
Language: Python
License: None
Last pushed: Jun 07, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/nl4opt/ORQA"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
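The same request can be made from Python; a minimal sketch using only the standard library, assuming the endpoint returns a JSON body (the response schema is not documented here):

```python
import json
from urllib.request import urlopen

# Base endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a GitHub owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality report. Assumes a JSON response; fields are not
    documented on this page, so inspect the result before relying on keys."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (no key needed, up to 100 requests/day):
# report = fetch_quality("nl4opt", "ORQA")
```

`fetch_quality` performs a live request, so call it sparingly to stay within the daily rate limit.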