jiayuww/SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
SpatialEval helps AI researchers and developers assess how well large language models (LLMs) and vision-language models (VLMs) understand spatial concepts. You point it at a model and a set of spatial reasoning questions (text-only, image-only, or both), and it scores the model on spatial relationships, object positions, counting, and navigation. It is aimed at anyone building or evaluating models that need to reason over spatial information.
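To make that workflow concrete, here is a minimal sketch of an evaluation loop over SpatialEval-style questions. It is an illustration only: SpatialQuestion, evaluate, and the model.generate interface are hypothetical stand-ins, not the repo's actual API; the repository ships its own scripts, so consult its README for the real entry points.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SpatialQuestion:
    prompt: str                # the question text
    image_path: Optional[str]  # None for text-only items; set for image or image+text items
    answer: str                # gold answer for exact-match scoring

def evaluate(model, questions: list[SpatialQuestion]) -> float:
    """Return exact-match accuracy of `model` over a question set."""
    correct = 0
    for q in questions:
        if q.image_path is None:
            pred = model.generate(q.prompt)                      # LLM: text-only input
        else:
            pred = model.generate(q.prompt, image=q.image_path)  # VLM: image + text input
        correct += int(pred.strip().lower() == q.answer.strip().lower())
    return correct / len(questions)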
No commits in the last 6 months.
Use this if you are a researcher or AI developer working on large language models or vision-language models and need a standardized way to benchmark their spatial reasoning capabilities.
Not ideal if you are a general user looking to solve a specific business problem, as this is a research-focused benchmark tool for AI model evaluation.
Stars: 59
Forks: 3
Language: Python
License: —
Category: —
Last pushed: Jan 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jiayuww/SpatialEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
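For programmatic access, the curl call above translates directly to Python. A minimal sketch, assuming the endpoint returns JSON; the response schema and any key-passing mechanism are not documented here, so inspect the payload yourself:

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/jiayuww/SpatialEval"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surfaces rate-limit (HTTP 429) and other HTTP errors
data = resp.json()       # assumed JSON; print it to discover the actual fields
print(data)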
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation