jiayuww/SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
SpatialEval helps AI researchers and developers assess how well large language models (LLMs) and vision-language models (VLMs) understand spatial concepts. You point it at a model and a set of spatial reasoning questions (text-only, image-only, or both), and it scores the model on spatial relationships, object positions, counting, and navigation. It is aimed at anyone building or evaluating models that need to reason over spatial information.
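To make that workflow concrete, here is a minimal sketch of an evaluation loop over SpatialEval-style questions. It is an illustration only: SpatialQuestion, evaluate, and the model.generate interface are hypothetical stand-ins, not the repo's actual API; the repository ships its own scripts, so consult its README for the real entry points.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SpatialQuestion:
    prompt: str                # the question text
    image_path: Optional[str]  # None for text-only items; set for image or image+text items
    answer: str                # gold answer for exact-match scoring

def evaluate(model, questions: list[SpatialQuestion]) -> float:
    """Return exact-match accuracy of `model` over a question set."""
    correct = 0
    for q in questions:
        if q.image_path is None:
            pred = model.generate(q.prompt)                      # LLM: text-only input
        else:
            pred = model.generate(q.prompt, image=q.image_path)  # VLM: image + text input
        correct += int(pred.strip().lower() == q.answer.strip().lower())
    return correct / len(questions)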
No commits in the last 6 months.
Use this if you are a researcher or AI developer working on large language models or vision-language models and need a standardized way to benchmark their spatial reasoning capabilities.
Not ideal if you are a general user looking to solve a specific business problem, as this is a research-focused benchmark tool for AI model evaluation.
Stars: 59
Forks: 3
Language: Python
License: —
Category: —
Last pushed: Jan 23, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jiayuww/SpatialEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
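For programmatic access, the curl call above translates directly to Python. A minimal sketch, assuming the endpoint returns JSON; the response schema and any key-passing mechanism are not documented here, so inspect the payload yourself:

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/jiayuww/SpatialEval"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surfaces rate-limit (HTTP 429) and other HTTP errors
data = resp.json()       # assumed JSON; print it to discover the actual fields
print(data)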
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation