open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
This toolkit helps AI researchers and developers systematically evaluate the performance of large vision-language models (LVLMs). You input an LVLM and a benchmark dataset, and it outputs detailed evaluation results on how well the model performs on visual understanding and reasoning tasks. It's designed for those who need to compare and benchmark different LVLMs efficiently.
3,894 stars. Actively maintained with 27 commits in the last 30 days.
Use this if you need to quickly and comprehensively assess the capabilities of various large vision-language models across a wide range of benchmarks without extensive manual setup.
Not ideal if you are looking for a tool to train or fine-tune vision-language models, as its primary purpose is evaluation.
Stars: 3,894
Forks: 650
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 27
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/open-compass/VLMEvalKit"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
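The curl example above can also be scripted. A minimal Python sketch, assuming only the endpoint shape shown in the curl command; the response's field names are not documented here, so the live fetch is left as a comment:

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL (path shape taken from the curl example)."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("open-compass", "VLMEvalKit")
print(url)
# To fetch live data (subject to the 100 requests/day no-key limit):
# data = json.loads(urlopen(url).read())
```

Keeping the URL construction in a small helper makes it easy to query several repositories in a loop without repeating the base path.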
Related projects
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
EuroEval/EuroEval
The robust European language model benchmark.
evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
DebarghaG/proofofthought
Proof of Thought: LLM-based reasoning using Z3 theorem proving with multiple backend support...
eth-sri/matharena
Evaluation of LLMs on latest math competitions