open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
This toolkit lets AI researchers and developers systematically evaluate large vision-language models (LVLMs). Given an LVLM and a benchmark dataset, it produces detailed results on how well the model handles visual understanding and reasoning tasks. It is designed for anyone who needs to compare and benchmark different LVLMs efficiently.
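A typical evaluation is a single command. The sketch below follows the project's quickstart; the benchmark and model names (MMBench_DEV_EN, qwen_chat) are illustrative examples, and the supported lists vary by release.

git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .
# Evaluate one model on one benchmark (names here are examples;
# check the supported model/benchmark lists for your release).
python run.py --data MMBench_DEV_EN --model qwen_chat --verbose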
3,894 stars. Actively maintained with 27 commits in the last 30 days.
Use this if you need to quickly and comprehensively assess the capabilities of various large vision-language models across a wide range of benchmarks without extensive manual setup.
Not ideal if you are looking for a tool to train or fine-tune vision-language models, as its primary purpose is evaluation.
Stars: 3,894
Forks: 650
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 27
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/VLMEvalKit"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
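The endpoint returns JSON; for a readable dump, pipe the response through jq (assuming jq is installed):

curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/VLMEvalKit" | jq .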
Related tools
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents
evalplus/evalplus: Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024