open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Score: 69 / 100 (Established)

This toolkit helps AI researchers and developers systematically evaluate the performance of large vision-language models (LVLMs). You input an LVLM and a benchmark dataset, and it outputs detailed evaluation results on how well the model performs on visual understanding and reasoning tasks. It's designed for those who need to compare and benchmark different LVLMs efficiently.

3,894 stars. Actively maintained with 27 commits in the last 30 days.

Use this if you need to quickly and comprehensively assess the capabilities of various large vision-language models across a wide range of benchmarks without extensive manual setup.

Not ideal if you are looking for a tool to train or fine-tune vision-language models, as its primary purpose is evaluation.

Tags: AI research, model evaluation, computer vision, natural language processing, multimodal AI

No package · No dependents
Maintenance 20 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 23 / 25


Stars: 3,894
Forks: 650
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 27

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/open-compass/VLMEvalKit"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
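For programmatic use, the same endpoint can be queried from Python. A minimal sketch, assuming only the URL shown above; the response schema is not documented here, so the script simply pretty-prints whatever JSON the endpoint returns:

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

if __name__ == "__main__":
    # Fetch and pretty-print the quality report for VLMEvalKit.
    with urlopen(quality_url("open-compass", "VLMEvalKit")) as resp:
        print(json.dumps(json.load(resp), indent=2))
```

The network call is kept under the `__main__` guard so the helper can be imported and reused for other repositories without triggering a request.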