open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
This toolkit helps AI researchers and developers systematically evaluate the performance of large vision-language models (LVLMs). You input an LVLM and a benchmark dataset, and it outputs detailed evaluation results on how well the model performs on visual understanding and reasoning tasks. It's designed for those who need to compare and benchmark different LVLMs efficiently.
3,894 stars. Actively maintained with 27 commits in the last 30 days.
Use this if you need to quickly and comprehensively assess the capabilities of various large vision-language models across a wide range of benchmarks without extensive manual setup.
Not ideal if you are looking for a tool to train or fine-tune vision-language models, as its primary purpose is evaluation.
Stars: 3,894
Forks: 650
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 27
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/open-compass/VLMEvalKit"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
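The curl example above can also be scripted. A minimal Python sketch, assuming only the endpoint shape shown in the curl command; the response's field names are not documented here, so the live fetch is left as a comment:

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL (path shape taken from the curl example)."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("open-compass", "VLMEvalKit")
print(url)
# To fetch live data (subject to the 100 requests/day no-key limit):
# data = json.loads(urlopen(url).read())
```

Keeping the URL construction in a small helper makes it easy to query several repositories in a loop without repeating the base path.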
Related projects
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
EuroEval/EuroEval
The robust European language model benchmark.
evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
DebarghaG/proofofthought
Proof of Thought: LLM-based reasoning using Z3 theorem proving with multiple backend support...
eth-sri/matharena
Evaluation of LLMs on latest math competitions