lmms-eval and VLMEvalKit

lmms-eval and VLMEvalKit are complementary evaluation frameworks that can be used together. lmms-eval offers broader modality coverage (text, image, video, and audio), while VLMEvalKit provides more extensive model and benchmark support (220+ LMMs, 80+ benchmarks), so practitioners can choose or combine them based on their evaluation priorities.

                   lmms-eval                  VLMEvalKit
Score              78 (Verified)              69 (Established)
Maintenance        20/25                      20/25
Adoption           11/25                      10/25
Maturity           25/25                      16/25
Community          22/25                      23/25
Stars              3,883                      3,894
Forks              539                        650
Commits (30d)      25                         27
Language           Python                     Python
License            —                          Apache-2.0
Risk flags         No risk flags              No package, no dependents

About lmms-eval

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

This tool helps researchers and AI practitioners reliably compare how well different multimodal AI models understand and respond to various types of real-world information. You provide an AI model and a set of diverse tasks involving text, images, video, and audio, and it outputs consistent, trustworthy performance metrics. Anyone who builds, deploys, or studies large multimodal models will find this useful for understanding model capabilities.
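
To make that concrete, here is a minimal sketch of driving an lmms-eval run from Python via its command-line interface. The model alias, checkpoint, and task names are illustrative assumptions rather than recommendations; consult the repository README for the model backends and benchmarks supported by your installed version.

```python
import subprocess

# Minimal sketch: run the lmms-eval CLI on a couple of image benchmarks.
# The model alias, checkpoint, and task names below are assumptions for
# illustration; see the EvolvingLMMs-Lab/lmms-eval README for the real lists.
cmd = [
    "python", "-m", "lmms_eval",
    "--model", "llava",                                     # model backend alias (assumed)
    "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",  # checkpoint to load (assumed)
    "--tasks", "mme,mmbench_en_dev",                        # comma-separated benchmark names (assumed)
    "--batch_size", "1",
    "--log_samples",                                        # keep per-sample outputs for later inspection
    "--output_path", "./logs/",
]
subprocess.run(cmd, check=True)
```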

Tags: AI model evaluation · multimodal AI · machine learning research · AI development · model benchmarking

About VLMEvalKit

open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

This toolkit helps AI researchers and developers systematically evaluate the performance of large vision-language models (LVLMs). You input an LVLM and a benchmark dataset, and it outputs detailed evaluation results on how well the model performs on visual understanding and reasoning tasks. It's designed for those who need to compare and benchmark different LVLMs efficiently.
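
For comparison, here is a similar sketch for VLMEvalKit, which is typically driven through its run.py entry script from a checkout of the repository. The model alias and benchmark identifier below are assumptions for illustration; the project maintains its own lists of supported models and benchmarks.

```python
import subprocess

# Minimal sketch: evaluate one model on one benchmark with VLMEvalKit's run.py.
# Run from a clone of open-compass/VLMEvalKit; the --model alias and --data
# benchmark name are assumptions for illustration, not a fixed recommendation.
cmd = [
    "python", "run.py",
    "--model", "qwen_chat",        # model alias registered in VLMEvalKit (assumed)
    "--data", "MMBench_DEV_EN",    # benchmark identifier (assumed)
    "--verbose",                   # print per-sample progress while scoring
]
subprocess.run(cmd, check=True)
```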

Tags: AI research · model evaluation · computer vision · natural language processing · multimodal AI

Scores updated daily from GitHub, PyPI, and npm data.