EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
This toolkit gives researchers and AI practitioners a reliable, reproducible way to compare how well multimodal models understand and respond to text, image, video, and audio inputs. You supply a model and a set of evaluation tasks spanning those modalities, and it outputs consistent, comparable performance metrics. Anyone who builds, deploys, or studies large multimodal models (LMMs) will find it useful for characterizing model capabilities.
3,883 stars. Used by 1 other package. Actively maintained with 25 commits in the last 30 days. Available on PyPI.
Use this if you need to rigorously and reproducibly evaluate the performance of multimodal AI models across a wide range of tasks involving different data types.
Not ideal if you are looking for a simple, single-metric benchmark for a single data type or if you are not working with advanced AI models.
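A typical invocation looks like the following. This is a minimal sketch based on the project's lm-evaluation-harness-style CLI: the model name, checkpoint, and task identifiers are illustrative, and flag names may vary between versions, so verify them against the repo's documentation.

# Install from PyPI, then evaluate a LLaVA checkpoint on two benchmarks
# (model/tasks are illustrative examples, not a recommendation)
pip install lmms-eval
accelerate launch -m lmms_eval \
  --model llava \
  --model_args pretrained="liuhaotian/llava-v1.5-7b" \
  --tasks mme,mmbench_en \
  --batch_size 1 \
  --output_path ./logs/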
Stars: 3,883
Forks: 539
Language: Python
License: —
Category: —
Last pushed: Mar 11, 2026
Commits (30d): 25
Dependencies: 52
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/EvolvingLMMs-Lab/lmms-eval"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
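With a key, the same endpoint can be called as below. A minimal sketch: the X-API-Key header is an assumption, since the listing does not document how the key is sent; check the API's documentation for the actual mechanism.

# Keyed request (header name is an assumption, not documented here)
curl -H "X-API-Key: $API_KEY" \
  "https://pt-edge.onrender.com/api/v1/quality/llm-tools/EvolvingLMMs-Lab/lmms-eval"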
Related tools
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs); supports 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents
evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024