EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
This toolkit gives researchers and AI practitioners a reliable, reproducible way to compare how well multimodal models understand and respond to text, image, video, and audio inputs. You supply a model and a set of evaluation tasks spanning those modalities, and it outputs consistent, comparable performance metrics. Anyone who builds, deploys, or studies large multimodal models (LMMs) will find it useful for characterizing model capabilities.
3,883 stars. Used by 1 other package. Actively maintained with 25 commits in the last 30 days. Available on PyPI.
Use this if you need to rigorously and reproducibly evaluate the performance of multimodal AI models across a wide range of tasks involving different data types.
Not ideal if you are looking for a simple, single-metric benchmark for a single data type or if you are not working with advanced AI models.
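A typical invocation looks like the following. This is a minimal sketch based on the project's lm-evaluation-harness-style CLI: the model name, checkpoint, and task identifiers are illustrative, and flag names may vary between versions, so verify them against the repo's documentation.

# Install from PyPI, then evaluate a LLaVA checkpoint on two benchmarks
# (model/tasks are illustrative examples, not a recommendation)
pip install lmms-eval
accelerate launch -m lmms_eval \
  --model llava \
  --model_args pretrained="liuhaotian/llava-v1.5-7b" \
  --tasks mme,mmbench_en \
  --batch_size 1 \
  --output_path ./logs/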
Stars: 3,883
Forks: 539
Language: Python
License: —
Category: —
Last pushed: Mar 11, 2026
Commits (30d): 25
Dependencies: 52
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/EvolvingLMMs-Lab/lmms-eval"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
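With a key, the same endpoint can be called as below. A minimal sketch: the X-API-Key header is an assumption, since the listing does not document how the key is sent; check the API's documentation for the actual mechanism.

# Keyed request (header name is an assumption, not documented here)
curl -H "X-API-Key: $API_KEY" \
  "https://pt-edge.onrender.com/api/v1/quality/llm-tools/EvolvingLMMs-Lab/lmms-eval"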
Related tools
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs); supports 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents
evalplus/evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024