lmms-eval and mlmm-evaluation
About lmms-eval
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
This tool helps researchers and AI practitioners compare how well different multimodal models handle inputs spanning text, images, video, and audio. You provide a model and a selection of benchmark tasks, and it outputs consistent, comparable performance metrics. Anyone who builds, deploys, or studies large multimodal models can use it to measure and compare model capabilities.
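As a rough illustration of that workflow, the Python sketch below shells out to the lmms-eval command line. It is a minimal sketch under assumptions: the package is installed and exposes a `python -m lmms_eval` entry point with lm-evaluation-harness-style flags, and the model backend, checkpoint, and task names shown are illustrative; the repository's README is the authority on the exact arguments.

```python
# Hedged sketch: invoke the lmms-eval CLI from Python.
# Assumptions: lmms-eval is installed and exposes a `python -m lmms_eval`
# entry point with harness-style flags; the backend name, checkpoint, and
# task below are illustrative examples, not confirmed defaults.
import subprocess

subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "llava",                                     # model backend (assumed name)
        "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",  # example checkpoint
        "--tasks", "mme",                                       # example multimodal benchmark
        "--batch_size", "1",
        "--output_path", "./logs/",                             # where metrics are written
    ],
    check=True,
)
```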
About mlmm-evaluation
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
This evaluation framework helps AI researchers and developers assess how well their multilingual large language models (LLMs) understand and answer questions across different languages. You input an LLM and it outputs performance scores on multilingual versions of three standard multiple-choice benchmarks (ARC, HellaSwag, MMLU) covering 26 languages, showing how effectively your model generalizes beyond English. It is aimed at researchers building or fine-tuning LLMs for global applications.
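To make that input/output contract concrete, here is a hedged sketch of how a run might be launched, assuming the framework follows the lm-evaluation-harness-style CLI it builds on. The script name, flags, checkpoint, and per-language task identifiers (e.g. a Vietnamese ARC task) are assumptions to verify against the repository.

```python
# Hedged sketch: launch a multilingual evaluation run from Python.
# Assumptions: the nlp-uoregon/mlmm-evaluation repo is cloned and its entry
# script follows the lm-evaluation-harness CLI; the task IDs "arc_vi",
# "hellaswag_vi", "mmlu_vi" (Vietnamese) and the checkpoint are illustrative.
import subprocess

subprocess.run(
    [
        "python", "main.py",
        "--model", "hf-causal",                              # Hugging Face causal-LM backend (assumed)
        "--model_args", "pretrained=bigscience/bloom-1b7",   # example multilingual checkpoint
        "--tasks", "arc_vi,hellaswag_vi,mmlu_vi",            # ARC/HellaSwag/MMLU in one target language
        "--device", "cuda:0",
    ],
    check=True,
)
```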