lmms-eval and mlmm-evaluation

                lmms-eval            mlmm-evaluation
Overall score   78 (Verified)        42 (Emerging)
Maintenance     20/25                0/25
Adoption        11/25                10/25
Maturity        25/25                16/25
Community       22/25                16/25
Stars           3,883                132
Forks           539                  18
Downloads
Commits (30d)   22                   0
Language        Python               Python
License                              Apache-2.0
Risk flags      None                 Stale 6m, No Package, No Dependents

About lmms-eval

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

This toolkit lets researchers and AI practitioners reliably compare how well different multimodal models understand and respond to text, image, video, and audio inputs. You point it at a model and a set of benchmark tasks spanning those modalities, and it produces consistent, reproducible performance metrics. Anyone who builds, deploys, or studies large multimodal models can use it to understand model capabilities.
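A minimal sketch of launching an evaluation run from Python, assuming the lm-evaluation-harness-style command-line interface described in the lmms-eval README. The model backend, checkpoint, and task names below are examples only, and the available flags may differ across versions, so check the repository for the exact options.

```python
import subprocess

# Sketch of an lmms-eval run driven from Python. The flags mirror the
# README-style CLI; model, checkpoint, and task names are illustrative.
cmd = [
    "python", "-m", "lmms_eval",
    "--model", "llava",                                      # model backend to evaluate (example)
    "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",   # checkpoint to load (example)
    "--tasks", "mme,mmbench_en",                             # comma-separated task list (examples)
    "--batch_size", "1",
    "--output_path", "./logs/",                              # where metrics and samples are written
]
subprocess.run(cmd, check=True)
```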

AI model evaluation, multimodal AI, machine learning research, AI development, model benchmarking

About mlmm-evaluation

nlp-uoregon/mlmm-evaluation

Multilingual Large Language Models Evaluation Benchmark

This evaluation framework helps AI researchers and developers assess how well their multilingual large language models (LLMs) understand and reason across languages. You supply an LLM and it reports performance scores on three benchmark tasks (ARC, HellaSwag, MMLU) in 26 languages, showing how effectively your model generalizes beyond English. It is aimed at researchers building or fine-tuning LLMs for global applications.
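A rough sketch of what a run might look like, assuming the framework exposes an lm-evaluation-harness-style entry point and per-language task names (e.g. "arc_vi"); the script name, flags, task names, and example checkpoint are all assumptions here, so consult the nlp-uoregon/mlmm-evaluation README for the actual invocation.

```python
import subprocess

# Illustrative sketch only: the entry point (main.py), backend name,
# flags, and per-language task names are assumed, modeled on the
# lm-evaluation-harness interface; verify against the repo README.
cmd = [
    "python", "main.py",
    "--model", "hf-causal",                               # Hugging Face causal LM backend (assumed)
    "--model_args", "pretrained=bigscience/bloom-7b1",    # example multilingual checkpoint
    "--tasks", "arc_vi,hellaswag_vi,mmlu_vi",             # hypothetical per-language task names
    "--batch_size", "4",
]
subprocess.run(cmd, check=True)
```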

AI-model-evaluation, multilingual-NLP, LLM-benchmarking, natural-language-understanding, AI-research

Scores updated daily from GitHub, PyPI, and npm data.