lmms-eval and mlmm-evaluation
About lmms-eval
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
This tool helps researchers and AI practitioners compare how well different multimodal models handle inputs spanning text, images, video, and audio. You provide a model and a selection of benchmark tasks, and it outputs consistent, comparable performance metrics. Anyone who builds, deploys, or studies large multimodal models can use it to measure and compare model capabilities.
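As a rough illustration of that workflow, the Python sketch below shells out to the lmms-eval command line. It is a minimal sketch under assumptions: the package is installed and exposes a `python -m lmms_eval` entry point with lm-evaluation-harness-style flags, and the model backend, checkpoint, and task names shown are illustrative; the repository's README is the authority on the exact arguments.

```python
# Hedged sketch: invoke the lmms-eval CLI from Python.
# Assumptions: lmms-eval is installed and exposes a `python -m lmms_eval`
# entry point with harness-style flags; the backend name, checkpoint, and
# task below are illustrative examples, not confirmed defaults.
import subprocess

subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "llava",                                     # model backend (assumed name)
        "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",  # example checkpoint
        "--tasks", "mme",                                       # example multimodal benchmark
        "--batch_size", "1",
        "--output_path", "./logs/",                             # where metrics are written
    ],
    check=True,
)
```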
About mlmm-evaluation
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
This evaluation framework helps AI researchers and developers assess how well their multilingual large language models (LLMs) understand and answer questions across different languages. You input an LLM and it outputs performance scores on multilingual versions of three standard multiple-choice benchmarks (ARC, HellaSwag, MMLU) covering 26 languages, showing how effectively your model generalizes beyond English. It is aimed at researchers building or fine-tuning LLMs for global applications.
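To make that input/output contract concrete, here is a hedged sketch of how a run might be launched, assuming the framework follows the lm-evaluation-harness-style CLI it builds on. The script name, flags, checkpoint, and per-language task identifiers (e.g. a Vietnamese ARC task) are assumptions to verify against the repository.

```python
# Hedged sketch: launch a multilingual evaluation run from Python.
# Assumptions: the nlp-uoregon/mlmm-evaluation repo is cloned and its entry
# script follows the lm-evaluation-harness CLI; the task IDs "arc_vi",
# "hellaswag_vi", "mmlu_vi" (Vietnamese) and the checkpoint are illustrative.
import subprocess

subprocess.run(
    [
        "python", "main.py",
        "--model", "hf-causal",                              # Hugging Face causal-LM backend (assumed)
        "--model_args", "pretrained=bigscience/bloom-1b7",   # example multilingual checkpoint
        "--tasks", "arc_vi,hellaswag_vi,mmlu_vi",            # ARC/HellaSwag/MMLU in one target language
        "--device", "cuda:0",
    ],
    check=True,
)
```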