lmms-eval and MASEval

lmms-eval provides a general-purpose evaluation framework spanning text, image, video, and audio, while MASEval specializes in evaluating multi-agent LLM systems; the two are complementary tools for different evaluation scenarios rather than direct competitors.

                   lmms-eval          MASEval
Overall score      78 (Verified)      55 (Established)
Maintenance        20/25              10/25
Adoption           11/25              7/25
Maturity           25/25              22/25
Community          22/25              16/25
Stars              3,883              18
Forks              539                7
Downloads
Commits (30d)      25                 0
Language           Python             Python
License                               MIT
Risk flags         None               None

About lmms-eval

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

This tool helps researchers and AI practitioners reliably compare how well different multimodal models handle text, image, video, and audio inputs. You provide a model and a set of evaluation tasks spanning those modalities, and it outputs consistent, trustworthy performance metrics. Anyone who builds, deploys, or studies large multimodal models will find it useful for understanding model capabilities.
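
As a rough sketch of that workflow: lmms-eval is driven from the command line, with flags selecting a model backend and a task suite by name. The invocation below follows the pattern shown in the repository's README, wrapped in Python for scripting; the backend (llava), checkpoint, and task (mme) are illustrative examples, and flags may differ across versions.

```python
import subprocess

# Illustrative lmms-eval run (flag names follow the pattern documented in
# the project README; check the current docs, as they may change between
# versions). The backend, checkpoint, and task below are examples only.
subprocess.run(
    [
        "python3", "-m", "lmms_eval",
        "--model", "llava",                      # model backend to evaluate
        "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",
        "--tasks", "mme",                        # task name(s), comma-separated
        "--batch_size", "1",
        "--log_samples",                         # keep per-sample outputs
        "--output_path", "./logs/",              # where results are written
    ],
    check=True,  # raise if the evaluation exits non-zero
)
```

Metrics land in the given output path, with per-sample logs kept when --log_samples is set.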

AI model evaluation, multimodal AI, machine learning research, AI development, model benchmarking

About MASEval

parameterlab/MASEval

Multi-Agent LLM Evaluation

This is for AI researchers and developers who need to compare how well different multi-agent LLM systems perform. It takes your existing agent implementations (from frameworks like AutoGen or LangChain) and runs them through standard benchmarks or your own custom evaluation tasks. The output helps you understand which agent architectures and configurations are most effective for specific challenges.
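
In spirit, such a harness reduces to a loop that runs your agent callable over benchmark tasks and scores the results. The sketch below is purely hypothetical and invents all of its names (evaluate_agent, EvalResult, the task dict layout); it illustrates the workflow the description outlines, not MASEval's actual API, for which the repository is the reference.

```python
# Hypothetical sketch only: these names are invented for illustration
# and are not taken from MASEval's real interface.
from dataclasses import dataclass

@dataclass
class EvalResult:
    task_id: str
    success: bool

def evaluate_agent(agent_fn, tasks):
    """Run an agent callable over tasks and score each with exact match."""
    results = []
    for task in tasks:
        answer = agent_fn(task["prompt"])        # your AutoGen/LangChain agent call
        ok = answer.strip() == task["expected"]  # toy exact-match grader
        results.append(EvalResult(task["id"], ok))
    return results

# Toy usage: a stub "agent" that always answers "4".
tasks = [{"id": "t1", "prompt": "What is 2+2?", "expected": "4"}]
print(evaluate_agent(lambda prompt: "4", tasks))
```

A real harness would swap the exact-match grader for benchmark-specific scoring and aggregate results across tasks and agent configurations.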

AI-research LLM-benchmarking agent-system-evaluation multi-agent-development AI-performance-testing

Scores updated daily from GitHub, PyPI, and npm data.