JinjieNi/MixEval-X
The official GitHub repo for MixEval-X, the first any-to-any, real-world benchmark.
This project helps AI researchers and developers compare the real-world performance of large AI models, especially multimodal models that handle mixed input and output types. It scores a model's responses to prompts spanning text, images, video, and audio, producing a single score that reflects how well the model performs on real-world task distributions. The primary users are researchers and engineers developing or evaluating large multimodal models.
No commits in the last 6 months.
Use this if you need a standardized, comprehensive, and efficient way to benchmark the real-world performance of your multimodal AI models against a diverse set of tasks and modalities.
Not ideal if you are looking for a tool to train models, or if your focus is single-modality evaluation with no need for real-world, multimodal task distributions.
Stars: 16
Forks: 1
Language: Python
License: —
Category: —
Last pushed: Feb 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/JinjieNi/MixEval-X"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
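For programmatic access, here is a minimal Python sketch of the same request. It assumes the endpoint returns JSON; how an API key is attached is not documented on this page, so the X-Api-Key header below is a hypothetical placeholder.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/JinjieNi/MixEval-X"

def fetch_quality(api_key=None):
    # The free tier needs no key (100 requests/day); the X-Api-Key header
    # name is a hypothetical placeholder, not documented behavior.
    headers = {"X-Api-Key": api_key} if api_key else {}
    resp = requests.get(URL, headers=headers, timeout=10)
    resp.raise_for_status()  # surfaces HTTP errors, e.g. 429 when rate-limited
    return resp.json()       # assumes a JSON response body

print(fetch_quality())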
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents