TIGER-AI-Lab/GenAI-Bench

Code and Data for "GenAI Arena: An Open Evaluation Platform for Generative Models" [NeurIPS 2024]

Overall score: 26 / 100 (Experimental)

This project helps AI researchers and developers evaluate how well multimodal large language models (MLLMs) judge the quality of AI-generated content such as images and videos. It takes outputs from different generative AI models together with human preference data, then reports a benchmark score for how closely an MLLM's judgments align with human choices. It is aimed at developers, researchers, and engineers building or selecting multimodal reward models.
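
As a rough sketch of the idea (not the repository's actual code), the benchmark's core quantity can be thought of as pairwise agreement between the MLLM judge and human annotators. The function name and data layout below are assumptions made for illustration only.

```python
# Hypothetical illustration: pairwise agreement between an MLLM judge's
# preferences and human preferences. Names and data layout are assumptions
# for this sketch, not GenAI-Bench's actual implementation.
from typing import List, Literal

Choice = Literal["left", "right", "tie"]

def pairwise_agreement(mllm_choices: List[Choice], human_choices: List[Choice]) -> float:
    """Fraction of comparison pairs where the MLLM picks the same output as humans."""
    if not human_choices or len(mllm_choices) != len(human_choices):
        raise ValueError("Both lists must be non-empty and cover the same comparison pairs.")
    matches = sum(m == h for m, h in zip(mllm_choices, human_choices))
    return matches / len(human_choices)

# Example: the judge agrees with humans on 2 of 3 pairs -> ~0.67
print(pairwise_agreement(["left", "tie", "right"], ["left", "left", "right"]))
```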

No commits in the last 6 months.

Use this if you need to objectively benchmark and compare different MLLMs' ability to act as a 'reward model' for generative AI, aligning with human preferences.

Not ideal if you are looking to evaluate a model's raw generation capability itself, rather than a model's ability to judge the outputs of other models.

Generative AI · Multimodal AI · AI Model Evaluation · AI Alignment · Reward Modeling
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 3 / 25

Stars: 34
Forks: 1
Language: Python
License: MIT
Last pushed: Sep 08, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/TIGER-AI-Lab/GenAI-Bench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
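
The same data can be fetched from Python. Below is a minimal sketch using the requests library, assuming the endpoint returns JSON; the exact response fields are not documented here, so the printout just dumps whatever comes back.

```python
# Minimal sketch of calling the quality API from Python. The response schema
# is an assumption (not documented here), so we print the raw JSON payload.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/generative-ai/TIGER-AI-Lab/GenAI-Bench"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()   # fail loudly on HTTP errors (e.g. rate limiting)
data = resp.json()

# Inspect whatever the endpoint returns; exact keys may differ.
print(data)
```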