TIGER-AI-Lab/GenAI-Bench

Code and Data for "GenAI Arena: An Open Evaluation Platform for Generative Models" [NeurIPS 2024]

Overall score: 26 / 100 (Experimental)

This project helps AI researchers and developers evaluate how well multimodal large language models (MLLMs) judge the quality of AI-generated content such as images and videos. It takes outputs from different generative AI models together with human preference data, then reports a benchmark score for how closely an MLLM's judgments align with human choices. It is aimed at developers, researchers, and engineers building or selecting multimodal reward models.
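
As a rough sketch of the idea (not the repository's actual code), the benchmark's core quantity can be thought of as pairwise agreement between the MLLM judge and human annotators. The function name and data layout below are assumptions made for illustration only.

```python
# Hypothetical illustration: pairwise agreement between an MLLM judge's
# preferences and human preferences. Names and data layout are assumptions
# for this sketch, not GenAI-Bench's actual implementation.
from typing import List, Literal

Choice = Literal["left", "right", "tie"]

def pairwise_agreement(mllm_choices: List[Choice], human_choices: List[Choice]) -> float:
    """Fraction of comparison pairs where the MLLM picks the same output as humans."""
    if not human_choices or len(mllm_choices) != len(human_choices):
        raise ValueError("Both lists must be non-empty and cover the same comparison pairs.")
    matches = sum(m == h for m, h in zip(mllm_choices, human_choices))
    return matches / len(human_choices)

# Example: the judge agrees with humans on 2 of 3 pairs -> ~0.67
print(pairwise_agreement(["left", "tie", "right"], ["left", "left", "right"]))
```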

No commits in the last 6 months.

Use this if you need to objectively benchmark and compare different MLLMs' ability to act as a 'reward model' for generative AI, aligning with human preferences.

Not ideal if you are looking to evaluate a model's raw generation capability itself, rather than a model's ability to judge the outputs of other models.

Generative AI · Multimodal AI · AI Model Evaluation · AI Alignment · Reward Modeling
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 3 / 25

Stars: 34
Forks: 1
Language: Python
License: MIT
Last pushed: Sep 08, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/TIGER-AI-Lab/GenAI-Bench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
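
The same data can be fetched from Python. Below is a minimal sketch using the requests library, assuming the endpoint returns JSON; the exact response fields are not documented here, so the printout just dumps whatever comes back.

```python
# Minimal sketch of calling the quality API from Python. The response schema
# is an assumption (not documented here), so we print the raw JSON payload.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/generative-ai/TIGER-AI-Lab/GenAI-Bench"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()   # fail loudly on HTTP errors (e.g. rate limiting)
data = resp.json()

# Inspect whatever the endpoint returns; exact keys may differ.
print(data)
```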