TIGER-AI-Lab/GenAI-Bench
Code and Data for "GenAI Arena: An Open Evaluation Platform for Generative Models" [NeurIPS 2024]
This project helps AI researchers and developers evaluate how well multimodal large language models (MLLMs) judge the quality of AI-generated content such as images and videos. It takes outputs from different generative models together with human preference data, then produces a benchmark score measuring how closely an MLLM's judgments align with human choices. It is aimed at AI developers, researchers, and engineers building or selecting multimodal reward models.
No commits in the last 6 months.
Use this if you need to objectively benchmark and compare different MLLMs' ability to act as a reward model for generative AI, i.e. how well their judgments align with human preferences.
Not ideal if you want to evaluate a model's raw generation capability itself, rather than its ability to judge the outputs of other models.
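To make the scoring idea concrete, here is a minimal, hypothetical sketch (not the repository's actual code) of one common way to compute such a score: pairwise agreement, i.e. the fraction of comparisons where the MLLM judge picks the same winner as the human annotator.

# Hypothetical sketch of pairwise human-agreement scoring; function and
# variable names are illustrative and not taken from the GenAI-Bench codebase.
def pairwise_agreement(mllm_choices, human_choices):
    """Fraction of comparisons where the MLLM judge picks the same
    winner ('left', 'right', or 'tie') as the human annotator."""
    assert len(mllm_choices) == len(human_choices) and human_choices
    matches = sum(m == h for m, h in zip(mllm_choices, human_choices))
    return matches / len(human_choices)

# Example: judgments over three image pairs produced by two generative models.
mllm_choices  = ["left", "tie", "right"]    # MLLM judge's picks
human_choices = ["left", "right", "right"]  # human preference labels
print(f"agreement = {pairwise_agreement(mllm_choices, human_choices):.2f}")  # 0.67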
Stars: 34
Forks: 1
Language: Python
License: MIT
Category: Generative AI
Last pushed: Sep 08, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/TIGER-AI-Lab/GenAI-Bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
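The same data can also be fetched from Python; below is a minimal sketch using the requests library (the response schema is not documented here, so the code just prints whatever JSON comes back):

# Minimal sketch: fetch the quality data for this repository from the API.
# The structure of the JSON response is an assumption, so simply print it.
import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "generative-ai/TIGER-AI-Lab/GenAI-Bench")
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data)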
Higher-rated alternatives
GoogleCloudPlatform/vertex-ai-samples
Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop...
neo4j-partners/hands-on-lab-neo4j-and-google
Hands on Lab for Neo4j and Google
lynnlangit/learning-cloud
Courses, sample code, articles & screencasts - AWS, Azure, & GCP
GoogleCloudPlatform/applied-ai-engineering-samples
This repository compiles code samples and notebooks demonstrating how to use Generative AI on...
streamlit/30DaysOfAI
30 Days of AI