FudanDISC/ReForm-Eval

A benchmark for evaluating the capabilities of large vision-language models (LVLMs)

Overall score: 31 / 100 (Emerging)

This project helps AI researchers and developers thoroughly evaluate how well large vision-language models (LVLMs) understand and reason about images and text. It takes existing multimodal benchmark datasets and converts them into a standardized format (multiple-choice or text generation problems). The output is a detailed quantitative analysis of an LVLM's performance across a wide range of visual and reasoning tasks, helping developers identify strengths and weaknesses.
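To make the reformulation concrete, here is a minimal Python sketch of casting a raw VQA-style sample into a standardized multiple-choice prompt. The function name, sample schema, and option labeling are illustrative assumptions, not the repository's actual code:

# Hypothetical sketch of the kind of reformulation ReForm-Eval performs:
# turning a raw VQA-style sample into a standardized multiple-choice prompt.
def to_multiple_choice(question: str, options: list[str], answer_index: int) -> dict:
    """Format a question and candidate answers as a multiple-choice problem."""
    labels = "ABCDEFGH"
    lines = [f"Question: {question}"]
    for label, option in zip(labels, options):
        lines.append(f"({label}) {option}")
    lines.append("Answer with the option's letter.")
    return {
        "prompt": "\n".join(lines),  # standardized prompt fed to the LVLM
        "gold": labels[answer_index],  # expected answer letter
    }

sample = to_multiple_choice(
    "What animal is shown in the image?",
    ["dog", "cat", "horse", "rabbit"],
    answer_index=1,
)
print(sample["prompt"])
print(sample["gold"])  # "B"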

No commits in the last 6 months.

Use this if you are developing large vision-language models and need a comprehensive, quantitative, and standardized way to benchmark their capabilities across various visual and language understanding tasks.

Not ideal if you are an end user looking for a ready-to-use LVLM application rather than a developer who needs to evaluate the models themselves.

Tags: AI model evaluation, large vision-language models, benchmark datasets, multimodal AI, model performance analysis
Status: Stale (6m), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 7 / 25


Stars: 46
Forks: 3
Language: Python
License: Apache-2.0
Last pushed: Nov 17, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FudanDISC/ReForm-Eval"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
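If you prefer to fetch the report from Python instead of curl, here is a minimal sketch using only the standard library. The JSON structure of the response is not documented here, so the script prints whatever the endpoint returns rather than assuming field names:

import json
import urllib.request

# Fetch the quality report for this repository (no API key required
# at the free tier). The response is assumed to be JSON.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/FudanDISC/ReForm-Eval"
with urllib.request.urlopen(url) as response:
    data = json.load(response)

# Pretty-print the full payload without assuming a schema.
print(json.dumps(data, indent=2))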