RAIVNLab/sugar-crepe

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

Score: 38 / 100 (Emerging)

This project provides a robust way to evaluate how well AI models understand the relationship between images and descriptive text. Each image is paired with a correct caption and closely matched "hard negative" captions, and the benchmark checks whether a model can consistently identify the single correct description. AI researchers and developers working on vision-language models can use it to accurately gauge their models' compositional understanding.
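The pairwise evaluation described above can be sketched in a few lines. This is an illustrative toy, not sugar-crepe's actual code: the scoring function, data layout, and bigram "image" representation are all assumptions made for the example.

```python
def compositional_accuracy(score, examples):
    """Fraction of examples where the positive caption outscores the hard negative.

    score(image, caption) -> float; examples: iterable of (image, pos, neg).
    """
    examples = list(examples)
    correct = sum(score(img, pos) > score(img, neg) for img, pos, neg in examples)
    return correct / len(examples)

def bigrams(text):
    words = text.lower().split()
    return set(zip(words, words[1:]))

# Toy scorer: the "image" is represented by a set of word bigrams, so word
# order matters. A pure bag-of-words scorer could not tell these captions
# apart, which is exactly what hard negatives are designed to expose.
def toy_score(image_bigrams, caption):
    return len(image_bigrams & bigrams(caption))

examples = [
    (bigrams("a dog chasing a cat"),
     "a dog chasing a cat",      # positive caption
     "a cat chasing a dog"),     # hard negative: same words, swapped roles
]
```

In a real evaluation, `score` would be a vision-language model's image-text similarity (e.g. a CLIP-style dot product) rather than this word-overlap toy.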

No commits in the last 6 months.

Use this if you need a reliable and unbiased benchmark to test the compositional understanding of your vision-language AI models.

Not ideal if you are looking for a tool to train models or a general-purpose image captioning solution.

Tags: AI model evaluation · vision-language understanding · compositional AI · benchmark dataset · model interpretability
Badges: Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 13 / 25


Stars: 89
Forks: 10
Language: Python
License: MIT
Last pushed: Feb 13, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RAIVNLab/sugar-crepe"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
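The same endpoint can be queried programmatically. A minimal Python sketch, assuming the endpoint returns JSON (the response schema is not documented here, so the fields are printed generically rather than named):

```python
import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "ml-frameworks/RAIVNLab/sugar-crepe")

def fetch_quality(url: str = URL) -> dict:
    """Fetch the quality report as a dict; no API key needed up to 100 requests/day."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# Example usage (performs a network request):
#   report = fetch_quality()
#   for key, value in report.items():
#       print(f"{key}: {value}")
```

For the 1,000/day tier, the key would presumably be passed with the request; how (header vs. query parameter) is not specified here.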