RAIVNLab/sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
This project provides a robust way to evaluate how well AI models understand the relationship between images and descriptive text. For each image, it pairs a correct caption with closely matched 'hard negative' captions and tests whether the model can consistently identify the single correct description. Researchers and developers working on vision-language models can use it to gauge their models' compositional understanding.
No commits in the last 6 months.
Use this if you need a reliable and unbiased benchmark to test the compositional understanding of your vision-language AI models.
Not ideal if you are looking for a tool to train models or a general-purpose image captioning solution.
Stars: 89
Forks: 10
Language: Python
License: MIT
Category:
Last pushed: Feb 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RAIVNLab/sugar-crepe"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
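For programmatic access, the endpoint above can also be queried from Python. This is a minimal sketch using only the standard library; the JSON field names returned by the API are not documented on this page, so inspect the response before relying on any particular key.

```python
# Sketch: fetch repo quality data from the pt-edge API.
# The URL comes from the page above; the response schema is an assumption --
# decode it as generic JSON and inspect it rather than hard-coding field names.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RAIVNLab/sugar-crepe"

def fetch_quality(url: str = URL) -> dict:
    """Fetch and decode the JSON payload for one repository."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calls count against the same rate limit as the curl example (100 requests/day without a key).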
Higher-rated alternatives
Westlake-AI/openmixup
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
YU1ut/MixMatch-pytorch
Code for "MixMatch - A Holistic Approach to Semi-Supervised Learning"
kamata1729/QATM_pytorch
PyTorch implementation of QATM: Quality-Aware Template Matching for Deep Learning
nttcslab/msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
rgeirhos/generalisation-humans-DNNs
Data, code & materials from the paper "Generalisation in humans and deep neural networks" (NeurIPS 2018)