RAIVNLab/sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
This project provides a robust way to evaluate how well AI models understand the relationship between images and descriptive text. For each image, it pairs a correct caption with closely matched 'hard negative' captions and tests whether the model can consistently identify the single correct description. Researchers and developers working on vision-language models can use it to gauge their models' compositional understanding.
No commits in the last 6 months.
Use this if you need a reliable and unbiased benchmark to test the compositional understanding of your vision-language AI models.
Not ideal if you are looking for a tool to train models or a general-purpose image captioning solution.
Stars: 89
Forks: 10
Language: Python
License: MIT
Category:
Last pushed: Feb 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RAIVNLab/sugar-crepe"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
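For programmatic access, the endpoint above can also be queried from Python. This is a minimal sketch using only the standard library; the JSON field names returned by the API are not documented on this page, so inspect the response before relying on any particular key.

```python
# Sketch: fetch repo quality data from the pt-edge API.
# The URL comes from the page above; the response schema is an assumption --
# decode it as generic JSON and inspect it rather than hard-coding field names.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RAIVNLab/sugar-crepe"

def fetch_quality(url: str = URL) -> dict:
    """Fetch and decode the JSON payload for one repository."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calls count against the same rate limit as the curl example (100 requests/day without a key).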
Higher-rated alternatives
Westlake-AI/openmixup
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
YU1ut/MixMatch-pytorch
Code for "MixMatch - A Holistic Approach to Semi-Supervised Learning"
kamata1729/QATM_pytorch
PyTorch implementation of QATM: Quality-Aware Template Matching for Deep Learning
nttcslab/msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
rgeirhos/generalisation-humans-DNNs
Data, code & materials from the paper "Generalisation in humans and deep neural networks" (NeurIPS 2018)