yuhui-zh15/VLMClassifier

Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)

25 / 100 (Experimental)

This project helps machine learning researchers and practitioners understand and improve how visually-grounded language models (VLMs) perform on image classification tasks. It takes an image and a VLM as input, and helps analyze why the VLM might misclassify the image, ultimately guiding how to train VLMs to become better image classifiers. This is for AI/ML researchers, data scientists, and engineers working on computer vision and large language models.

No commits in the last 6 months.

Use this if you are developing or evaluating visually-grounded language models and want to understand their image classification limitations and how to enhance their performance.

Not ideal if you are looking for a ready-to-use, production-level image classification tool for immediate deployment.

computer-vision-research large-language-models image-classification-benchmarking model-training-optimization AI-model-evaluation
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 8 / 25


Stars: 97
Forks: 5
Language: Jupyter Notebook
License: none
Last pushed: Oct 19, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/yuhui-zh15/VLMClassifier"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
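
The same request can be made from Python. The sketch below is a minimal illustration using only the standard library, not an official client. It assumes the three path segments after `/quality/` are a category, a repository owner, and a repository name (inferred from the curl URL above), and that the endpoint returns JSON. Neither assumption is confirmed by this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    The meaning of the three segments (category/owner/repo) is an
    assumption inferred from the example URL.
    """
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body as JSON (assumed format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the network request.
    print(quality_url("ml-frameworks", "yuhui-zh15", "VLMClassifier"))
```

Because the anonymous tier is limited to 100 requests/day, it is worth caching the result of `fetch_quality` rather than calling it repeatedly for the same repository.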