yuhui-zh15/VLMClassifier

Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)


Experimental

This project helps machine learning researchers and practitioners understand and improve how visually-grounded language models (VLMs) perform on image classification. Given an image and a VLM, it helps analyze why the VLM might misclassify the image, and the findings can guide how VLMs are trained to become better image classifiers. It is aimed at AI/ML researchers, data scientists, and engineers working on computer vision and large language models.

No commits in the last 6 months.

Use this if you are developing or evaluating visually-grounded language models and want to understand their image classification limitations and how to enhance their performance.

Not ideal if you are looking for a ready-to-use, production-level image classification tool for immediate deployment.

computer-vision-research large-language-models image-classification-benchmarking model-training-optimization AI-model-evaluation

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 8 / 25


Language: Jupyter Notebook

License: None

Higher-rated alternatives

open-mmlab/mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

adambielski/siamese-triplet

Siamese and triplet networks with online pair/triplet mining in PyTorch

HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis

Papers, code and datasets about deep learning and multi-modal learning for video analysis

KaiyangZhou/pytorch-vsumm-reinforce

Unsupervised video summarization with deep reinforcement learning (AAAI'18)
