ExplainableML/WaffleCLIP
Official repository for the ICCV 2023 paper: "Waffling around for Performance: Visual Classification with Random Words and Broad Concepts"
This project helps researchers and machine learning engineers improve how well vision-language models identify objects in images, especially objects the models were never explicitly trained on. Given a set of images and a list of category names, it produces improved classifications by augmenting each category's text prompt with random words and broad concepts. It's aimed at people who work with models like CLIP and need to boost zero-shot classification performance.
No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer working with Vision-Language Models (VLMs) like CLIP and need to improve their classification accuracy on new, unseen categories without additional training data.
Not ideal if you are looking for a plug-and-play image classification tool for a production environment without delving into research-focused model enhancements.
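To give a sense of the technique, here is a minimal sketch of waffle-style zero-shot classification. It assumes the OpenAI clip package (pip install git+https://github.com/openai/CLIP.git); the prompt template, the random_word helper, the category list, and example.jpg are illustrative assumptions, not code from this repository.

import random
import string

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def random_word(length=5):
    # Random character strings stand in for the semantically meaningless
    # "waffle" text appended to each prompt (illustrative helper).
    return "".join(random.choices(string.ascii_lowercase, k=length))

def class_embedding(classname, n_prompts=8):
    # Average normalized text embeddings over several randomized prompts.
    prompts = [
        f"a photo of a {classname}, {random_word()} {random_word()}."
        for _ in range(n_prompts)
    ]
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        emb = model.encode_text(tokens)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return emb.mean(dim=0)

classnames = ["dog", "cat", "car"]  # illustrative categories
text_weights = torch.stack([class_embedding(c) for c in classnames])

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    img_emb = model.encode_image(image)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

# Cosine similarity between the image and each class's averaged prompt embedding.
scores = (img_emb @ text_weights.T).squeeze(0)
print(classnames[scores.argmax().item()])

In the repository itself, the number of prompts, the random-word construction, and the optional broad-concept prefix are configurable; see the paper for the exact method.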
Stars: 61
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Jul 08, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ExplainableML/WaffleCLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
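The same endpoint can be queried from Python with the requests library (a minimal sketch; the response schema is not documented here, so this just prints the raw JSON):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/ExplainableML/WaffleCLIP"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())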
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice