ExplainableML/WaffleCLIP
Official repository for the ICCV 2023 paper: "Waffling around for Performance: Visual Classification with Random Words and Broad Concepts"
This project helps researchers and machine learning engineers improve how well vision-language models identify objects in images, especially objects the models were never explicitly trained on. Given a set of images and a list of category names, it produces improved classifications by augmenting each category's text prompt with random words and broad concepts. It's aimed at people who work with models like CLIP and need to boost zero-shot classification performance.
No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer working with Vision-Language Models (VLMs) like CLIP and need to improve their classification accuracy on new, unseen categories without additional training data.
Not ideal if you are looking for a plug-and-play image classification tool for a production environment without delving into research-focused model enhancements.
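To give a sense of the technique, here is a minimal sketch of waffle-style zero-shot classification. It assumes the OpenAI clip package (pip install git+https://github.com/openai/CLIP.git); the prompt template, the random_word helper, the category list, and example.jpg are illustrative assumptions, not code from this repository.

import random
import string

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def random_word(length=5):
    # Random character strings stand in for the semantically meaningless
    # "waffle" text appended to each prompt (illustrative helper).
    return "".join(random.choices(string.ascii_lowercase, k=length))

def class_embedding(classname, n_prompts=8):
    # Average normalized text embeddings over several randomized prompts.
    prompts = [
        f"a photo of a {classname}, {random_word()} {random_word()}."
        for _ in range(n_prompts)
    ]
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        emb = model.encode_text(tokens)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return emb.mean(dim=0)

classnames = ["dog", "cat", "car"]  # illustrative categories
text_weights = torch.stack([class_embedding(c) for c in classnames])

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    img_emb = model.encode_image(image)
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)

# Cosine similarity between the image and each class's averaged prompt embedding.
scores = (img_emb @ text_weights.T).squeeze(0)
print(classnames[scores.argmax().item()])

In the repository itself, the number of prompts, the random-word construction, and the optional broad-concept prefix are configurable; see the paper for the exact method.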
Stars: 61
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Jul 08, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ExplainableML/WaffleCLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
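The same endpoint can be queried from Python with the requests library (a minimal sketch; the response schema is not documented here, so this just prints the raw JSON):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/ExplainableML/WaffleCLIP"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())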
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice