ExplainableML/Vision_by_Language
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
This project is for anyone who needs to find images based on a starting image plus a textual description of changes. You provide a reference image and a text modification (like "add a hat" or "without people and at night-time"), and it retrieves database images that match the modified description. It's useful for image database curators, content creators, and researchers who need precise image retrieval without training a custom model.
No commits in the last 6 months.
Use this if you need to perform advanced image searches by combining an existing image with text descriptions to find highly specific, modified versions of that image, without needing to train a custom AI model.
Not ideal if you are looking for a simple keyword-based image search or if your primary need is basic image classification rather than complex compositional retrieval.
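The training-free recipe behind this kind of retrieval, as the paper's title suggests, is to route everything through language: caption the reference image, have an LLM rewrite the caption per the text modification, then rank database images by embedding similarity to the rewritten caption. The sketch below is only an illustration of that flow; the captioner, LLM rewrite, and bag-of-words embedding are hypothetical placeholders (a real system would use models such as a VLM captioner and CLIP encoders), not this repository's actual API.

```python
import math

# Hypothetical stand-ins for the three pipeline stages: a captioner,
# an LLM that edits the caption, and a text/image embedder.
def caption_image(image_id: str) -> str:
    return {"img_001": "a man on a beach"}[image_id]  # placeholder captioner

def rewrite_caption(caption: str, modification: str) -> str:
    # placeholder for an LLM call that merges caption and modification
    return f"{caption}, {modification}"

def embed(text: str) -> list[float]:
    # toy bag-of-words embedding; real systems embed with CLIP encoders
    vocab = ["man", "beach", "night-time", "hat", "dog"]
    return [float(word in text) for word in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Database images represented by pre-computed (toy) embeddings.
database = {
    "beach_day.jpg": embed("a man on a beach"),
    "beach_night.jpg": embed("a man on a beach, night-time"),
    "dog_park.jpg": embed("a dog in a park"),
}

query = embed(rewrite_caption(caption_image("img_001"), "at night-time"))
ranked = sorted(database, key=lambda k: cosine(database[k], query), reverse=True)
print(ranked[0])  # → beach_night.jpg
```

Because retrieval reduces to text-vs-embedding similarity, no task-specific training is needed; swapping in stronger captioners, LLMs, or encoders upgrades the whole pipeline.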
Stars: 84
Forks: 7
Language: Python
License: MIT
Category:
Last pushed: Jul 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ExplainableML/Vision_by_Language"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice