ExplainableML/Vision_by_Language
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
This project is for anyone who needs to find images based on a starting image plus a textual description of changes. You provide a reference image and a text modification (like "add a hat" or "without people and at night-time"), and it retrieves database images that match the modified description. It's useful for image database curators, content creators, and researchers who need precise image retrieval without training a custom model.
No commits in the last 6 months.
Use this if you need to perform advanced image searches by combining an existing image with text descriptions to find highly specific, modified versions of that image, without needing to train a custom AI model.
Not ideal if you are looking for a simple keyword-based image search or if your primary need is basic image classification rather than complex compositional retrieval.
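The training-free recipe behind this kind of retrieval, as the paper's title suggests, is to route everything through language: caption the reference image, have an LLM rewrite the caption per the text modification, then rank database images by embedding similarity to the rewritten caption. The sketch below is only an illustration of that flow; the captioner, LLM rewrite, and bag-of-words embedding are hypothetical placeholders (a real system would use models such as a VLM captioner and CLIP encoders), not this repository's actual API.

```python
import math

# Hypothetical stand-ins for the three pipeline stages: a captioner,
# an LLM that edits the caption, and a text/image embedder.
def caption_image(image_id: str) -> str:
    return {"img_001": "a man on a beach"}[image_id]  # placeholder captioner

def rewrite_caption(caption: str, modification: str) -> str:
    # placeholder for an LLM call that merges caption and modification
    return f"{caption}, {modification}"

def embed(text: str) -> list[float]:
    # toy bag-of-words embedding; real systems embed with CLIP encoders
    vocab = ["man", "beach", "night-time", "hat", "dog"]
    return [float(word in text) for word in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Database images represented by pre-computed (toy) embeddings.
database = {
    "beach_day.jpg": embed("a man on a beach"),
    "beach_night.jpg": embed("a man on a beach, night-time"),
    "dog_park.jpg": embed("a dog in a park"),
}

query = embed(rewrite_caption(caption_image("img_001"), "at night-time"))
ranked = sorted(database, key=lambda k: cosine(database[k], query), reverse=True)
print(ranked[0])  # → beach_night.jpg
```

Because retrieval reduces to text-vs-embedding similarity, no task-specific training is needed; swapping in stronger captioners, LLMs, or encoders upgrades the whole pipeline.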
Stars: 84
Forks: 7
Language: Python
License: MIT
Category:
Last pushed: Jul 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ExplainableML/Vision_by_Language"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice