astra-vision/LatteCLIP
[WACV 2025] LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts
LatteCLIP helps machine learning engineers and researchers improve the performance of vision-language models, specifically CLIP, on custom image datasets without manual labeling. It takes your unlabeled images, automatically generates descriptive texts for them with a large multimodal model (LMM), and fine-tunes CLIP on these synthetic captions. The output is a fine-tuned CLIP model that understands your specific images better, ready for tasks like image classification or search.
No commits in the last 6 months.
Use this if you have a unique image dataset and want to boost a CLIP model's accuracy on it, but lack the resources for extensive manual text labeling.
Not ideal if you don't work with deep learning models or don't have access to substantial GPU compute resources.
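Once fine-tuning is done, the result is used like any other CLIP checkpoint. Below is a minimal sketch of zero-shot classification with Hugging Face transformers; it assumes the fine-tuned weights were exported in that format, and the checkpoint path, image filename, and label prompts are hypothetical placeholders.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical local path to the fine-tuned checkpoint
model = CLIPModel.from_pretrained("./latteclip-finetuned")
processor = CLIPProcessor.from_pretrained("./latteclip-finetuned")

image = Image.open("example.jpg")                  # any image from your dataset
labels = ["a photo of a cat", "a photo of a dog"]  # hypothetical class prompts

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)   # similarity scores -> probabilities
print(dict(zip(labels, probs[0].tolist())))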
Stars: 10
Forks: 1
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Jan 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/astra-vision/LatteCLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
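The same request in Python, for scripted use. Only the endpoint URL comes from the page above; the shape of the JSON response is an assumption, as is the 10-second timeout.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/astra-vision/LatteCLIP"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumed JSON payload mirroring the stats shown above
print(data)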
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice