zer0int/CLIP-fine-tune-registers-gated
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
This project offers an improved version of CLIP, a powerful AI model that connects images and text. It takes your existing CLIP models and enhances their ability to understand and match visual content with descriptions, producing more accurate image search results and better visual feature extraction. Data scientists, machine learning engineers, and AI developers can use this for better vision-language tasks.
No commits in the last 6 months.
Use this if you need a vision-language model with a lower 'modality gap' for more precise image-text retrieval and enhanced visual understanding.
Not ideal if your primary goal is zero-shot accuracy or a generic text encoder for generative AI and you don't need improved image-text alignment; in that case a classic CLIP fine-tune might suffice.
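The 'modality gap' referenced above is commonly measured as the Euclidean distance between the centroids of L2-normalized image and text embeddings. This sketch illustrates that measure with random stand-in vectors (no CLIP model is loaded; the function name and data are illustrative, not from this repo):

```python
import numpy as np

def modality_gap(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Euclidean distance between the centroids of L2-normalized
    image and text embeddings -- one common modality-gap measure."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return float(np.linalg.norm(img.mean(axis=0) - txt.mean(axis=0)))

# Stand-in embeddings: in practice these come from CLIP's image and
# text encoders (e.g. 512-dim features for a ViT-B/32 model).
rng = np.random.default_rng(0)
images = rng.normal(size=(100, 512))
texts = rng.normal(size=(100, 512)) + 0.5  # shifted cluster -> visible gap
print(f"modality gap: {modality_gap(images, texts):.3f}")
```

A "tiny modality gap" means this distance shrinks, so image and text embeddings occupy overlapping regions of the shared space, which tends to help retrieval.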
Stars
47
Forks
1
Language
Python
License
MIT
Category
Last pushed
Jun 02, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zer0int/CLIP-fine-tune-registers-gated"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
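The same endpoint can be queried from Python. A minimal sketch using only the standard library (the helper names are mine; only the URL pattern comes from the curl example above, and the JSON response shape is an assumption):

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-API URL for a repo, following the curl example,
    e.g. ecosystem='transformers', repo='zer0int/CLIP-fine-tune-registers-gated'."""
    return f"{API_BASE}/{quote(ecosystem)}/{quote(repo, safe='/')}"

def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch the endpoint and decode it as JSON (assumed response format).
    Each call counts against the daily request quota."""
    with urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)

print(quality_url("transformers", "zer0int/CLIP-fine-tune-registers-gated"))
```

With a free API key, a higher quota applies; how the key is passed (header vs. query parameter) is not documented here, so check the API docs before adding it.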
Higher-rated alternatives
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Kaushalya/medclip
A multi-modal CLIP model trained on the medical dataset ROCO
kastalimohammed1965/CLIP-fine-tune-registers-gated
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...
clip-italian/clip-italian
CLIP (Contrastive Language-Image Pre-training) for Italian