zer0int/CLIP-fine-tune-registers-gated

Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!

Score: 29 / 100 (Experimental)

This project offers an improved version of CLIP, a powerful AI model that connects images and text. It takes your existing CLIP models and enhances their ability to understand and match visual content with descriptions, producing more accurate image search results and better visual feature extraction. Data scientists, machine learning engineers, and AI developers can use this for better vision-language tasks.

No commits in the last 6 months.

Use this if you need a vision-language model with a lower 'modality gap' for more precise image-text retrieval and enhanced visual understanding.

Not ideal if your primary goal is zero-shot accuracy or generic text encoders for generative AI without specific improvements to image-text alignment, in which case a classic CLIP fine-tune might suffice.

Topics: image-retrieval · computer-vision · natural-language-processing · multimodal-ai · deep-learning
Flags: Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 3 / 25


Stars: 47
Forks: 1
Language: Python
License: MIT
Last pushed: Jun 02, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zer0int/CLIP-fine-tune-registers-gated"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
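For scripted access, the same endpoint shown in the curl example above can be called from Python. This is a minimal sketch: only the URL itself comes from this page; the response's field names (e.g. a top-level score) are assumptions, so inspect the actual JSON before relying on them.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(registry: str, repo: str) -> str:
    """Build the quality-endpoint URL used by this page's curl example."""
    return f"{BASE}/{registry}/{repo}"


url = quality_url("transformers", "zer0int/CLIP-fine-tune-registers-gated")
print(url)

# Fetching and parsing (hypothetical field names; check the real response):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
# print(data.get("score"))
```

The fetch itself is left commented out so the snippet runs without network access and without exceeding the 100 requests/day anonymous limit.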