zer0int/CLIP-fine-tune-registers-gated

Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!

Score: 29 / 100 (Experimental)

This project offers an improved version of CLIP, a powerful AI model that connects images and text. It takes your existing CLIP models and enhances their ability to understand and match visual content with descriptions, producing more accurate image search results and better visual feature extraction. Data scientists, machine learning engineers, and AI developers can use this for better vision-language tasks.

No commits in the last 6 months.

Use this if you need a vision-language model with a lower 'modality gap' for more precise image-text retrieval and enhanced visual understanding.

Not ideal if your primary goal is zero-shot accuracy or generic text encoders for generative AI without specific improvements to image-text alignment, in which case a classic CLIP fine-tune might suffice.

Topics: image-retrieval · computer-vision · natural-language-processing · multimodal-ai · deep-learning
Flags: Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 3 / 25


Stars: 47
Forks: 1
Language: Python
License: MIT
Last pushed: Jun 02, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zer0int/CLIP-fine-tune-registers-gated"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
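For scripted access, the same endpoint shown in the curl example above can be called from Python. This is a minimal sketch: only the URL itself comes from this page; the response's field names (e.g. a top-level score) are assumptions, so inspect the actual JSON before relying on them.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(registry: str, repo: str) -> str:
    """Build the quality-endpoint URL used by this page's curl example."""
    return f"{BASE}/{registry}/{repo}"


url = quality_url("transformers", "zer0int/CLIP-fine-tune-registers-gated")
print(url)

# Fetching and parsing (hypothetical field names; check the real response):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
# print(data.get("score"))
```

The fetch itself is left commented out so the snippet runs without network access and without exceeding the 100 requests/day anonymous limit.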