CLIP Image Embeddings Transformer Models

Tools for generating and working with CLIP image-text embeddings, including implementations, fine-tuning, and lightweight variants. Does NOT include general vision-language models, text-to-image generation, or multimodal fusion frameworks.

22 CLIP image embedding models are tracked. The highest-rated is OFA-Sys/Chinese-CLIP, which scores 48/100 and has 5,820 stars.

Get all 22 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=clip-image-embeddings&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
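The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It reuses the URL and query parameters from the curl command above. The function names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented on this page, so the payload is returned undecorated for inspection. The curl command passes `limit=20` while 22 projects are listed; whether a larger limit is accepted is not confirmed here, so the sketch keeps 20 as its default.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="clip-image-embeddings", limit=20):
    """Build the query URL with the same parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url=None, timeout=30):
    """Send a GET request and decode the JSON body.

    The structure of the decoded payload is an unknown here, so it is
    returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(url or build_url(), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one network request, which counts against the daily limit described above. Printing the top-level type and keys of the result is a reasonable first step before relying on any particular field.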

#	Model	Score	Tier	Stars	Language
1	OFA-Sys/Chinese-CLIP Chinese version of CLIP which achieves Chinese cross-modal retrieval and...	48	Emerging	5,820	Jupyter Notebook
2	Kaushalya/medclip A multi-modal CLIP model trained on the medical dataset ROCO	44	Emerging	151	Jupyter Notebook
3	kastalimohammed1965/CLIP-fine-tune-registers-gated Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny...	43	Emerging	5	Python
4	BUAADreamer/SPN4CIR [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning...	35	Emerging	39	Python
5	clip-italian/clip-italian CLIP (Contrastive Language–Image Pre-training) for Italian	32	Emerging	185	Jupyter Notebook
6	zer0int/CLIP-fine-tune-registers-gated Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny...	29	Experimental	47	Python
7	Armaggheddon/ClipServe 🚀 ClipServe: A fast API server for embedding text, images, and performing...	28	Experimental	8	Python
8	kyegomez/MuonClip This repository is an open source implementation of the MuonClip strategy...	27	Experimental	17	—
9	FuxiaoLiu/DocumentCLIP [ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents	22	Experimental	16	Python
10	taherfattahi/MetaWorld-VLA-openai-clip-vit A lightweight Vision-Language-Action (VLA) baseline for MetaWorld robot-arm...	22	Experimental	3	Python
11	YUSH19883/cog-jinaai-jina-clip-v2 🖼️ Generate high-quality multimodal embeddings for text and images with Jina...	21	Experimental	—	Python
12	MuhammadAliS/CLIP PyTorch implementation of OpenAI's CLIP model for image classification,...	19	Experimental	3	Jupyter Notebook
13	corentin-ryr/CLIP-mixer Implementation of CLIP using a Mixer architecture	19	Experimental	4	Python
14	VijayPrakashReddy-k/CLIP-PACL Contrastive Language - Image Pre-training (CLIP) and Patch Aligned...	19	Experimental	3	—
15	zsxkib/cog-jinaai-jina-clip-v2 Jina CLIP v2 - Multimodal embedding model for text and images with...	18	Experimental	1	Python
16	ptmorris03/CLIPEmbedding Easy text-image embedding and similarity with pretrained CLIP in PyTorch	17	Experimental	1	Python
17	seanghay/clipsort Group images by provided labels using OpenAI/CLIP	17	Experimental	1	Python
18	SuryaAnything/V-DeClip Masked Multi-Component Gated Decomposition Architecture	17	Experimental	2	Python
19	theSohamTUmbare/CLIP-model Reimplementation of the CLIP model	15	Experimental	—	Jupyter Notebook
20	ntat/Lightweight_CLIP_model A lightweight Pytorch implementation of OpenAI's CLIP model.	13	Experimental	—	Python
21	Rakshath66/ClipFindr 🔍 A CLIP-powered image similarity finder built with Streamlit — upload a...	13	Experimental	—	Python
22	monatis/turkish-clip Embed texts in Turkish to be used with OpenAI's CLIP	12	Experimental	7	Python
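If the table above is copied as text rather than fetched as JSON, the rows can be parsed into records. The sketch below assumes each row has six tab-separated fields in the order of the header, that the repository name is the first whitespace-delimited token of the Model field, and that "—" marks a missing value. The function name `parse_row` and the dictionary keys are hypothetical.

```python
def parse_row(line):
    """Split one tab-separated table row into a dict of typed fields."""
    rank, model, score, tier, stars, language = line.rstrip("\n").split("\t")
    # The Model field holds "owner/repo" followed by a free-text description.
    repo, _, description = model.partition(" ")
    return {
        "rank": int(rank),
        "repo": repo,
        "description": description,
        "score": int(score),
        "tier": tier,
        # Star counts use thousands separators; "—" means no value is shown.
        "stars": None if stars == "—" else int(stars.replace(",", "")),
        "language": None if language == "—" else language,
    }


# Two rows copied from the table above.
SAMPLE_ROWS = [
    "1\tOFA-Sys/Chinese-CLIP Chinese version of CLIP which achieves Chinese "
    "cross-modal retrieval and...\t48\tEmerging\t5,820\tJupyter Notebook",
    "8\tkyegomez/MuonClip This repository is an open source implementation "
    "of the MuonClip strategy...\t27\tExperimental\t17\t—",
]

records = [parse_row(row) for row in SAMPLE_ROWS]
```

Keeping missing values as `None` rather than 0 avoids treating an unreported star count as a repository with no stars.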