LeapLabTHU/Cross-Modal-Adapter
[Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval
This project helps AI and machine learning researchers and practitioners efficiently adapt large pre-trained vision-language models to specific retrieval tasks. By fine-tuning only a small fraction of the model's parameters, it turns an existing pre-trained model and a target dataset into a specialized, high-performing model for tasks such as text-based image or video search. It is aimed at machine learning engineers, AI researchers, and data scientists working with multimodal data.
140 stars. No commits in the last 6 months.
Use this if you need to fine-tune large pre-trained vision-language models for specific retrieval tasks but want to significantly reduce computational costs and training time.
Not ideal if you are looking for a complete end-user application for image/video retrieval, as this project focuses on the underlying model adaptation methodology.
Stars: 140
Forks: 12
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/LeapLabTHU/Cross-Modal-Adapter"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000 requests/day.
Higher-rated alternatives
mlfoundations/open_clip
An open source implementation of CLIP.
noxdafox/clipspy
Python CFFI bindings for the 'C' Language Integrated Production System CLIPS
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
moein-shariatnia/OpenAI-CLIP
Simple implementation of OpenAI CLIP model in PyTorch.
BioMedIA-MBZUAI/FetalCLIP
Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis