LeapLabTHU/Cross-Modal-Adapter
[Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval
This project helps AI and machine learning researchers and practitioners efficiently adapt large pre-trained vision-language models to specific retrieval tasks. By fine-tuning only a small fraction of the model's parameters, it turns an existing pre-trained model and a target dataset into a specialized, high-performing model for tasks such as text-based image or video search. It is aimed at machine learning engineers, AI researchers, and data scientists working with multimodal data.
140 stars. No commits in the last 6 months.
Use this if you need to fine-tune large pre-trained vision-language models for specific retrieval tasks but want to significantly reduce computational costs and training time.
Not ideal if you are looking for a complete end-user application for image/video retrieval, as this project focuses on the underlying model adaptation methodology.
Stars: 140
Forks: 12
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/LeapLabTHU/Cross-Modal-Adapter"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000 requests/day.
Higher-rated alternatives
mlfoundations/open_clip
An open source implementation of CLIP.
noxdafox/clipspy
Python CFFI bindings for the 'C' Language Integrated Production System CLIPS
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
moein-shariatnia/OpenAI-CLIP
Simple implementation of OpenAI CLIP model in PyTorch.
BioMedIA-MBZUAI/FetalCLIP
Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis