ylsung/VL_adapter

PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR 2022)

Score: 39 / 100 (Emerging)

This project helps machine learning engineers and researchers adapt large pre-trained vision-and-language models to new image-text or video-text tasks. It takes an existing model such as VL-T5 or VL-BART along with your dataset (e.g., VQAv2, MSCOCO, TVQA) and produces a task-specialized model by training small adapter modules inside a frozen backbone, so only a small fraction of the parameters are updated. This is ideal for those working on multimodal AI applications.
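
The core technique is inserting small trainable adapter modules into the transformer layers of a frozen backbone. Below is a minimal sketch of a Houlsby-style bottleneck adapter; the class name and bottleneck size are illustrative assumptions, not the repo's exact code.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, add a residual."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # d_model -> bottleneck
        self.up = nn.Linear(bottleneck, d_model)    # bottleneck -> d_model
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual keeps the frozen backbone's output intact at initialization.
        return x + self.up(self.act(self.down(x)))

Only the adapter weights (and, typically, layer norms) are trained; the rest of the backbone stays frozen.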

210 stars. No commits in the last 6 months.

Use this if you need to fine-tune large vision-and-language models for new downstream tasks without the computational cost of training all model parameters.
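
Concretely, "fewer parameters to train" means freezing the backbone and re-enabling gradients only for adapter weights. A hedged sketch with a toy model; the name-based filter is an assumption (the repo's training scripts control this via command-line flags):

import torch.nn as nn

# Toy stand-in for a pre-trained backbone with adapter submodules.
model = nn.ModuleDict({
    "backbone": nn.Linear(768, 768),
    "adapter_down": nn.Linear(768, 64),
    "adapter_up": nn.Linear(64, 768),
})

# Train only parameters whose names mark them as adapter weights.
for name, param in model.named_parameters():
    param.requires_grad = "adapter" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({trainable / total:.1%})")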

Not ideal if you are looking for a ready-to-use, off-the-shelf application, or are not comfortable with model training and running scripts.

multimodal-ai vision-language-models transfer-learning natural-language-processing computer-vision
Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 13 / 25

Stars: 210
Forks: 17
Language: Python
License: MIT
Last pushed: Dec 18, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ylsung/VL_adapter"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
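
For scripted access, the same endpoint can be called from Python. A minimal sketch using the requests library; the response schema is not documented here, so treat specific field names as unknown until you inspect the JSON:

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/ylsung/VL_adapter"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g., rate limiting) early
data = resp.json()
print(data)  # inspect the payload before depending on specific fields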