jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
This project is a curated list of research papers and associated code for Vision-Language Models (VLMs) applied to computer vision tasks such as image classification and object detection. It helps AI researchers and practitioners stay current with how language understanding can enhance visual recognition systems: the maintainers collect academic papers and project code and organize them into an up-to-date catalog of VLM resources.
3,094 stars. No commits in the last 6 months.
Use this if you are an AI researcher, computer vision engineer, or machine learning practitioner looking for a comprehensive overview and resources on the latest Vision-Language Models for various visual recognition tasks.
Not ideal if you are looking for an off-the-shelf tool or software to directly apply VLMs without diving into research papers and codebases.
Stars: 3,094
Forks: 233
Language: —
License: —
Category: ml-frameworks
Last pushed: Oct 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jingyi0000/VLM_survey"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
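For scripted access, a minimal sketch using the same endpoint as the curl command above. The `-s` flag suppresses curl's progress meter, and `python3 -m json.tool` pretty-prints whatever JSON the endpoint returns; the response schema and the mechanism for passing an API key are not documented here, so both are left out.

# Fetch the quality record for this repo and pretty-print the JSON response
curl -s "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jingyi0000/VLM_survey" \
  | python3 -m json.tool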
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)