jingyi0000/VLM_survey

Collection of AWESOME vision-language models for vision tasks

39 / 100 (Emerging)

This project is a curated list of research papers and associated code on Vision-Language Models (VLMs) for computer vision tasks such as image classification and object detection. It helps AI researchers and practitioners keep up with advances in how language understanding can improve visual recognition systems. It gathers academic papers and project code into an organized, up-to-date collection of VLM resources.

3,094 stars. No commits in the last 6 months.

Use this if you are an AI researcher, computer vision engineer, or machine learning practitioner looking for a comprehensive overview and resources on the latest Vision-Language Models for various visual recognition tasks.

Not ideal if you are looking for an off-the-shelf tool or software to directly apply VLMs without diving into research papers and codebases.

Computer Vision, Natural Language Processing, AI Research, Machine Learning Engineering, Deep Learning
No License, Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 19 / 25
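
The page does not state how the overall score is derived, but the four category scores listed above add up to the 39 / 100 shown at the top. As a small check, assuming the overall score is a plain sum of four categories worth up to 25 points each (an assumption, not a documented formula):

```python
# Category scores as listed on this page; each is out of 25.
subscores = {"Maintenance": 2, "Adoption": 10, "Maturity": 8, "Community": 19}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(subscores.values())
maximum = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {maximum}")
```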


Stars: 3,094
Forks: 233
Language: (not listed)
License: (not listed)
Last pushed: Oct 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jingyi0000/VLM_survey"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
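
The same request can be made from a script. The sketch below is a minimal Python version using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repository) is inferred from the single curl URL above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper; layout inferred from the curl example)."""
    # Percent-encode each path segment so unusual characters stay inside their segment.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository. Assumes the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "jingyi0000", "VLM_survey")` would issue the same GET request as the curl command. With the keyless limit of 100 requests per day, it is worth caching results rather than re-fetching on every run.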