jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
This project is a curated list of research papers and associated code for Vision-Language Models (VLMs) applied to computer vision tasks such as image classification and object detection. It helps AI researchers and practitioners stay current with how language understanding can enhance visual recognition systems: the maintainers collect academic papers and project code and organize them into an up-to-date catalog of VLM resources.
3,094 stars. No commits in the last 6 months.
Use this if you are an AI researcher, computer vision engineer, or machine learning practitioner looking for a comprehensive overview and resources on the latest Vision-Language Models for various visual recognition tasks.
Not ideal if you are looking for an off-the-shelf tool or software to directly apply VLMs without diving into research papers and codebases.
Stars: 3,094
Forks: 233
Language: —
License: —
Category: ml-frameworks
Last pushed: Oct 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jingyi0000/VLM_survey"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
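For scripted access, a minimal sketch using the same endpoint as the curl command above. The `-s` flag suppresses curl's progress meter, and `python3 -m json.tool` pretty-prints whatever JSON the endpoint returns; the response schema and the mechanism for passing an API key are not documented here, so both are left out.

# Fetch the quality record for this repo and pretty-print the JSON response
curl -s "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jingyi0000/VLM_survey" \
  | python3 -m json.tool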
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)