SufyanDanish/VLM-Survey-

A comprehensive survey of Vision–Language Models: Pretrained models, fine-tuning, prompt engineering, adapters, and benchmark datasets

Quality score: 15 / 100 (Experimental)

This resource collects and organizes published research on Vision-Language Models (VLMs), which are AI systems that understand both images and text. It provides an overview of various techniques like fine-tuning, prompt engineering, and adapter modules to improve VLM performance. Researchers and practitioners in AI and machine learning fields would use this to understand current trends and challenges in optimizing VLMs for real-world applications such as image captioning, visual question answering, and multimodal retrieval.
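
As a concrete illustration of one technique family the survey covers, here is a minimal sketch of prompt engineering for zero-shot classification with a pretrained CLIP model; the checkpoint name, prompt template, class labels, and image path are illustrative choices, not taken from the survey itself.

# Minimal sketch of prompt engineering with a pretrained CLIP model
# (Hugging Face transformers). Checkpoint, template, and labels are
# illustrative, not from the survey.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["cat", "dog", "bird"]
# The prompt-engineering step: wrap each bare label in a natural-language
# template before passing it to the text encoder.
prompts = [f"a photo of a {label}" for label in labels]

image = Image.open("example.jpg")  # any local image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity scores
print(dict(zip(labels, probs[0].tolist())))

Swapping in different templates (e.g. "a sketch of a {label}") is the kind of lightweight, training-free optimization the survey contrasts with fine-tuning and adapter-based approaches.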

No commits in the last 6 months.

Use this if you are a researcher or AI practitioner looking for a consolidated reference of techniques and models for optimizing Vision-Language Models for specific tasks, with a particular focus on computational efficiency and performance.

Not ideal if you are looking for an implementation-ready library or a step-by-step tutorial for building your own VLM from scratch.

Tags: AI research · Multimodal AI · Machine learning optimization · Computer vision · Natural language processing
Badges: No License · Stale (6 months) · No Package · No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25

How are scores calculated?
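
Each sub-score is out of 25, and the four sub-scores appear to sum to the overall score: 2 + 5 + 8 + 0 = 15 out of 100.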

Stars: 9
Forks:
Language:
License: none
Last pushed: Sep 04, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/SufyanDanish/VLM-Survey-"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
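
For programmatic access from Python, a minimal sketch using only the standard library; the endpoint is the one shown above, and since the fields of the JSON response are not documented here, the payload is simply printed as-is.

# Minimal sketch: fetch the quality data for this repository from the public
# endpoint shown above and print the raw JSON payload. The response schema is
# not documented here, so no specific fields are assumed.
import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "prompt-engineering/SufyanDanish/VLM-Survey-")

with urllib.request.urlopen(URL) as response:
    data = json.load(response)

print(json.dumps(data, indent=2))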