rabiulcste/vqazero
Visual question answering prompting recipes for large vision-language models
This project helps researchers and developers explore how to make vision-language models better at answering questions about images without extensive fine-tuning. It feeds an image and a question to various models under different prompting strategies and generates text answers, with the aim of improving visual question answering accuracy. It is designed for AI researchers and practitioners working with large vision-language models.
No commits in the last 6 months.
Use this if you are an AI researcher or developer experimenting with advanced vision-language models and want to evaluate different prompting strategies for zero- or few-shot visual question answering tasks.
Not ideal if you need a simple, out-of-the-box solution for basic image captioning or if you are not comfortable working with command-line interfaces for model inference.
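The difference between zero-shot and few-shot prompting can be sketched as follows. This is a hypothetical, text-only illustration: the instruction wording, the `Question: ... Answer:` template, and the names `Demo` and `build_vqa_prompt` are assumptions, not the vqazero codebase's API. A real vision-language model call would also pass the image (and, for few-shot, each demo's image) alongside the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Demo:
    """One in-context example: a question paired with its reference answer.

    Hypothetical helper; images are omitted to keep the sketch text-only.
    """
    question: str
    answer: str


# Assumed instruction text, not taken from the project.
INSTRUCTION = "Answer the question about the image with a short phrase."


def build_vqa_prompt(question: str, demos: Sequence[Demo] = ()) -> str:
    """Build a zero-shot prompt when `demos` is empty, a few-shot prompt otherwise."""
    lines = [INSTRUCTION]
    # Each demo becomes a completed question/answer pair the model can imitate.
    for demo in demos:
        lines.append(f"Question: {demo.question} Answer: {demo.answer}")
    # The target question is left open so the model completes the answer.
    lines.append(f"Question: {question} Answer:")
    return "\n".join(lines)
```

For example, `build_vqa_prompt("What color is the bus?")` yields a zero-shot prompt, while passing `[Demo("How many dogs are there?", "2")]` as `demos` prepends one worked example. Keeping the template in one function makes it easy to swap strategies while holding the model and data fixed.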
Stars: 28
Forks: 4
Language: Python
License: not specified
Category: not specified
Last pushed: Sep 14, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/rabiulcste/vqazero"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
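The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: it reuses the URL from the curl command, assumes the response body is JSON, and does not send an API key because the page does not say how a key is passed. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command; the path segments after it are
# category / owner / repo, as in that example URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the service returns UTF-8 JSON. No API key is sent, so the
    keyless limit (100 requests/day) applies.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("prompt-engineering", "rabiulcste", "vqazero")` should request the same URL as the curl example; the shape of the returned data is not documented here, so inspect it before relying on specific fields.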
Higher-rated alternatives
ShiZhengyan/PowerfulPromptFT
[NeurIPS 2023 Main Track] This is the repository for the paper titled "Don’t Stop Pretraining?...
OpenDriveLab/DriveLM
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for...
deepankar27/Prompt_Organizer
Managed Prompt Engineering
mala-lab/NegPrompt
The official implementation of CVPR '24 Paper "Learning Transferable Negative Prompts for...