bowen-upenn/Multi-Agent-VQA

[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering

Score: 27 / 100 (Experimental)

This project lets researchers and developers explore how large foundation models can answer questions about images without task-specific training. Given an image and a question, it produces an answer by coordinating specialized AI "agents" for sub-tasks such as object detection and counting. It is aimed at AI researchers and practitioners working on zero-shot visual question answering.
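
A minimal sketch of that coordination pattern, under stated assumptions: detection_agent, counting_agent, and the keyword-based router below are hypothetical illustrations, not the repository's actual code or API.

def detection_agent(image, question):
    # Hypothetical specialist: an open-vocabulary detector would return
    # boxes for entities mentioned in the question.
    return [{"label": "dog", "box": (10, 20, 110, 180)}]

def counting_agent(detections, target_label):
    # Hypothetical specialist: count detections matching the queried label.
    return sum(1 for d in detections if d["label"] == target_label)

def answer(image, question):
    # Hypothetical coordinator: route sub-tasks to specialist agents,
    # then compose a final zero-shot answer.
    detections = detection_agent(image, question)
    if "how many" in question.lower():
        return str(counting_agent(detections, "dog"))
    return ", ".join(d["label"] for d in detections)

print(answer(None, "How many dogs are in the picture?"))  # prints "1"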

No commits in the last 6 months.

Use this if you are a researcher or AI developer exploring advanced, zero-shot visual question answering capabilities using multi-agent foundation models.

Not ideal if you need a production-ready solution that supports a wide variety of large vision-language models or requires extensive fine-tuning on custom datasets.

visual-question-answering zero-shot-learning multi-agent-systems computer-vision-research foundation-models
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 5 / 25


Stars: 20
Forks: 1
Language: Python
License: MIT
Last pushed: Sep 21, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/bowen-upenn/Multi-Agent-VQA"

Open to everyone: 100 requests/day with no API key. Get a free key for 1,000 requests/day.
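
The same request in Python, as a minimal sketch: the response is assumed to be a JSON body, and its field names are not documented here.

import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/agents/bowen-upenn/Multi-Agent-VQA"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)  # assumes JSON containing the stats shown above

print(json.dumps(data, indent=2))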