bowen-upenn/Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
This project helps researchers and developers explore how large foundation models can answer questions about images zero-shot, with no task-specific fine-tuning. You supply an image and a question, and the system returns an answer by coordinating specialized AI "agents" for sub-tasks such as object detection and counting.
No commits in the last 6 months.
Use this if you are a researcher or AI developer exploring advanced, zero-shot visual question answering capabilities using multi-agent foundation models.
Not ideal if you need a production-ready solution that supports a wide variety of large vision-language models or requires extensive fine-tuning on custom datasets.
Stars: 20
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Sep 21, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/bowen-upenn/Multi-Agent-VQA"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
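The same endpoint can be called from Python using only the standard library. This is a minimal sketch based on the curl example above; the response schema is not documented here, so the code simply prints the raw JSON it receives:

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch quality data for a repo. No API key is needed
    for up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("bowen-upenn", "Multi-Agent-VQA")
    print(json.dumps(data, indent=2))
```

Swap in any `owner/repo` pair from the listings on this site to query a different project.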
Higher-rated alternatives
InfinitiBit/graphbit
GraphBit is the world’s first enterprise-grade Agentic AI framework, built on a Rust core with a...
autogluon/autogluon-assistant
Multi-Agent System Powered by LLMs for End-to-end Multimodal ML Automation
pguso/agents-from-scratch
Build AI agents from first principles using a local LLM - no frameworks, no cloud APIs, no...
samholt/L2MAC
🚀 The LLM Automatic Computer Framework: L2MAC
pguso/ai-agents-from-scratch
Demystify AI agents by building them yourself. Local LLMs, no black boxes, real understanding of...