bowen-upenn/Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
This project helps researchers and developers explore how large foundation models can answer questions about images zero-shot, with no task-specific fine-tuning. You supply an image and a question, and the system returns an answer by coordinating specialized AI "agents" for sub-tasks such as object detection and counting.
No commits in the last 6 months.
Use this if you are a researcher or AI developer exploring advanced, zero-shot visual question answering capabilities using multi-agent foundation models.
Not ideal if you need a production-ready solution that supports a wide variety of large vision-language models or requires extensive fine-tuning on custom datasets.
Stars: 20
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Sep 21, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/bowen-upenn/Multi-Agent-VQA"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
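The same endpoint can be called from Python using only the standard library. This is a minimal sketch based on the curl example above; the response schema is not documented here, so the code simply prints the raw JSON it receives:

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch quality data for a repo. No API key is needed
    for up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("bowen-upenn", "Multi-Agent-VQA")
    print(json.dumps(data, indent=2))
```

Swap in any `owner/repo` pair from the listings on this site to query a different project.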
Higher-rated alternatives
InfinitiBit/graphbit
GraphBit is the world’s first enterprise-grade Agentic AI framework, built on a Rust core with a...
autogluon/autogluon-assistant
Multi-Agent System Powered by LLMs for End-to-end Multimodal ML Automation
pguso/agents-from-scratch
Build AI agents from first principles using a local LLM - no frameworks, no cloud APIs, no...
samholt/L2MAC
🚀 The LLM Automatic Computer Framework: L2MAC
pguso/ai-agents-from-scratch
Demystify AI agents by building them yourself. Local LLMs, no black boxes, real understanding of...