Cadene/vqa.pytorch
Visual Question Answering in PyTorch
This project implements Visual Question Answering (VQA), a task where a computer answers questions about an image: you provide an image and a question about its content, and the system returns a short, factual answer. It is aimed primarily at researchers and developers working on multimodal AI for image comprehension and human-to-machine interaction.
735 stars. No commits in the last 6 months.
Use this if you are a researcher or AI developer working on multimodal AI systems and need to train or evaluate models for understanding visual content and answering natural language questions about it.
Not ideal if you are looking for a ready-to-use application or API for general image search or descriptive captioning, as this focuses on the specific VQA research task.
Stars: 735
Forks: 179
Language: Python
License: —
Category: ml-frameworks
Last pushed: Dec 11, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Cadene/vqa.pytorch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
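The same record can be fetched programmatically. A minimal sketch using only the Python standard library, assuming the endpoint shown in the curl command above returns JSON (the exact response schema is not documented on this page):

```python
# Minimal sketch: fetch a repo-quality record from the pt-edge API.
# The base URL and path layout come from the curl example above;
# the JSON response structure is an assumption.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the API URL for a repo's quality record."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str) -> dict:
    """GET the record; no API key is needed for up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(category, repo)) as resp:
        return json.load(resp)


# Example (performs a live request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "Cadene/vqa.pytorch")
# print(data)
```

For higher rate limits (1,000 requests/day), a free API key can be obtained as noted above; how the key is passed (header vs. query parameter) is not specified on this page.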
Higher-rated alternatives
open-mmlab/mmpretrain: OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf: A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet: Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis: Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce: Unsupervised video summarization with deep reinforcement learning (AAAI'18)