vtu81/NaiveVQA
A Visual Question Answering model implemented in MindSpore and PyTorch. The model is a reimplementation of the paper *Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering*. It is the authors' final project for the DL4NLP course at ZJU.
This project builds a system that answers natural-language questions about images: you provide a collection of images and a list of questions about them, and it outputs predicted answers. It is aimed at researchers and machine learning engineers working on models that jointly interpret visual and linguistic information.
No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer looking for a baseline model to understand or experiment with Visual Question Answering (VQA).
Not ideal if you need a production-ready VQA system for immediate deployment, or if you lack access to substantial computational resources such as an NVIDIA GPU.
Stars: 10
Forks: 4
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Jul 27, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/vtu81/NaiveVQA"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
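The same endpoint can be queried from Python using only the standard library. This is a minimal sketch: the function names (`quality_url`, `fetch_quality`) are hypothetical helpers, and the response schema is not documented here, so the returned JSON should be inspected by the caller.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    The response fields are not documented in this listing, so the
    resulting dict should be inspected before relying on any key.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("vtu81", "NaiveVQA")` issues the same request as the curl command shown above.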
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)