vtu81/NaiveVQA
A Visual Question Answering model implemented in MindSpore and PyTorch. The model is a reimplementation of the paper *Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering*. It is the authors' final project for the DL4NLP course at ZJU.
This project builds a system that answers natural-language questions about images: you provide a collection of images and a list of questions about them, and it outputs predicted answers. It is aimed at researchers and machine learning engineers working on models that jointly interpret visual and linguistic information.
No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer looking for a baseline model to understand or experiment with Visual Question Answering (VQA).
Not ideal if you need a production-ready VQA system for immediate deployment, or if you lack access to substantial computational resources such as an NVIDIA GPU.
Stars: 10
Forks: 4
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Jul 27, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/vtu81/NaiveVQA"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
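The same endpoint can be queried from Python using only the standard library. This is a minimal sketch: the function names (`quality_url`, `fetch_quality`) are hypothetical helpers, and the response schema is not documented here, so the returned JSON should be inspected by the caller.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    The response fields are not documented in this listing, so the
    resulting dict should be inspected before relying on any key.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("vtu81", "NaiveVQA")` issues the same request as the curl command shown above.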
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)