abhshkdz/neural-vqa-attention

:question: Attention-based Visual Question Answering in Torch

/ 100

Emerging

This project helps computer vision researchers and AI developers build systems that can understand images and answer questions about them. It takes an image and a natural language question as input, then produces a text answer along with a 'heatmap' showing where in the image the model focused its attention to derive that answer. It's designed for those exploring visual question answering models and their interpretability.
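As a rough illustration of the attention 'heatmap' idea described above, here is a minimal sketch in plain Python. It is not taken from this repository (which is written in Torch). The function names `attend` and `to_heatmap`, the dot-product scoring, and the toy 2x2 grid of region features are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, question_vector):
    """Hypothetical soft attention: score each image region against the
    question with a dot product, then average the regions by weight."""
    scores = [
        sum(r * q for r, q in zip(region, question_vector))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(question_vector)
    attended = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, attended


def to_heatmap(weights, rows, cols):
    """Reshape the flat list of region weights into a rows x cols grid."""
    return [weights[r * cols:(r + 1) * cols] for r in range(rows)]


# Toy example (assumed data): a 2x2 grid of regions with 2-d features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
question = [2.0, 0.0]

weights, attended = attend(regions, question)
heatmap = to_heatmap(weights, rows=2, cols=2)
for row in heatmap:
    print(["%.3f" % w for w in row])
```

In this toy setup, the regions whose features align with the question vector receive the larger weights, which is the behavior a heatmap overlay would visualize. A real model would learn the scoring function and feed the attended feature into an answer classifier.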

101 stars. No commits in the last 6 months.

Use this if you want a straightforward, interpretable model that shows where a system 'looks' in an image to answer a question, and top accuracy is not your priority.

Not ideal if you require state-of-the-art accuracy for highly sensitive or critical visual question answering applications.

visual question answering, image understanding, explainable AI, attention mechanisms, computer vision research

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 21 / 25

Stars: 101

Forks:

Language: Jupyter Notebook

License: none

Higher-rated alternatives

open-mmlab/mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

adambielski/siamese-triplet

Siamese and triplet networks with online pair/triplet mining in PyTorch

HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis

Papers, code and datasets about deep learning and multi-modal learning for video analysis

KaiyangZhou/pytorch-vsumm-reinforce

Unsupervised video summarization with deep reinforcement learning (AAAI'18)
