Cadene/vqa.pytorch
Visual Question Answering in PyTorch
This project implements Visual Question Answering (VQA), a task where a computer answers questions about an image: you provide an image and a question about its content, and the system returns a short, factual answer. It is aimed primarily at researchers and developers working on multimodal AI for image comprehension and human-to-machine interaction.
735 stars. No commits in the last 6 months.
Use this if you are a researcher or AI developer working on multimodal AI systems and need to train or evaluate models for understanding visual content and answering natural language questions about it.
Not ideal if you are looking for a ready-to-use application or API for general image search or descriptive captioning, as this focuses on the specific VQA research task.
Stars: 735
Forks: 179
Language: Python
License: —
Category: ml-frameworks
Last pushed: Dec 11, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Cadene/vqa.pytorch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
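The same record can be fetched programmatically. A minimal sketch using only the Python standard library, assuming the endpoint shown in the curl command above returns JSON (the exact response schema is not documented on this page):

```python
# Minimal sketch: fetch a repo-quality record from the pt-edge API.
# The base URL and path layout come from the curl example above;
# the JSON response structure is an assumption.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the API URL for a repo's quality record."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str) -> dict:
    """GET the record; no API key is needed for up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(category, repo)) as resp:
        return json.load(resp)


# Example (performs a live request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "Cadene/vqa.pytorch")
# print(data)
```

For higher rate limits (1,000 requests/day), a free API key can be obtained as noted above; how the key is passed (header vs. query parameter) is not specified on this page.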
Higher-rated alternatives
open-mmlab/mmpretrain: OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf: A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet: Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis: Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce: Unsupervised video summarization with deep reinforcement learning (AAAI'18)