yousefkotp/Visual-Question-Answering
A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach for the VizWiz Grand Challenge 2023, built by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model as the image and text encoder
This project offers a system that answers questions about images, which is especially useful for people who are visually impaired and rely on spoken queries. You input an image and a spoken question, and the system provides a precise, non-generative answer chosen from a predefined vocabulary. This tool is designed for end-users like researchers studying accessibility, or anyone needing quick, factual responses about image content.
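The approach described above can be sketched as a classification head over CLIP features: the image and question are embedded by CLIP's encoders, the two embeddings are combined, and a linear layer scores every answer in the fixed vocabulary. The snippet below is a minimal illustration of that idea, not the repo's actual code; it stands in for CLIP with random 512-dimensional vectors (CLIP ViT-B/32's embedding size), and the vocabulary entries and weight shapes are hypothetical.

```python
import numpy as np

# Hypothetical curated answer vocabulary (illustrative labels only).
ANSWER_VOCAB = ["yes", "no", "unanswerable", "red", "blue"]

rng = np.random.default_rng(0)

# Stand-ins for CLIP image/text embeddings; CLIP ViT-B/32 outputs 512-d vectors.
EMBED_DIM = 512
image_emb = rng.standard_normal(EMBED_DIM)
text_emb = rng.standard_normal(EMBED_DIM)

# Linear classification head over the concatenated embeddings:
# one logit per answer in the fixed vocabulary (untrained, random weights here).
W = rng.standard_normal((len(ANSWER_VOCAB), 2 * EMBED_DIM)) * 0.01
b = np.zeros(len(ANSWER_VOCAB))

def predict_answer(image_emb, text_emb):
    """Score every vocabulary answer and return the best one with its probabilities."""
    features = np.concatenate([image_emb, text_emb])   # shape (1024,)
    logits = W @ features + b                          # shape (len(ANSWER_VOCAB),)
    probs = np.exp(logits - logits.max())              # numerically stable softmax
    probs /= probs.sum()
    return ANSWER_VOCAB[int(np.argmax(probs))], probs

answer, probs = predict_answer(image_emb, text_emb)
```

Because the answer is always picked from the vocabulary (a classification, not a generation step), inference is a single forward pass through frozen encoders plus one matrix multiply, which is what keeps the system lightweight.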
No commits in the last 6 months.
Use this if you need a lightweight system to answer specific questions about images based on a fixed set of possible answers, prioritizing speed and computational efficiency.
Not ideal if you require the system to generate free-form, creative, or open-ended answers beyond a predefined vocabulary, or if you need to understand complex, nuanced visual contexts.
Stars
14
Forks
7
Language
Jupyter Notebook
License
—
Category
Last pushed
Jun 27, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/yousefkotp/Visual-Question-Answering"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch