yousefkotp/Visual-Question-Answering

A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach, built for the VizWiz Grand Challenge 2023 by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model, which serves as the image and text encoder.
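The description above amounts to classification over a fixed answer list: encoder features go through a linear layer, and the highest-scoring vocabulary entry is returned. The following is a minimal sketch of that idea in plain Python, not the repository's code. The embeddings are hand-made stand-ins for CLIP features, the weights are random rather than trained, and concatenating the image and text embeddings is an assumption, since the description does not say how the two are combined. All names (`answer_question`, `VOCAB`, `WEIGHTS`) are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Apply y = W x + b, where W is given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def answer_question(image_emb, text_emb, weights, bias, vocab):
    """Pick one answer from a fixed vocabulary; nothing is generated."""
    # Assumption: the two embeddings are simply concatenated.
    features = list(image_emb) + list(text_emb)
    probs = softmax(linear(features, weights, bias))
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs[best]


# Toy setup: a 3-answer vocabulary and 4-dimensional stand-in embeddings.
VOCAB = ["unanswerable", "yes", "no"]
EMB_DIM = 4
rng = random.Random(0)
# Untrained, random weights: one row per vocabulary entry.
WEIGHTS = [[rng.uniform(-1, 1) for _ in range(2 * EMB_DIM)] for _ in VOCAB]
BIAS = [0.0] * len(VOCAB)

image_emb = [0.1, -0.3, 0.5, 0.2]   # stand-in for an image embedding
text_emb = [0.4, 0.0, -0.2, 0.7]    # stand-in for a question embedding

answer, confidence = answer_question(image_emb, text_emb, WEIGHTS, BIAS, VOCAB)
print(answer, round(confidence, 3))
```

Because the output is always an index into `VOCAB`, the model cannot produce an answer outside the curated list, which is the trade-off the listing describes.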

29 / 100 (Experimental)

This project offers a system that answers questions about images, which is especially useful for people who are visually impaired and rely on spoken queries. You input an image and a spoken question, and the system provides a precise, non-generative answer chosen from a predefined vocabulary. This tool is designed for end-users like researchers studying accessibility, or anyone needing quick, factual responses about image content.
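The listing does not say how the predefined vocabulary is built. One plausible way, shown here only as an illustration, is to reduce each question's set of human-provided answers to its most frequent one and use the distinct results as the vocabulary. The annotation layout, the sample data, and the function names below are hypothetical.

```python
from collections import Counter


def most_common_answer(answers):
    """Return the most frequent answer after trimming and lowercasing."""
    normalized = [a.strip().lower() for a in answers]
    # Among equal counts, Counter.most_common keeps first-seen order.
    return Counter(normalized).most_common(1)[0][0]


def build_vocabulary(annotations):
    """Map each distinct chosen answer to a class index.

    Returns the answer-to-index mapping and the chosen answer per question.
    """
    picks = [most_common_answer(item["answers"]) for item in annotations]
    vocab = sorted(set(picks))
    return {answer: idx for idx, answer in enumerate(vocab)}, picks


# Made-up annotations, three answers per question for brevity.
annotations = [
    {"question": "What color is this shirt?",
     "answers": ["blue", "Blue", "navy"]},
    {"question": "Is the light on?",
     "answers": ["yes", "yes", "no"]},
    {"question": "What does the label say?",
     "answers": ["unanswerable", "unanswerable", "soup"]},
]

answer_to_index, labels = build_vocabulary(annotations)
print(answer_to_index)
print(labels)
```

The resulting mapping gives each training question a single class label, which is what a classifier over a fixed answer set needs.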

No commits in the last 6 months.

Use this if you need a lightweight system to answer specific questions about images based on a fixed set of possible answers, prioritizing speed and computational efficiency.

Not ideal if you require the system to generate free-form, creative, or open-ended answers beyond a predefined vocabulary, or if you need to understand complex, nuanced visual contexts.

visual assistance, accessibility, image understanding, information retrieval, computer vision
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 16 / 25

Stars: 14
Forks: 7
Language: Jupyter Notebook
License: none listed
Last pushed: Jun 27, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/yousefkotp/Visual-Question-Answering"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
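For scripted use, the same request can be made from Python's standard library. This is a sketch under stated assumptions: the names `category` and `repo` for the two path parts are a guess at how the URL is organized, and the response format is not documented here, so the body is returned as raw text (assumed to be UTF-8) rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, repo):
    """Build the endpoint URL; quote() keeps the '/' inside owner/name."""
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the raw response body as text (performs a network request)."""
    with urlopen(build_quality_url(category, repo), timeout=timeout) as response:
        # Assumption: the body is UTF-8 text; its structure is not specified here.
        return response.read().decode("utf-8")
```

Calling `fetch_quality("ml-frameworks", "yousefkotp/Visual-Question-Answering")` issues the same GET request as the curl command and counts against the daily request limit.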