nttmdlab-nlp/SlideVQA

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)

Quality score: 36 / 100 (Emerging)

This project provides a comprehensive dataset for training and evaluating systems that answer questions based on information spread across multiple slide images, such as a presentation deck. Each example pairs a slide deck (a collection of images) with a question; a system must identify the relevant slides and produce the answer. This is useful for researchers and developers working on intelligent document analysis and visual question answering.

105 stars. No commits in the last 6 months.

Use this if you are developing or evaluating AI models that need to extract information and answer questions from multi-page documents, specifically slide presentations.

Not ideal if you are looking for an out-of-the-box application to answer questions from your own slide decks, as this is a dataset for model development, not an end-user tool.

document-intelligence visual-question-answering slide-analysis multi-document-processing information-extraction
Stale (6m) | No Package | No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 11 / 25


Stars: 105
Forks: 8
Language: Python
License: (not listed)
Last pushed: Mar 31, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/nttmdlab-nlp/SlideVQA"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.