gicheonkang/sglkt-visdial
🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"
This project provides a method for improving AI models on visual dialog: given image features and dialogue history as input, the model reasons over the conversation to understand questions and produce responses about the image. It is aimed at researchers and practitioners building conversational agents or visual question answering systems.
No commits in the last 6 months.
Use this if you are developing AI models that need to understand and discuss the content of images in a multi-turn conversational format.
Not ideal if you are looking for a ready-to-use application or an end-user product for general visual conversation, as this is a research implementation.
Stars: 13
Forks: 4
Language: Python
License: MIT
Last pushed: Feb 01, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/gicheonkang/sglkt-visdial"
Open to everyone: 100 requests/day with no key required, or get a free key for 1,000 requests/day.
Higher-rated alternatives
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
microsoft/XPretrain
Multi-modality pre-training
TheShadow29/zsgnet-pytorch
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural...
TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
zeyofu/BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can...