iral-lab/gold
Multimodal grounded language dataset
This dataset provides images, depth data, text descriptions, and speech recordings for common objects: 207 object instances across 47 classes, grouped into five high-level categories (food, home, medical, office, and tools), each captured from multiple angles. Researchers and developers working on domestic robots or other intelligent systems can use it to train and evaluate models that recognize and describe objects from combined visual and spoken input.
No commits in the last 6 months.
Use this if you are developing or evaluating AI models that need to connect spoken language with visual object information, especially for applications like robot interaction or object recognition.
Not ideal if you need data for a domain outside of common household/office objects, or if your application doesn't require multimodal data linking speech, text, and visual inputs.
Stars
11
Forks
—
Language
—
License
—
Category
nlp
Last pushed
Dec 14, 2021
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/iral-lab/gold"
Open to everyone: 100 requests/day with no key required; a free API key raises the limit to 1,000 requests/day.
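For programmatic access, the sketch below shows one way to call the same endpoint from Python using the requests library. It assumes the endpoint returns a JSON body, and the X-API-Key header name is only a placeholder for however the optional key is actually passed; check the API documentation before relying on it.

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/iral-lab/gold"

def fetch_quality_record(api_key=None):
    # Keyless requests are limited to 100/day; a free key raises this to 1,000/day.
    # The header name below is an assumption, not confirmed against the API docs.
    headers = {"X-API-Key": api_key} if api_key else {}
    response = requests.get(API_URL, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()  # assumes a JSON response body

if __name__ == "__main__":
    print(fetch_quality_record())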
Higher-rated alternatives
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
microsoft/XPretrain
Multi-modality pre-training
TheShadow29/zsgnet-pytorch
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural...
TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
zeyofu/BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can...