iral-lab/gold
Multimodal grounded language dataset
This dataset provides images, depth data, text descriptions, and speech recordings for common objects: 207 object instances across 47 classes, grouped into five high-level categories (food, home, medical, office, and tools), each captured from multiple angles. Researchers and developers working on domestic robots or other intelligent systems can use it to train and evaluate models that recognize and describe objects from combined visual and spoken input.
No commits in the last 6 months.
Use this if you are developing or evaluating AI models that need to connect spoken language with visual object information, especially for applications like robot interaction or object recognition.
Not ideal if you need data for a domain outside of common household/office objects, or if your application doesn't require multimodal data linking speech, text, and visual inputs.
Stars
11
Forks
—
Language
—
License
—
Category
nlp
Last pushed
Dec 14, 2021
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/iral-lab/gold"
Open to everyone: 100 requests/day with no key required; a free API key raises the limit to 1,000 requests/day.
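For programmatic access, the sketch below shows one way to call the same endpoint from Python using the requests library. It assumes the endpoint returns a JSON body, and the X-API-Key header name is only a placeholder for however the optional key is actually passed; check the API documentation before relying on it.

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/iral-lab/gold"

def fetch_quality_record(api_key=None):
    # Keyless requests are limited to 100/day; a free key raises this to 1,000/day.
    # The header name below is an assumption, not confirmed against the API docs.
    headers = {"X-API-Key": api_key} if api_key else {}
    response = requests.get(API_URL, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()  # assumes a JSON response body

if __name__ == "__main__":
    print(fetch_quality_record())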
Higher-rated alternatives
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
microsoft/XPretrain
Multi-modality pre-training
TheShadow29/zsgnet-pytorch
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural...
TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
zeyofu/BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can...