SiyuanHuang95/ManipVQA

[IROS24 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

Quality score: 22 / 100 (Experimental)

This project helps roboticists and AI researchers improve how robots understand and interact with objects in the real world. By training Multimodal Large Language Models (MLLMs) on visual data that describes object affordances and physical properties, it enables robots to better interpret natural language commands for manipulation tasks. It takes standard image-text data as input and outputs an MLLM enhanced with robotic manipulation intelligence.

102 stars. No commits in the last 6 months.

Use this if you are developing robotic systems and need your robots to better understand how to interact with objects based on visual cues and natural language instructions.

Not ideal if your primary focus is on general image understanding or natural language processing without a direct application to robotic manipulation.

robotics robot-manipulation AI-training affordance-learning robot-vision
No License · Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 5 / 25


Stars: 102
Forks: 3
Language: Python
License: none
Last pushed: Aug 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/SiyuanHuang95/ManipVQA"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
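For programmatic access, here is a minimal Python sketch using the requests library. The endpoint URL is taken from the curl command above; the shape of the response (assumed to be a JSON object containing the score breakdown) is not documented here, so inspect the returned fields before relying on them.

import requests

# Same endpoint as the curl command above; no API key needed for up to 100 requests/day.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/SiyuanHuang95/ManipVQA"

response = requests.get(url, timeout=10)
response.raise_for_status()

# Assumption: the API returns a JSON object with the quality scores shown on this page.
data = response.json()
print(data)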