qizekun/ShapeLLM
[ECCV 2024] ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
This project lets robots and AR systems understand 3D objects in the real world through natural language: you provide 3D scans (point clouds) of objects plus text questions, and it returns text answers describing or identifying those objects. It's aimed at roboticists, augmented reality developers, and anyone else building interactive systems that need to 'see' and 'talk about' their physical environment.
228 stars. No commits in the last 6 months.
Use this if you are developing an embodied AI system or an augmented reality application that needs to interpret 3D object data from sensors and respond to user queries in natural language.
Not ideal if your application primarily involves 2D image analysis, generating new 3D models, or requires highly precise 3D measurements rather than high-level object understanding.
Stars: 228
Forks: 17
Language: Python
License: Apache-2.0
Category:
Last pushed: Oct 08, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/qizekun/ShapeLLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice