InternRobotics/PointLLM
[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds
This project offers a specialized Large Language Model (LLM) designed to interpret 3D object data. Given raw 3D point clouds as input, the model infers the object's type, shape, and appearance and produces descriptive text. It is aimed at researchers and engineers working with 3D sensor data who need to automatically describe or classify objects.
983 stars. No commits in the last 6 months.
Use this if you need an AI model that can understand and generate descriptions directly from 3D point cloud data, without being confused by depth ambiguities or occlusions.
Not ideal if your primary data source is 2D images or videos, or if you don't work with raw 3D point cloud representations.
Stars: 983
Forks: 53
Language: Python
License: —
Category: —
Last pushed: Aug 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/InternRobotics/PointLLM"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
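The same endpoint can be called from code. A minimal Python sketch, assuming the URL follows the pattern in the curl example above (ecosystem, then owner/repo); the response schema and any API-key header name are not documented here, so this only builds the request URL:

```python
# Minimal sketch: construct the quality-API URL for a repository.
# The path structure is taken from the curl example above; nothing
# beyond that (response fields, auth header) is assumed.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Return the API URL for a given ecosystem and owner/repo pair."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "InternRobotics", "PointLLM")
print(url)
# https://pt-edge.onrender.com/api/v1/quality/transformers/InternRobotics/PointLLM
```

From there, fetching the JSON with any HTTP client (e.g. `curl` as shown, or `urllib.request` in Python) returns the stats listed above.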
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies