YunzeMan/Lexicon3D

[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

/ 100

Emerging

Lexicon3D helps computer vision researchers probe how well different visual foundation models understand complex 3D scenes. It takes posed images, videos, and 3D point clouds of indoor scenes as input and produces a structured 3D representation that can be evaluated on tasks such as 3D object detection and question answering. It is aimed at computer vision researchers and AI model developers working on 3D scene perception.

100 stars. No commits in the last 6 months.

Use this if you are a computer vision researcher evaluating or developing AI models for detailed 3D scene understanding from diverse visual inputs.

Not ideal if you need a plug-and-play solution for immediate real-world applications or are not familiar with deep learning frameworks and research datasets.

3D-scene-understanding computer-vision-research AI-model-evaluation robotics-perception spatial-computing

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 8 / 25

Stars: 100
Forks:
Language: Python
License: MIT

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
