YunzeMan/Lexicon3D
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
This tool helps computer vision researchers probe how well different visual foundation models understand complex 3D scenes. It takes inputs such as posed images, videos, and 3D point clouds of indoor scenes. Its output is a structured 3D understanding of the scene, which can be evaluated on tasks such as 3D object detection or question answering. It is designed for computer vision researchers and AI model developers working on advanced 3D scene perception.
100 stars. No commits in the last 6 months.
Use this if you are a computer vision researcher evaluating or developing AI models for detailed 3D scene understanding from diverse visual inputs.
Not ideal if you need a plug-and-play solution for immediate real-world applications or are not familiar with deep learning frameworks and research datasets.
Stars: 100
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Feb 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/YunzeMan/Lexicon3D"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
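The same request can be made from Python. This is a minimal sketch using only the standard library: it reuses the URL from the curl command above, while the helper names `quality_url` and `fetch_quality` are hypothetical. It also assumes that the endpoint returns JSON and that other repositories follow the same `owner/repo` path pattern, neither of which this page confirms.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair (path pattern is assumed)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumes the response body is JSON; the response schema is not documented
    on this page, so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("YunzeMan", "Lexicon3D"))
```

Calling `fetch_quality("YunzeMan", "Lexicon3D")` performs the network request. With the keyless limit of 100 requests/day, it is worth caching the result rather than calling it in a loop.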
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice