eslambakr/LAR-Look-Around-and-Refer
This is the official implementation for our paper, "LAR: Look Around and Refer".
This project helps computer vision researchers and developers improve how AI systems identify specific objects within complex 3D environments based on natural language descriptions. It takes a 3D scene (such as a room scan) and a text description as input, then outputs the precisely located 3D object that matches the description. This is useful for anyone building AI assistants, robotics, or spatial computing applications that need to understand and interact with objects in real-world 3D spaces.
No commits in the last 6 months.
Use this if you are working on 3D visual grounding tasks and want to enhance your model's accuracy by incorporating synthesized 2D image cues from 3D point clouds without needing actual 2D input images.
Not ideal if your primary focus is on 2D object recognition or if you require a simple, off-the-shelf solution without deep engagement in research-level computer vision.
Stars: 30
Forks: 2
Language: C++
License: MIT
Category:
Last pushed: Dec 01, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/eslambakr/LAR-Look-Around-and-Refer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
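If you prefer to query the endpoint programmatically, the URL can be built from any owner/repo pair. A minimal sketch, assuming the same base path as the curl command above (the `quality_url` helper is hypothetical, not part of the API):

```python
# Base path copied from the curl example above; the helper below is a
# hypothetical convenience, not an official client.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Return the quality-API endpoint for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

url = quality_url("eslambakr", "LAR-Look-Around-and-Refer")
print(url)
```

Fetch the resulting URL with any HTTP client (e.g. `curl` as shown above, or `urllib.request` in Python); the response format and any API-key header name are not documented here, so check the service's docs before relying on them.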
Higher-rated alternatives
NVlabs/MambaVision
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
sign-language-translator/sign-language-translator
Python library & framework to build custom translators for the hearing-impaired and translate...
kyegomez/Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
autonomousvision/transfuser
[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving;...
kyegomez/MultiModalMamba
A novel implementation of fusing ViT with Mamba into a fast, agile, and high performance...