daveredrum/D3Net
[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
D3Net generates detailed, natural-language descriptions of objects in 3D scanned scenes (dense captioning) and locates objects in a scene from a textual description (visual grounding). It takes 3D scan data (such as ScanNet scenes) and outputs either per-object descriptions or the 3D location of the object mentioned in the text. It is aimed at researchers and engineers working on 3D scene understanding and virtual reality applications.
No commits in the last 6 months.
Use this if you need to automatically generate precise descriptions for objects within complex 3D scans or accurately identify objects in a 3D environment based on natural language commands.
Not ideal if you are working with 2D images or videos, or if your primary need is for general scene classification rather than detailed object-level understanding in 3D.
Stars: 44
Forks: 5
Language: Python
License: —
Category:
Last pushed: Aug 27, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/daveredrum/D3Net"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
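The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page, so treat the parsing step as illustrative.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("computer-vision", "daveredrum", "D3Net")` issues the same GET request as the curl command. With no key, the 100 requests/day limit applies, so caching the result locally is worth considering for repeated lookups.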
Higher-rated alternatives
3DOM-FBK/deep-image-matching
Multiview matching with deep-learning and hand-crafted local features for COLMAP and other SfM...
suhangpro/mvcnn
Multi-view CNN (MVCNN) for shape recognition
zouchuhang/LayoutNet
Torch implementation of our CVPR 18 paper: "LayoutNet: Reconstructing the 3D Room Layout from a...
andyzeng/tsdf-fusion-python
Python code to fuse multiple RGB-D images into a TSDF voxel volume.
andyzeng/tsdf-fusion
Fuse multiple depth frames into a TSDF voxel volume.