daveredrum/D3Net

[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding

Experimental

D3Net generates detailed, human-like descriptions of objects in 3D scanned environments (dense captioning) and locates objects from a textual description (visual grounding). It takes 3D scan data, such as ScanNet scenes, and outputs either per-object descriptions or the 3D location of the object a text refers to. It is aimed at researchers and engineers working on 3D scene understanding and virtual reality applications.

No commits in the last 6 months.

Use this if you need to automatically generate descriptions of individual objects in complex 3D scans, or to locate objects in a 3D environment from natural-language descriptions.

Not ideal if you are working with 2D images or videos, or if your primary need is for general scene classification rather than detailed object-level understanding in 3D.

3D-scene-understanding spatial-computing virtual-reality-development computer-vision natural-language-processing

No License Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 11 / 25

Language Python

License —

Higher-rated alternatives

3DOM-FBK/deep-image-matching

Multiview matching with deep-learning and hand-crafted local features for COLMAP and other SfM...

suhangpro/mvcnn

Multi-view CNN (MVCNN) for shape recognition

zouchuhang/LayoutNet

Torch implementation of our CVPR 18 paper: "LayoutNet: Reconstructing the 3D Room Layout from a...

andyzeng/tsdf-fusion-python

Python code to fuse multiple RGB-D images into a TSDF voxel volume.

andyzeng/tsdf-fusion

Fuse multiple depth frames into a TSDF voxel volume.
