daveredrum/D3Net

[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding

Score: 27 / 100 (Experimental)

This project generates detailed, human-like descriptions of objects in 3D-scanned environments and locates objects that match a given textual description. It takes 3D scan data (e.g., from ScanNet) and outputs either per-object descriptions or the 3D position of an object referred to in text. It is aimed at researchers and engineers working on 3D scene understanding and virtual-reality applications.

No commits in the last 6 months.

Use this if you need to automatically generate descriptions of objects in complex 3D scans, or to identify objects in a 3D environment from natural-language queries.

Not ideal if you are working with 2D images or video, or if your primary need is general scene classification rather than detailed object-level understanding in 3D.

3D-scene-understanding spatial-computing virtual-reality-development computer-vision natural-language-processing
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 11 / 25


Stars: 44
Forks: 5
Language: Python
License: None
Last pushed: Aug 27, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/daveredrum/D3Net"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
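For scripted access, the same endpoint can be queried from Python. A minimal sketch using only the standard library, assuming the endpoint returns JSON as the curl example suggests; the shape of the response body is not documented here, so the code just prints whatever comes back. The `quality_url` helper is hypothetical, introduced only to show how the path is composed from category and repo slug.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the quality-report URL for a repo slug like 'owner/name'."""
    return f"{API_BASE}/{category}/{repo}"


url = quality_url("computer-vision", "daveredrum/D3Net")

# No API key needed for up to 100 requests/day; a free key raises the
# limit to 1,000/day. The request is guarded because the service may be
# unreachable or rate-limited.
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        report = json.load(resp)
        print(report)
except OSError as exc:
    print(f"request failed: {exc}")
```

Using an API key (if the service accepts one via a header) would be an extra `Request` object with the appropriate header; check the service's docs for the exact header name before relying on it.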