H-EmbodVis/MERGE

[NeurIPS 2025] More Than Generation: Unifying Generation and Depth Estimation via Text-to-Image Diffusion Models

Quality score: 42 / 100 (Emerging)

This project helps computer vision practitioners and 3D artists both generate realistic images and estimate depth from images with a single model. You supply a text prompt to generate a new image, or an existing image to obtain a depth map showing how far away different objects are. This is useful for anyone creating 3D scenes or visual effects, or analyzing spatial relationships in images.

215 stars.

Use this if you need to generate high-quality images from text descriptions and also obtain accurate depth estimates from images, within one model, for 3D modeling or scene understanding.

Not ideal if your primary goal is basic image generation without any need for depth estimation, or if you require an extremely lightweight solution for real-time applications.

3D-modeling computer-vision image-synthesis visual-effects scene-understanding
No package, no dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 13 / 25
Community 13 / 25
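The four sub-scores above add up exactly to the headline figure (6 + 10 + 13 + 13 = 42 out of 4 × 25 = 100). The sketch below assumes the overall score is simply the unweighted sum of these categories; the page does not state this, so treat it as a hypothetical reconstruction, not the site's actual formula.

```python
# Hypothetical reconstruction: assumes the overall score is the plain,
# unweighted sum of the four sub-scores listed above (each out of 25).
SUBSCORES = {"Maintenance": 6, "Adoption": 10, "Maturity": 13, "Community": 13}
MAX_PER_CATEGORY = 25


def overall_score(subscores: dict) -> tuple:
    """Return (total, maximum possible) under the plain-sum assumption."""
    total = sum(subscores.values())
    max_total = MAX_PER_CATEGORY * len(subscores)
    return total, max_total


total, max_total = overall_score(SUBSCORES)
print(f"{total} / {max_total}")  # prints "42 / 100"
```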


Stars: 215
Forks: 18
Language: Python
License: Apache-2.0
Last pushed: Oct 31, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/H-EmbodVis/MERGE"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
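For scripted access, the same request can be made from Python using only the standard library. This is a minimal sketch: it assumes the `diffusion` path segment is a category that can be swapped, and that the endpoint returns JSON. The response fields are not documented here, and API-key usage is omitted because the page does not say how the key is passed. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(repo: str, category: str = "diffusion") -> str:
    """Build the endpoint URL (treating 'diffusion' as a category is an assumption)."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(repo: str, category: str = "diffusion", timeout: float = 10.0) -> dict:
    """GET the endpoint without a key and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(repo, category), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `fetch_quality("H-EmbodVis/MERGE")` issues the same request as the curl command and returns the decoded JSON; each call counts toward the daily request limit.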