agneet42/revision

[ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models"

21 / 100 (Experimental)

REVISION improves the spatial fidelity of text-to-image and vision-language models. Given a text prompt that describes objects and their spatial relationships, it renders realistic synthetic images that depict those relationships accurately. It is aimed at computer vision researchers and developers building generative AI models.

No commits in the last 6 months.

Use this if your text-to-image or multimodal AI models struggle to accurately represent spatial relationships like 'above,' 'next to,' or 'behind' when generating or analyzing images.

Not ideal if you are looking for a general-purpose image generation tool for creative content, as its primary focus is on improving spatial reasoning in AI models.

computer-vision-research generative-ai multimodal-ai synthetic-data-generation ai-model-training
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 13
Forks:
Language: Python
License: Apache-2.0
Last pushed: Aug 06, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/agneet42/revision"

Open to everyone: 100 requests/day with no key, or 1,000 requests/day with a free key.
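
The same request can be made from Python. This is a minimal sketch using only the standard library. It assumes that the endpoint returns a JSON body and that the URL path follows a category/owner/repo pattern generalized from the single curl example above. The helper names `quality_url` and `fetch_quality` are hypothetical and are not part of any published client.

```python
import json
import urllib.request

# Base path copied from the curl example; everything after it is treated as
# <category>/<owner>/<repo>, which is an assumption drawn from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("diffusion", "agneet42", "revision")` would issue the same request as the curl command. The response fields are not documented here, so inspect the returned object before relying on any particular key.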