agneet42/revision
[ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models"
REVISION improves the spatial accuracy of vision-language models, both those that generate images from text and those that interpret images alongside text descriptions. Given a text prompt describing objects and their spatial relationships, it renders realistic synthetic images that depict those relationships accurately. It is aimed at computer vision researchers and developers building generative AI models.
No commits in the last 6 months.
Use this if your text-to-image or multimodal AI models struggle to accurately represent spatial relationships like 'above,' 'next to,' or 'behind' when generating or analyzing images.
Not ideal if you are looking for a general-purpose image generation tool for creative content, as its primary focus is on improving spatial reasoning in AI models.
Stars: 13
Forks: —
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 06, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/agneet42/revision"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
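The same request can be made from Python. The sketch below is illustrative only: it uses the standard library, assumes the endpoint returns a JSON body (the page does not document the response format), and assumes the URL path follows a category/owner/repo pattern inferred from the single curl example above. The names quality_url and fetch_quality are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one documented example URL.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the response is JSON; this is not confirmed by the page.
    No API key is sent, so the anonymous daily limit applies.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires network access; prints whatever the service returns.
    print(fetch_quality("diffusion", "agneet42", "revision"))
```

Calling quality_url("diffusion", "agneet42", "revision") reproduces the URL used in the curl command, so the two requests should be equivalent.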
Higher-rated alternatives
UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...
keivalya/mini-vla
a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...