vpulab/ovam
Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024
This project helps graphic designers and artists understand how a text-to-image diffusion model such as Stable Diffusion interprets descriptive text. You generate an image from a prompt, and the tool produces attention maps showing which regions of the image correspond to specific words, including words that were not in the original prompt, so you can see what drives the model's visual output.
No commits in the last 6 months.
Use this if you want to visualize how individual words in your prompt contribute to specific visual elements of the generated image, or to refine those attributions through the paper's token optimization.
Not ideal if you just want to generate images and have no need to inspect the diffusion model's internals.
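As a rough sketch of the intended workflow, the snippet below hooks an attention recorder onto a Hugging Face diffusers Stable Diffusion pipeline during generation, then queries per-word heatmaps for arbitrary vocabulary. The ovam import path, the StableDiffusionHooker class, and the get_ovam_callable method are assumptions about this repo's interface based on its description, not a verified excerpt of its README; only the diffusers calls are standard.

import torch
from diffusers import StableDiffusionPipeline
from ovam import StableDiffusionHooker  # assumed import path for this repo

# Load a standard Stable Diffusion pipeline from diffusers
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Record cross-attention while generating an image (hooker API is an assumption)
with StableDiffusionHooker(pipe) as hooker:
    image = pipe("a cat sitting on a wooden chair").images[0]

# Build an open-vocabulary evaluator and query heatmaps for arbitrary words,
# including words that were not part of the generation prompt
evaluator = hooker.get_ovam_callable()  # assumed method name
with torch.no_grad():
    heatmaps = evaluator("cat chair sofa")

image.save("generation.png")

Token optimization, the second ingredient in the paper's title, would then refine the token embeddings fed to the evaluator to sharpen each map.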
Stars
71
Forks
6
Language
Python
License
MIT
Category
Diffusion
Last pushed
Jun 14, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/vpulab/ovam"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
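For scripted access, the same endpoint can be called from Python with the requests library. Only the URL and the rate limits come from this listing; the response schema and the X-API-Key header name used for the keyed tier are assumptions.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/diffusion/vpulab/ovam"

# Anonymous access (100 requests/day per the listing above)
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
print(resp.json())

# With a free key (1,000 requests/day); the header name is an assumption
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)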
Higher-rated alternatives
hao-ai-lab/FastVideo
A unified inference and post-training framework for accelerated video generation.
ModelTC/LightX2V
Light Image Video Generation Inference Framework
thu-ml/TurboDiffusion
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
PKU-YuanGroup/Helios
Helios: Real-Time Long Video Generation Model
PKU-YuanGroup/MagicTime
[TPAMI 2025🔥] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators