vpulab/ovam
Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024
This project helps graphic designers and artists understand how a text-to-image diffusion model such as Stable Diffusion interprets descriptive text. You generate an image from a prompt, and the tool produces attention maps showing which regions of the image correspond to specific words, including words that were not in the original prompt, so you can see what drives the model's visual output.
No commits in the last 6 months.
Use this if you want to visualize how individual words in your prompt contribute to specific visual elements of the generated image, or to refine those attributions through the paper's token optimization.
Not ideal if you just want to generate images and have no need to inspect the diffusion model's internals.
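As a rough sketch of the intended workflow, the snippet below hooks an attention recorder onto a Hugging Face diffusers Stable Diffusion pipeline during generation, then queries per-word heatmaps for arbitrary vocabulary. The ovam import path, the StableDiffusionHooker class, and the get_ovam_callable method are assumptions about this repo's interface based on its description, not a verified excerpt of its README; only the diffusers calls are standard.

import torch
from diffusers import StableDiffusionPipeline
from ovam import StableDiffusionHooker  # assumed import path for this repo

# Load a standard Stable Diffusion pipeline from diffusers
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Record cross-attention while generating an image (hooker API is an assumption)
with StableDiffusionHooker(pipe) as hooker:
    image = pipe("a cat sitting on a wooden chair").images[0]

# Build an open-vocabulary evaluator and query heatmaps for arbitrary words,
# including words that were not part of the generation prompt
evaluator = hooker.get_ovam_callable()  # assumed method name
with torch.no_grad():
    heatmaps = evaluator("cat chair sofa")

image.save("generation.png")

Token optimization, the second ingredient in the paper's title, would then refine the token embeddings fed to the evaluator to sharpen each map.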
Stars
71
Forks
6
Language
Python
License
MIT
Category
Diffusion
Last pushed
Jun 14, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/vpulab/ovam"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
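For scripted access, the same endpoint can be called from Python with the requests library. Only the URL and the rate limits come from this listing; the response schema and the X-API-Key header name used for the keyed tier are assumptions.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/diffusion/vpulab/ovam"

# Anonymous access (100 requests/day per the listing above)
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
print(resp.json())

# With a free key (1,000 requests/day); the header name is an assumption
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)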
Higher-rated alternatives
hao-ai-lab/FastVideo
A unified inference and post-training framework for accelerated video generation.
ModelTC/LightX2V
Light Image Video Generation Inference Framework
thu-ml/TurboDiffusion
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
PKU-YuanGroup/Helios
Helios: Real-Time Long Video Generation Model
PKU-YuanGroup/MagicTime
[TPAMI 2025🔥] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators