kardSIM/audio2img

Extends the conditioning of Stable Diffusion to take audio embeddings instead of text embeddings, using the Wav2Vec2-BERT model.


Experimental

This project helps artists, designers, and creative professionals generate unique images directly from audio inputs. Instead of typing text prompts, you feed in sounds (like a firework show, piano music, or thunder) and it creates corresponding visual art. It's for anyone who wants to explore a new dimension in generative AI, transforming auditory experiences into visual ones without relying on textual descriptions.
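The idea of swapping text embeddings for audio embeddings can be illustrated with a minimal sketch. It is not taken from the repository; it only assumes that an audio encoder yields a variable-length sequence of frame vectors and that the diffusion model expects a fixed number of conditioning tokens of a different width. The names `pool_frames`, `project`, and `audio_to_conditioning` are hypothetical, the sizes are toy values, and the weights are random placeholders where a trained adapter would go.

```python
import random


def pool_frames(frames, num_tokens):
    """Average a variable-length list of frame vectors into num_tokens vectors."""
    n, dim = len(frames), len(frames[0])
    tokens = []
    for i in range(num_tokens):
        start = i * n // num_tokens
        # Guarantee at least one frame per token, even when n < num_tokens.
        end = max(start + 1, (i + 1) * n // num_tokens)
        chunk = frames[start:end]
        tokens.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return tokens


def project(tokens, weight, bias):
    """Apply a linear map y = W x + b to every token (weight is out_dim x in_dim)."""
    return [
        [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]


def audio_to_conditioning(frames, weight, bias, num_tokens):
    """Turn audio frame embeddings into a fixed-size conditioning sequence."""
    return project(pool_frames(frames, num_tokens), weight, bias)


# Toy sizes (assumptions): 10 audio frames of width 8 -> 3 tokens of width 4.
rng = random.Random(0)
frames = [[rng.random() for _ in range(8)] for _ in range(10)]
weight = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]  # untrained placeholder
bias = [0.0] * 4

cond = audio_to_conditioning(frames, weight, bias, num_tokens=3)
print(len(cond), len(cond[0]))
```

In a real pipeline the resulting sequence would be passed to the denoiser's cross-attention in place of the text encoder output, and the projection would have to be learned; how the repository actually does this is not stated here.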

No commits in the last 6 months.

Use this if you want to create images directly from sound, bypass text prompts entirely, and explore the visual potential of audio data.

Not ideal if your primary need is precise, text-controlled image generation or if you require fine-grained control over specific visual elements using words.

generative-art sound-to-image digital-art creative-workflows audio-visualization

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 0 / 25



Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

scraed/LanPaint

High quality training free inpaint for every stable diffusion model. Supports ComfyUI

julienkay/com.doji.diffusers

A Unity package to run pretrained diffusion models with Unity Sentis

apapiu/transformer_latent_diffusion

Text to Image Latent Diffusion using a Transformer core

Aatricks/LightDiffusion-Next

Fastest Diffusion backend, WebUI, server. Pushing implementation and discovery of optimizations...

FMXExpress/Stable-Diffusion-Desktop-Client

Stable Diffusion Desktop client for Windows, macOS, and Linux built in Embarcadero Delphi.
