kardSIM/audio2img

Extends the conditioning of Stable Diffusion to accept audio embeddings instead of text embeddings, using the Wav2Vec2-BERT model.

Score: 21 / 100 (Experimental)

This project helps artists, designers, and creative professionals generate unique images directly from audio inputs. Instead of typing text prompts, you feed in sounds (like a firework show, piano music, or thunder) and it creates corresponding visual art. It's for anyone who wants to explore a new dimension in generative AI, transforming auditory experiences into visual ones without relying on textual descriptions.
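The core idea, swapping Stable Diffusion's CLIP text conditioning for projected Wav2Vec2-BERT audio embeddings, can be sketched shape-wise. The dimensions below (77 tokens and 768-d for SD v1's cross-attention input, 1024-d Wav2Vec2-BERT frames) are standard defaults, but the random projection and frame pooling are illustrative assumptions, not the repo's actual code.

```python
import numpy as np

# Illustrative shapes (assumptions, not taken from the repo):
# SD v1 cross-attention expects (batch, 77, 768) text embeddings;
# Wav2Vec2-BERT emits ~50 frames/sec of 1024-d features.
rng = np.random.default_rng(0)
audio_feats = rng.standard_normal((1, 250, 1024))  # ~5 s of audio frames

def audio_to_conditioning(feats, n_tokens=77, dim=768, seed=0):
    """Project audio frames into the text-embedding space and pool the
    frame axis down to the token count the UNet expects.
    (Hypothetical helper for illustration only.)"""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((feats.shape[-1], dim)) / np.sqrt(feats.shape[-1])
    projected = feats @ proj  # (batch, frames, dim)
    # Average-pool frames into n_tokens evenly spaced groups.
    idx = np.linspace(0, feats.shape[1], n_tokens + 1).astype(int)
    pooled = np.stack(
        [projected[:, a:b].mean(axis=1) for a, b in zip(idx[:-1], idx[1:])],
        axis=1,
    )
    return pooled  # (batch, n_tokens, dim): a drop-in for text conditioning

cond = audio_to_conditioning(audio_feats)
print(cond.shape)  # (1, 77, 768)
```

In a real pipeline the projection would be a learned layer trained against paired audio/image data, and the pooled sequence would be passed to the UNet wherever the text-encoder output normally goes.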

No commits in the last 6 months.

Use this if you want to create images directly from sound, bypassing text prompts entirely, and explore the visual potential of audio data.

Not ideal if your primary need is precise, text-controlled image generation or if you require fine-grained control over specific visual elements using words.

generative-art sound-to-image digital-art creative-workflows audio-visualization
Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 0 / 25


Stars: 13
Forks:
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Sep 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/kardSIM/audio2img"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
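The same endpoint can be queried from Python with the standard library. This sketch only builds the URL for an owner/repo pair; the response schema is not documented here, so the JSON handling at the end is an assumption and left commented out.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

url = quality_url("kardSIM", "audio2img")
print(url)

# Uncomment to fetch live data (no key needed, 100 requests/day):
# with urlopen(url) as resp:
#     data = json.load(resp)
```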