guyyariv/AudioToken
This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
This project lets content creators and researchers generate images directly from audio recordings: you provide an audio clip, and the system produces a corresponding image. It suits artists, marketers, and researchers exploring new ways to visualize soundscapes or to create multimedia content without writing descriptive text.
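Per the paper title, the core idea is to adapt a text-conditioned diffusion model by mapping an audio embedding into the text encoder's token space, so the audio acts like a learned token in the prompt. Below is a minimal, hypothetical sketch of such a projection; the dimensions, pooling scheme, and module names are assumptions for illustration, not the repo's actual API.

```python
import torch
import torch.nn as nn

# Hypothetical AudioToken-style projector: pool a sequence of features
# from a pretrained audio encoder into one vector, then project it into
# the text encoder's token-embedding space. All sizes are assumed.
class AudioProjector(nn.Module):
    def __init__(self, audio_dim: int = 768, token_dim: int = 768):
        super().__init__()
        self.attn_score = nn.Linear(audio_dim, 1)  # temporal attention pooling
        self.proj = nn.Linear(audio_dim, token_dim)

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        # audio_feats: (batch, time, audio_dim)
        weights = torch.softmax(self.attn_score(audio_feats), dim=1)  # (B, T, 1)
        pooled = (weights * audio_feats).sum(dim=1)                   # (B, audio_dim)
        return self.proj(pooled)                                      # (B, token_dim)

feats = torch.randn(2, 50, 768)   # stand-in for audio-encoder output
token = AudioProjector()(feats)
print(token.shape)                # torch.Size([2, 768])
```

The resulting vector would be inserted where a prompt token embedding normally sits, conditioning the (frozen) diffusion model on the audio.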
No commits in the last 6 months.
Use this if you need to generate visual content from sound, such as creating album art from music, visualizing sound events for research, or producing unique imagery for marketing campaigns based on audio clips.
Not ideal if you need precise control over image details or require images that are not conceptually linked to audio.
Stars
88
Forks
6
Language
Python
License
MIT
Category
diffusion
Last pushed
Jun 18, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/guyyariv/AudioToken"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
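The same call can be made from Python. This sketch builds the URL from the path segments in the curl example above (category/owner/repo); the response schema is not documented here, so it only reports the HTTP status.

```python
import urllib.request

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the per-repository quality endpoint URL.
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("diffusion", "guyyariv", "AudioToken")

# Anonymous access: 100 requests/day; a free key raises this to 1,000/day.
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print(resp.status)
except OSError as exc:
    print("request failed:", exc)
```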
Higher-rated alternatives
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
bghira/SimpleTuner
A general fine-tuning kit geared toward image/video/audio diffusion models.
mcmonkeyprojects/SwarmUI
SwarmUI (formerly StableSwarmUI), A Modular Stable Diffusion Web-User-Interface, with an...
nateraw/stable-diffusion-videos
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
TheDesignFounder/DreamLayer
Benchmark diffusion models faster. Automate evals, seeds, and metrics for reproducible results.