zhao-kun/VibeVoiceFusion
VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA fine-tuning, batch generation, and VRAM optimization. It is based on Microsoft's VibeVoice (AR + diffusion architecture).
This web application helps content creators, educators, or marketers generate high-quality, natural-sounding synthetic speech from text. You input written scripts and reference voice samples, and it outputs custom audio files with distinct voices, supporting multiple speakers for dialogues or single narration. It's designed for anyone needing professional voiceovers without hiring voice actors.
Use this if you need to quickly create synthetic speech, clone voices, or generate multi-speaker dialogues for various content types, even with limited GPU resources.
Not ideal if you need to create voices from scratch without any reference audio or if your projects demand extremely short audio segments where latency is critical.
Stars: 453
Forks: 56
Language: Python
License: —
Category:
Last pushed: Feb 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/zhao-kun/VibeVoiceFusion"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
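The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON and that the path segments after `/quality/` are category, owner, and repository; neither is confirmed by this page. The helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL shaped like the one in the curl example.

    Assumption: the three path segments are category/owner/repo.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the response body is JSON; the page does not document it.
    """
    request = urllib.request.Request(
        build_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "zhao-kun", "VibeVoiceFusion")` would issue the same request as the curl command. With no key, the stated limit of 100 requests per day applies, so caching the result locally is a sensible default.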
Higher-rated alternatives
BoltzmannEntropy/MimikaStudio
MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support
aahl/qwen-asr2api
🎤 Qwen 3 ASR to OpenAI API, a free STT speech recognition model
gabriele-mastrapasqua/qwen3-tts
Pure C inference engine for Qwen3-TTS text-to-speech. No Python, no PyTorch — just C and BLAS....
shijincai/VibeVoice
Archive of the official Microsoft VibeVoice repository (7B & 1.5B). Backup of the deleted source...
talin190/Qwen3-TTS-Daggr-UI
🎤 Create dynamic voice experiences with Qwen3-TTS-Daggr-UI, a Gradio app for voice design,...