zhao-kun/VibeVoiceFusion

VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA fine-tuning, batch generation, and VRAM optimization. It is based on Microsoft's VibeVoice, which uses an autoregressive (AR) plus diffusion architecture.

Score: 45 / 100 (Emerging)

This web application helps content creators, educators, or marketers generate high-quality, natural-sounding synthetic speech from text. You input written scripts and reference voice samples, and it outputs custom audio files with distinct voices, supporting multiple speakers for dialogues or single narration. It's designed for anyone needing professional voiceovers without hiring voice actors.

453 stars.

Use this if you need to quickly create synthetic speech, clone voices, or generate multi-speaker dialogues for various content types, even with limited GPU resources.

Not ideal if you need to create voices from scratch without any reference audio or if your projects demand extremely short audio segments where latency is critical.

Tags: voiceover-production, content-creation, audiobook-narration, e-learning, marketing-materials
No License, No Package, No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 7 / 25
Community 18 / 25

Stars: 453
Forks: 56
Language: Python
License: none listed
Last pushed: Feb 23, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/zhao-kun/VibeVoiceFusion"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
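
The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns a JSON body and that other projects follow the same category/owner/repo path pattern as the curl example; neither is confirmed by this page, and the helper names quality_url and fetch_quality are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL shaped like the curl example.

    Assumption: the path is always <category>/<owner>/<repo>.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body as JSON.

    Assumption: the response is JSON. Its fields are not documented here,
    so the parsed value is returned unchanged.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_quality("voice-ai", "zhao-kun", "VibeVoiceFusion") requests the same URL as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result rather than calling on every run.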