FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training and deployment capabilities.
This project helps create high-quality, natural-sounding voiceovers from written text across many languages and dialects. You provide text, and it generates realistic spoken audio, with control over emotion, speed, and volume. This is ideal for content creators, educators, or businesses that need automated voice production.
19,991 stars. Actively maintained with 6 commits in the last 30 days.
Use this if you need to transform written content into spoken audio with high naturalness and speaker consistency across multiple languages and Chinese dialects, including zero-shot voice cloning.
Not ideal if you only need basic single-language text-to-speech without advanced customization or high-fidelity output.
Stars: 19,991
Forks: 2,270
Language: Python
License: Apache-2.0
Last pushed: Feb 11, 2026
Commits (30d): 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FunAudioLLM/CosyVoice"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
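The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions. The URL layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single curl example above. The response is assumed to be JSON. `build_quality_url` and `fetch_quality` are hypothetical helper names. The page does not say how the free key is supplied, so the sketch sends anonymous requests, which fall under the 100 requests/day limit.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example; the path layout below it
# (<category>/<owner>/<repo>) is an assumption inferred from that one example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL (hypothetical helper).

    Each path segment is percent-encoded so that spaces or slashes
    inside a name cannot break the path structure.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send an anonymous GET request and decode the body as JSON.

    The response schema is not documented on this page, so the parsed
    object is returned unchanged rather than mapped to specific fields.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "FunAudioLLM", "CosyVoice")` sends the same request as the curl command. The result is left as a plain dict or list because the page does not document which fields it contains.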
Related tools
travisvn/chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
sfortis/openai_tts
Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible...
OpenMOSS/MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis....
OpenMOSS/MOSS-TTS
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the...