OpenMOSS/MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.
This project helps content creators transform dialogue scripts into dynamic, expressive spoken conversations with multiple distinct speakers. You provide a script and a short audio reference for each speaker, and it generates natural-sounding, long-form spoken dialogue of up to 60 minutes. It's ideal for producers of podcasts, audiobooks, commentary, and dubbed content.
1,202 stars. Actively maintained with 3 commits in the last 30 days.
Use this if you need to create realistic, multi-speaker audio from text for long-form content like podcasts or audiobooks, with flexible control over speaker identities and conversational flow.
Not ideal if you only need single-speaker text-to-speech or extremely short audio clips, as this tool is optimized for continuous, multi-party dialogue.
Stars: 1,202
Forks: 116
Language: Python
License: Apache-2.0
Category: Voice AI
Last pushed: Mar 06, 2026
Commits (30d): 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/OpenMOSS/MOSS-TTSD"
Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
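The same endpoint can be called from code. A minimal sketch in Python, using only the standard library: the `quality_url` helper and the assumption that the path follows the pattern `/api/v1/quality/{category}/{owner}/{repo}` are inferred from the curl example above, and the JSON response shape is not documented here, so the fetch is shown but commented out.

```python
from urllib.parse import quote

# Base path inferred from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repo, URL-escaping each path segment."""
    return f"{BASE}/{quote(category, safe='')}/{quote(owner, safe='')}/{quote(repo, safe='')}"

url = quality_url("voice-ai", "OpenMOSS", "MOSS-TTSD")
print(url)
# → https://pt-edge.onrender.com/api/v1/quality/voice-ai/OpenMOSS/MOSS-TTSD

# To actually fetch the data (response assumed to be JSON -- an assumption,
# check the API docs), uncomment:
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```

Escaping each segment keeps the helper safe for repo names containing characters that are special in URLs.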
Related tools
travisvn/chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment...
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
sfortis/openai_tts
Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible...
OpenMOSS/MOSS-TTS
MOSS-TTS Family is an open-source speech and sound generation model family from MOSI.AI and the...