MOSS-TTSD and MOSS-Speech

MOSS-TTSD covers the synthesis side (turning dialogue scripts into expressive multi-speaker speech), while MOSS-Speech covers interactive conversation (mapping spoken input directly to spoken output without intermediate text), making them complementary components of a voice conversation pipeline.

                 MOSS-TTSD           MOSS-Speech
Overall score    57 (Established)    44 (Emerging)
Maintenance      13/25               10/25
Adoption         10/25               10/25
Maturity         15/25               15/25
Community        19/25               9/25
Stars            1,202               127
Forks            116                 7
Downloads        -                   -
Commits (30d)    3                   0
Language         Python              Python
License          Apache-2.0          Apache-2.0
Package          None                None
Dependents       None                None
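The overall scores appear to be the simple sum of the four 25-point subscores. A quick check (assuming this additive model, which the page itself does not document):

```python
# Subscores as shown in the scorecard (each out of 25).
subscores = {
    "MOSS-TTSD":   {"maintenance": 13, "adoption": 10, "maturity": 15, "community": 19},
    "MOSS-Speech": {"maintenance": 10, "adoption": 10, "maturity": 15, "community": 9},
}

# Assumed model: overall score = sum of the four components.
overall = {name: sum(parts.values()) for name, parts in subscores.items()}

print(overall)  # {'MOSS-TTSD': 57, 'MOSS-Speech': 44}
```

Both sums match the displayed overall scores (57 and 44), which supports the additive reading.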

About MOSS-TTSD

OpenMOSS/MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enabling zero-shot voice cloning from short audio references.

This project helps content creators transform dialogue scripts into dynamic, expressive spoken conversations with multiple distinct speakers. You provide a script and a short audio reference for each speaker, and it generates natural-sounding, long-form spoken dialogue of up to 60 minutes. It's ideal for producers of podcasts, audiobooks, commentary, and dubbed content.
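The script-plus-references workflow described above amounts to a small data-preparation step. The structure below is purely illustrative (all field names are invented for this sketch; the actual input format is defined in the OpenMOSS/MOSS-TTSD repository): it just shows the two ingredients the text mentions, a speaker-labeled script and a short reference clip per speaker for zero-shot voice cloning.

```python
# Hypothetical job description for a two-speaker dialogue.
# Field names ("speakers", "reference_audio", "script") are illustrative only.
dialogue_job = {
    "speakers": {
        "S1": {"reference_audio": "host_ref.wav"},   # short clip of speaker 1
        "S2": {"reference_audio": "guest_ref.wav"},  # short clip of speaker 2
    },
    "script": [
        {"speaker": "S1", "text": "Welcome back to the show!"},
        {"speaker": "S2", "text": "Thanks, great to be here."},
    ],
}

# Sanity check: every turn in the script references a known speaker.
known = set(dialogue_job["speakers"])
assert all(turn["speaker"] in known for turn in dialogue_job["script"])
print(f"{len(dialogue_job['script'])} turns, {len(known)} speakers")
```

Keeping reference audio separate from the script is what makes voice cloning zero-shot: new speakers need only a new clip, not retraining.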

audiobook production, podcast creation, media dubbing, voice content, conversational AI

About MOSS-Speech

OpenMOSS/MOSS-Speech

MOSS-Speech is a true speech-to-speech large language model without text guidance.

This project helps create direct, natural voice-to-voice interactions for spoken applications. You provide spoken input, and it responds directly with spoken output, without ever converting to text in between. It's designed for anyone building interactive voice assistants, dialogue systems, or real-time spoken translation tools.
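The contrast with a conventional voice assistant is architectural: a cascaded system chains speech recognition, a text LLM, and speech synthesis, while MOSS-Speech collapses this into a single speech-in, speech-out model. A schematic sketch of the two pipeline shapes (stub functions that only tag the data flow; nothing here reflects MOSS-Speech's actual API):

```python
# Audio is represented as a plain string label purely for illustration.

def cascaded_pipeline(audio_in: str) -> str:
    """Conventional three-stage pipeline: speech -> text -> text -> speech."""
    text_in = f"asr({audio_in})"    # 1. speech recognition produces text
    text_out = f"llm({text_in})"    # 2. text-based LLM generates a reply
    return f"tts({text_out})"       # 3. speech synthesis voices the reply

def speech_to_speech(audio_in: str) -> str:
    """Single end-to-end model: spoken input maps directly to spoken output."""
    return f"s2s_lm({audio_in})"    # no intermediate text representation

print(cascaded_pipeline("hello.wav"))  # tts(llm(asr(hello.wav)))
print(speech_to_speech("hello.wav"))   # s2s_lm(hello.wav)
```

Skipping the text stages removes two conversion steps, which is why the direct approach suits real-time interaction and can preserve paralinguistic cues (tone, emphasis) that a text transcript discards.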

voice-assistants spoken-dialogue-systems real-time-voice-interaction speech-technology conversational-AI

Scores are updated daily from GitHub, PyPI, and npm data.