OpenMOSS/MOSS-Audio-Tokenizer

MOSS-Audio-Tokenizer is a causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, and it reports state-of-the-art reconstruction quality along with strong performance on generation and understanding tasks. It is intended as a unified interface for next-generation native audio language models.

Score: 41 / 100 (Emerging)

This project helps machine learning engineers who work with audio to understand, process, and generate sound more efficiently. It takes raw audio (speech, music, sound effects) and converts it into a highly compressed, semantically rich sequence of discrete tokens. Those tokens can then be used to reconstruct high-fidelity audio or to drive audio understanding and generation tasks, serving as a common interface for audio-based AI models.

Use this if you are developing advanced audio AI models and need a powerful, unified way to encode and decode diverse audio types with extreme compression and high fidelity.

Not ideal if you are looking for a simple, off-the-shelf application for basic audio recording or playback without complex AI model integration.

audio-processing speech-recognition-synthesis sound-generation machine-learning-engineering audio-ai-development
No Package | No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 11 / 25
Community 10 / 25
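
The four category scores above are consistent with the overall 41 / 100 figure if the total is a plain sum of the categories. The page does not state the formula, so the sketch below is only an assumption that happens to match the numbers shown; the variable names are illustrative.

```python
# Category scores as listed on this page (each out of 25).
category_scores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 11,
    "Community": 10,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the simple sum of the four categories.
total = sum(category_scores.values())
max_total = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {max_total}")  # prints "41 / 100"
```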

Stars: 162
Forks: 11
Language: Python
License: Apache-2.0
Last pushed: Mar 06, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/OpenMOSS/MOSS-Audio-Tokenizer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
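
For scripted use, the same request can be made from Python with the standard library. This is a minimal sketch, not documented client code: the `category/owner/repo` path layout is inferred from the single URL shown above, the JSON response format is an assumption, and `build_quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout inferred from the one example)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record without an API key (the 100 requests/day tier).

    Assumption: the endpoint returns a JSON body; the page does not
    describe the response schema.
    """
    request = Request(
        build_quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "OpenMOSS", "MOSS-Audio-Tokenizer")
```

Since the keyless tier is limited to 100 requests/day, caching the result locally is a sensible choice for anything that runs repeatedly.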