OpenMOSS/MOSS-Audio-Tokenizer
MOSS-Audio-Tokenizer is a causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3 million hours of diverse audio, it supports streaming and variable bitrates, delivers state-of-the-art (SOTA) reconstruction, and performs strongly in generation and understanding tasks, serving as a unified interface for next-generation native audio language models.
This project helps machine learning engineers who work with audio to understand, process, and generate sound more efficiently. It takes raw audio of any kind (speech, music, sound effects) and converts it into a highly compressed, semantically rich sequence of discrete tokens. These tokens can then be used to reconstruct high-fidelity audio or to power audio understanding and generation tasks, acting as a universal interface for audio-based AI models.
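To make "variable bitrates" concrete, here is a minimal sketch of how the bitrate of a token stream is usually computed. It assumes a tokenizer that emits a fixed number of tokens per frame, each drawn from a fixed-size codebook, and that the bitrate is varied by keeping fewer tokens per frame. The description above does not confirm that MOSS-Audio-Tokenizer works this way, and the frame rate, codebook size, and layer counts below are hypothetical values chosen only for illustration.

```python
import math


def bitrate_bps(frame_rate_hz: float, tokens_per_frame: int, codebook_size: int) -> float:
    """Bits per second of a discrete token stream.

    Each token carries log2(codebook_size) bits, and the stream emits
    tokens_per_frame tokens for every one of frame_rate_hz frames per second.
    """
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * tokens_per_frame * bits_per_token


# Hypothetical configuration (NOT taken from the project description).
FRAME_RATE_HZ = 12.5
CODEBOOK_SIZE = 1024

# Keeping fewer tokens per frame lowers the bitrate proportionally.
for tokens_per_frame in (1, 8, 32):
    bps = bitrate_bps(FRAME_RATE_HZ, tokens_per_frame, CODEBOOK_SIZE)
    print(f"{tokens_per_frame:>2} tokens/frame: {bps / 1000:.3f} kbps")
```

Under these assumptions the bitrate scales linearly with the number of tokens kept per frame, which is why a single model can serve several bitrates.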
Use this if you are developing advanced audio AI models and need a powerful, unified way to encode and decode diverse audio types with extreme compression and high fidelity.
Not ideal if you are looking for a simple, off-the-shelf application for basic audio recording or playback without complex AI model integration.
Stars: 162
Forks: 11
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 06, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/OpenMOSS/MOSS-Audio-Tokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
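The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the path layout (category, then owner, then repository) is inferred from the single URL shown above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. No API key is sent, so the 100 requests/day limit applies.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the path layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns a JSON body; raises urllib.error.HTTPError
    on a non-2xx response (for example, when the daily limit is exceeded).
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "OpenMOSS", "MOSS-Audio-Tokenizer")
# print(json.dumps(data, indent=2))
```

Because the response schema is not documented here, the sketch returns the parsed body as-is rather than reading specific fields.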
Higher-rated alternatives
Spr-Aachen/Easy-Voice-Toolkit
A user-friendly audio toolkit for voice recognition, voice transcription, voice conversion etc.
PrzemyslawSwiderski/python-gradle-plugin
Gradle plugin to run Python projects.
alphacep/awesome-russian-speech
Russian speech technology links
ftyers/commonvoice-utils
Linguistic processing for Common Voice
microsoft/UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech