OpenMOSS/MOSS-Audio-Tokenizer

MOSS-Audio-Tokenizer is a causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivers state-of-the-art reconstruction, and performs strongly in generation and understanding, serving as a unified interface for next-generation native audio language models.
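To make "variable bitrates" concrete, here is a minimal sketch of the arithmetic that applies to any tokenizer emitting a fixed number of discrete codes per frame. It assumes a multi-codebook design in which bitrate is varied by changing how many codebooks are used per frame; that mechanism, the function name `bitrate_bps`, and the frame rate and codebook size below are illustrative assumptions, not values taken from this project.

```python
import math


def bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second of a token stream with `num_codebooks` codes per frame."""
    bits_per_code = math.log2(codebook_size)  # bits needed to index one codebook entry
    return frame_rate_hz * num_codebooks * bits_per_code


# Hypothetical settings chosen for illustration only (not from the project).
FRAME_RATE_HZ = 12.5
CODEBOOK_SIZE = 1024

for n in (1, 8, 32):
    # Using more codebooks per frame raises the bitrate proportionally.
    print(f"{n:2d} codebooks -> {bitrate_bps(FRAME_RATE_HZ, n, CODEBOOK_SIZE):.0f} bps")
```

Under these assumed numbers, each additional codebook adds 125 bits per second, so a decoder that accepts any prefix of the codebooks can trade fidelity for bandwidth.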

/ 100

Emerging

This project helps machine learning engineers who work with audio to understand, process, and generate sound more efficiently. It takes raw audio of any kind (speech, music, sound effects) and converts it into a highly compressed, semantically rich sequence of discrete codes. These codes can then be decoded to reconstruct high-fidelity audio, or used as input for audio understanding and generation tasks, acting as a universal interface for audio-based AI models.
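The encode-to-codes, decode-to-audio round trip described above can be illustrated with a toy stand-in. The sketch below uses classic 8-bit mu-law companding, not the project's learned tokenizer, and the function names `encode` and `decode` are hypothetical rather than the project's API. It only shows the shape of the interface: floating-point samples in, small integer codes out, and an approximate waveform back.

```python
import math

MU = 255  # 8-bit mu-law: each sample becomes one code in the range 0..255


def encode(samples):
    """Map samples in [-1, 1] to integer codes via mu-law companding."""
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid input range
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        codes.append(int(round((y + 1) / 2 * MU)))  # rescale [-1, 1] to 0..255
    return codes


def decode(codes):
    """Invert the companding to recover an approximation of the samples."""
    samples = []
    for c in codes:
        y = 2 * c / MU - 1  # rescale 0..255 back to [-1, 1]
        samples.append(math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y))
    return samples


# Round-trip a short 440 Hz sine wave sampled at 16 kHz.
audio = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
codes = encode(audio)
restored = decode(codes)
max_error = max(abs(a - b) for a, b in zip(audio, restored))
print(f"{len(codes)} codes, max reconstruction error {max_error:.4f}")
```

A learned tokenizer replaces the fixed companding curve with trained encoder and decoder networks and emits far fewer codes per second, but downstream models consume the resulting integer sequence in the same way.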

162 stars.

Use this if you are developing advanced audio AI models and need a powerful, unified way to encode and decode diverse audio types with extreme compression and high fidelity.

Not ideal if you are looking for a simple, off-the-shelf application for basic audio recording or playback without complex AI model integration.

audio-processing speech-recognition-synthesis sound-generation machine-learning-engineering audio-ai-development

No Package, No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 11 / 25

Community 10 / 25


Stars

162

Forks

Language

Python

License

Apache-2.0

Higher-rated alternatives

Spr-Aachen/Easy-Voice-Toolkit

A user-friendly audio toolkit for voice recognition, voice transcription, voice conversion etc.

PrzemyslawSwiderski/python-gradle-plugin

Gradle plugin to run Python projects.

alphacep/awesome-russian-speech

Russian speech technology links

ftyers/commonvoice-utils

Linguistic processing for Common Voice

microsoft/UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech
