jishengpeng/WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

/ 100

Emerging

This tool helps researchers and AI developers who work with audio convert raw speech, music, or other sounds into a highly compressed sequence of discrete tokens. The tokens are a compact representation of the audio that is easier and faster to process in AI systems such as audio language models. It takes audio files as input and outputs discrete audio tokens, and it can also reconstruct audio from previously generated tokens.
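The round trip from audio to tokens and back can be illustrated with a toy scalar quantizer. This is a conceptual sketch in plain Python, not WavTokenizer's model or API: the function names `encode` and `decode`, the frame size, and the four-entry codebook are all hypothetical, and a real neural codec learns its codebook rather than using fixed values.

```python
# Toy illustration of "audio -> discrete tokens -> audio".
# NOT WavTokenizer's API: encode/decode, FRAME_SIZE and CODEBOOK are hypothetical.

FRAME_SIZE = 4                          # samples summarized by one token
CODEBOOK = [-0.75, -0.25, 0.25, 0.75]   # fixed values; a real codec learns these


def encode(samples):
    """Map each frame of samples to the index of the nearest codebook entry."""
    tokens = []
    for start in range(0, len(samples), FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        mean = sum(frame) / len(frame)
        # Pick the codebook index whose value is closest to the frame mean.
        token = min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - mean))
        tokens.append(token)
    return tokens


def decode(tokens):
    """Rebuild an approximate waveform by repeating each codebook value."""
    samples = []
    for token in tokens:
        samples.extend([CODEBOOK[token]] * FRAME_SIZE)
    return samples


waveform = [0.7, 0.8, 0.8, 0.7, -0.2, -0.3, -0.3, -0.2]
tokens = encode(waveform)          # two integers stand in for eight samples
reconstructed = decode(tokens)     # same length as the input, but approximate
```

The reconstruction is lossy: each frame collapses to a single codebook value. A codec of the kind described here aims to keep that loss perceptually small while emitting few tokens per second.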

1,279 stars. No commits in the last 6 months.

Use this if you need to represent audio with very few tokens per second (40 or 75 tokens/second) while maintaining high sound quality, for applications such as audio language modeling or generative AI for sound.
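To make the token rates concrete, the short calculation below estimates how many tokens a clip needs at each rate. The 40 and 75 tokens/second figures come from the project description. The codebook size of 4096 used for the bitrate estimate is an illustrative assumption, not a value stated on this page, and the helper names are hypothetical.

```python
import math

# Token rates stated in the project description.
TOKEN_RATES = (40, 75)

# ASSUMPTION: illustrative codebook size, not taken from this page.
ASSUMED_CODEBOOK_SIZE = 4096


def token_count(duration_seconds, tokens_per_second):
    """Number of tokens needed to represent a clip of the given length."""
    return math.ceil(duration_seconds * tokens_per_second)


def bits_per_second(tokens_per_second, codebook_size):
    """Bitrate if each token indexes one entry of a single codebook."""
    return tokens_per_second * math.log2(codebook_size)


for rate in TOKEN_RATES:
    tokens = token_count(10, rate)  # a 10-second clip
    bitrate = bits_per_second(rate, ASSUMED_CODEBOOK_SIZE)
    print(f"{rate} tokens/s: {tokens} tokens for 10 s, {bitrate:.0f} bits/s")
```

Under these assumptions a 10-second clip becomes 400 or 750 tokens, which is the sequence length an audio language model would have to handle.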

Not ideal if your primary goal is basic audio transcription or simple signal processing tasks that don't require advanced discrete representation for AI models.

audio-processing speech-recognition AI-audio-modeling generative-audio sound-engineering

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars 1,279

Forks 111

Language Python

License MIT

Higher-rated alternatives

kan-bayashi/ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

fatchord/WaveRNN

WaveRNN Vocoder + TTS

shangeth/wavencoder

WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation,...

rishikksh20/iSTFTNet-pytorch

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier...

seungwonpark/melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)
