Kyubyong/specAugment

Tensor2tensor experiment with SpecAugment


Emerging

This project helps speech recognition engineers improve automatic speech recognition (ASR) models through data augmentation. It takes raw speech audio and masks random frequency bands and time spans in the spectrograms. Training on the augmented data tends to produce a more robust ASR model that performs better in real-world conditions.
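The frequency and time masking described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the repository's code: the function name `mask_spectrogram`, its parameters, and the choice to fill masked regions with zeros are all assumptions made for the example, and the spectrogram is represented as a plain list of rows (frequency bins) by columns (time frames).

```python
import random


def mask_spectrogram(spec, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of `spec` with one frequency mask and one time mask.

    `spec` is a list of rows, one row per frequency bin, each row holding
    one value per time frame. All names here are hypothetical.
    """
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # copy so the input stays untouched
    n_freq = len(out)
    n_time = len(out[0]) if out else 0

    # Frequency mask: zero out f consecutive bins, f drawn from [0, max width].
    f = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time mask: zero out t consecutive frames, t drawn from [0, max width].
    t = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        row[t0:t0 + t] = [0.0] * t

    return out


# Usage: an 80-bin by 100-frame dummy spectrogram filled with ones.
spec = [[1.0] * 100 for _ in range(80)]
augmented = mask_spectrogram(spec, rng=random.Random(0))
```

Drawing new mask positions each time an utterance is used means the model rarely sees the same masked input twice, which is where the added training diversity comes from. In practice the same idea is usually applied to batched array or tensor spectrograms inside the training pipeline.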

No commits in the last 6 months.

Use this if you are developing automatic speech recognition systems and want to improve your model's accuracy and generalization by making your training data more diverse.

Not ideal if you work with non-speech data, or if improving speech recognition performance is not your main goal.

speech-recognition audio-processing machine-learning-engineering model-training

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 14 / 25


Stars

Forks

Language

Python

License

Apache-2.0

Higher-rated alternatives

bshall/Tacotron

A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

Kyubyong/dc_tts

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

DemisEom/SpecAugment

An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain

Rayhane-mamah/Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation

Kyubyong/tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
