DemisEom/SpecAugment

An implementation of SpecAugment, introduced by Google Brain, in TensorFlow and PyTorch

Established

SpecAugment augments training data for speech recognition models by modifying spectrograms rather than raw audio. It takes the spectrogram of an existing audio file and alters it in three ways: warping the time axis, masking blocks of frequency channels, and masking segments of time. Training on these altered copies gives the model a wider variety of examples and helps make it more robust to variations in speech.
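The two masking steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in DemisEom/SpecAugment: the function names `mask_frequency` and `mask_time`, the `max_width` parameter, and the layout (rows are frequency bins, columns are time steps) are hypothetical choices made for this example, and time warping is left out because it needs an interpolation routine.

```python
import random


def mask_frequency(spec, max_width, rng):
    """Return a copy of spec with one random band of frequency rows zeroed."""
    n_freq = len(spec)
    # randint is inclusive on both ends; clamp so the band fits in the array.
    width = min(rng.randint(0, max_width), n_freq)
    start = rng.randint(0, n_freq - width)
    return [
        [0.0] * len(row) if start <= i < start + width else list(row)
        for i, row in enumerate(spec)
    ]


def mask_time(spec, max_width, rng):
    """Return a copy of spec with one random span of time columns zeroed."""
    n_time = len(spec[0])
    width = min(rng.randint(0, max_width), n_time)
    start = rng.randint(0, n_time - width)
    return [
        [0.0 if start <= t < start + width else v for t, v in enumerate(row)]
        for row in spec
    ]


rng = random.Random(0)  # seeded so the example is repeatable

# Stand-in spectrogram (assumption): 8 frequency bins x 20 time steps.
spec = [[1.0] * 20 for _ in range(8)]

# Apply a frequency mask, then a time mask, to a copy of the input.
augmented = mask_time(mask_frequency(spec, max_width=3, rng=rng), max_width=5, rng=rng)
```

Both functions return new lists and leave the input untouched, so the same source spectrogram can be augmented differently on every training pass. A real pipeline would more likely operate on NumPy arrays or framework tensors; lists are used here only to keep the sketch dependency-free.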

656 stars. No commits in the last 6 months.

Use this if you are developing machine learning models for speech recognition and need to augment your audio training data to improve model performance and generalization.

Not ideal if you are looking for a general-purpose audio editing tool or a way to analyze raw audio files directly without processing them into spectrograms.

speech-recognition audio-processing machine-learning-training data-augmentation AI-development

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25

Stars 656

Forks 135

Language Python

License Apache-2.0

Related tools

bshall/Tacotron

A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

Kyubyong/dc_tts

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

Rayhane-mamah/Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation

Kyubyong/tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

vlomme/Multi-Tacotron-Voice-Cloning

Phoneme multilingual(Russian-English) voice cloning based on
