DemisEom/SpecAugment
An implementation of SpecAugment, the augmentation method introduced by Google Brain, in TensorFlow and PyTorch
When training speech recognition models, the SpecAugment tool modifies audio spectrograms to create a wider variety of training examples. It takes an existing spectrogram of an audio file and alters it by warping the time axis, masking frequency blocks, and masking time segments. This helps speech AI developers make their models more robust to variations in speech.
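The two masking steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not this repository's API: the function name `spec_augment_masks` and the parameters `freq_mask_width` and `time_mask_width` are hypothetical, the spectrogram is assumed to be a 2-D array shaped (frequency bins, time frames), and time warping is left out for brevity.

```python
import numpy as np

def spec_augment_masks(spec, freq_mask_width=8, time_mask_width=20, rng=None):
    """Apply one frequency mask and one time mask to a (freq, time) spectrogram.

    Hypothetical helper for illustration; time warping is not implemented here.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's array untouched
    n_freq, n_time = out.shape

    # Frequency mask: zero out f consecutive bins, with f drawn from [0, freq_mask_width].
    f = int(rng.integers(0, min(freq_mask_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: zero out t consecutive frames, with t drawn from [0, time_mask_width].
    t = int(rng.integers(0, min(time_mask_width, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out

if __name__ == "__main__":
    # Dummy spectrogram: 80 mel bins by 100 frames.
    augmented = spec_augment_masks(np.ones((80, 100)), rng=np.random.default_rng(0))
    print(augmented.shape)
```

Because the mask widths and positions are redrawn on every call, the same utterance yields a different masked spectrogram each time it is fed to the model, which is where the added training variety comes from.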
656 stars. No commits in the last 6 months.
Use this if you are developing machine learning models for speech recognition and need to augment your audio training data to improve model performance and generalization.
Not ideal if you are looking for a general-purpose audio editing tool or a way to analyze raw audio files directly without processing them into spectrograms.
Stars: 656
Forks: 135
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 05, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/DemisEom/SpecAugment"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
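The same request can be made from Python with only the standard library. This is a rough sketch under two assumptions that the page does not confirm: that the URL path follows a category/owner/repo pattern (inferred from the single curl example above), and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    # Assumption: path layout is <category>/<owner>/<repo>, as in the curl example.
    return f"{BASE_URL}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, timeout=10):
    # Assumption: the response body is JSON; its schema is not documented here.
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    print(fetch_quality("voice-ai", "DemisEom", "SpecAugment"))
```

With the keyless limit of 100 requests/day, it may be worth caching the result locally rather than calling the endpoint on every run.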
Related tools
bshall/Tacotron
A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Kyubyong/dc_tts
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
Rayhane-mamah/Tacotron-2
DeepMind's Tacotron-2 Tensorflow implementation
Kyubyong/tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
vlomme/Multi-Tacotron-Voice-Cloning
Phoneme multilingual (Russian-English) voice cloning based on