viig99/esolafast
A fast C++ implementation of ESOLA built on KFRLib. It can be used for online time-stretch augmentation during speech-to-text training.
This tool helps improve automatic speech recognition (ASR) systems by efficiently modifying audio data. It takes raw audio files as input and outputs time-stretched versions, making your ASR models more robust to variations in speech speed. It's designed for machine learning engineers and researchers who are training Speech-to-Text models.
No commits in the last 6 months.
Use this if you are an ML engineer training a Speech-to-Text model and need to quickly augment your audio datasets with time-stretched speech without sacrificing audio quality.
Not ideal if you are looking for a general-purpose audio editor or a tool for music production, as its primary focus is on speech augmentation for machine learning.
Stars: 16
Forks: 2
Language: C++
License: MIT
Category:
Last pushed: Jul 25, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/viig99/esolafast"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
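For scripted use, the same request can be made from Python. The following is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repo is inferred only from the curl example above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is
# assumed to be <category>/<owner>/<repo>.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{API_BASE}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(build_quality_url("voice-ai", "viig99", "esolafast"))
```

Keeping URL construction separate from the network call makes it easy to check the request shape without spending any of the daily request quota.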
Higher-rated alternatives
TensorSpeech/TensorFlowASR
TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2....
dangvansam/viet-asr
VietASR - Vietnamese Automatic Speech Recognition
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
xinjli/allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
srvk/eesen
The official repository of the Eesen project