revsic/speechset
Numpy-librosa implementation of Speech dataset pipeline
This project helps speech researchers and machine learning practitioners prepare audio data for training speech recognition or synthesis models. It takes raw audio files and their corresponding text transcripts as input, then processes them into a structured dataset ready for model training. This is ideal for anyone working with spoken language data to build AI speech applications.
No commits in the last 6 months.
Use this if you need to standardize and process raw audio and text data into a consistent format for your speech-related machine learning projects.
Not ideal if you're looking for a tool to perform speech recognition or synthesis directly, as this focuses solely on dataset preparation.
Stars: 9
Forks: 6
Language: Python
License: MIT
Category
Last pushed: Jan 18, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/revsic/speechset"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
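The curl request above can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (category, then owner, then repository) is inferred from the single example URL, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [urllib.parse.quote(s, safe="") for s in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(segments)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Request the record and decode it, assuming the body is JSON."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "revsic", "speechset")
```

Since the keyless tier is limited to 100 requests per day, it may be worth caching the decoded result locally rather than calling `fetch_quality` repeatedly.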
Higher-rated alternatives
hetpandya/youtube_tts_data_generator
A python library to generate speech dataset from Youtube videos
IS2AI/Kazakh_TTS
An expanded version of the previously released Kazakh text-to-speech (KazakhTTS) synthesis...
taresh18/TTSizer
Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets
Hecate2/sukasuka-vocal-dataset-builder
1st anime vocal dataset. Extract audio (vocal) files from video based on .ass...
youmebangbang/TTS-dataset-tools
Automatically generates TTS dataset using audio and associated text. Make cuts under a custom...