EgorLakomkin/KTSpeechCrawler
Automatically constructing a corpus for automatic speech recognition from YouTube videos
This tool helps researchers and engineers quickly build large datasets of audio and transcribed speech for training automatic speech recognition (ASR) systems. It takes a list of YouTube video URLs as input and extracts the audio, transcribes it using existing ASR models, and outputs a collection of audio clips paired with their corresponding text. This is ideal for those working on improving speech recognition models for various accents, languages, or specialized domains.
157 stars. No commits in the last 6 months.
Use this if you need to create a custom, large-scale dataset of spoken audio and its text transcription from YouTube videos to train or fine-tune speech recognition models.
Not ideal if you need human-curated, validated transcripts, or if your source audio is not available on YouTube.
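To make the "audio clips paired with their corresponding text" output concrete, here is a minimal sketch of how such pairs could be collected into a manifest. It is not KTSpeechCrawler's own code: the `Clip` record, the `build_manifest` and `to_jsonl` helpers, the file-naming scheme, and the duration thresholds are all hypothetical choices made for illustration, and the sketch only describes clips rather than downloading or cutting audio.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Clip:
    """One training example: an audio file name plus its transcript (hypothetical schema)."""
    audio_path: str
    start_sec: float
    end_sec: float
    text: str


def build_manifest(video_id, segments, min_sec=1.0, max_sec=15.0):
    """Turn (start, end, text) segments into Clip records.

    The duration bounds are illustrative assumptions, not values taken
    from the tool itself.
    """
    clips = []
    for start, end, text in segments:
        duration = end - start
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        # Skip empty transcripts and clips outside the duration bounds.
        if not cleaned or not (min_sec <= duration <= max_sec):
            continue
        # Hypothetical naming scheme: video id plus start/end in milliseconds.
        name = f"{video_id}_{int(start * 1000):08d}_{int(end * 1000):08d}.wav"
        clips.append(Clip(name, start, end, cleaned))
    return clips


def to_jsonl(clips):
    """Serialize clips as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(c)) for c in clips)
```

A JSON Lines manifest like this is a common interchange format for ASR training pipelines, which is why it is used here; any other tabular format would serve equally well.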
Stars: 157
Forks: 38
Language: Python
License: MIT
Category:
Last pushed: Feb 15, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/EgorLakomkin/KTSpeechCrawler"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
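The same request can be made from Python with only the standard library. This is a sketch under assumptions: the URL layout (`quality/<category>/<owner>/<repo>`) is inferred from the single curl example above, the response is assumed to be JSON, and the helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment.

    The three-segment layout is an assumption inferred from the curl example.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns a JSON body; the response schema is not
    documented here, so the parsed object is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "EgorLakomkin", "KTSpeechCrawler")
```

With the keyless limit of 100 requests/day, it is worth caching responses locally rather than refetching the same repository repeatedly.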
Higher-rated alternatives
ynop/audiomate
Python library for handling audio datasets.
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...
coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies