EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos

/ 100

Emerging

This tool helps researchers and engineers build large datasets of audio paired with transcribed speech for training automatic speech recognition (ASR) systems. It takes a list of YouTube video URLs as input, extracts the audio, pairs it with text taken from the videos' closed captions, and outputs a collection of audio clips with their corresponding text. It suits work on speech recognition models for particular accents, languages, or specialized domains.
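The last step of such a pipeline, turning timed transcript segments into clip/text pairs, can be sketched as follows. This is a minimal illustration and not the repository's actual code: the `Segment` type, the `select_segments` and `to_manifest` helpers, the duration thresholds, and the clip naming scheme are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Segment:
    """A timed piece of transcript (hypothetical type for this sketch)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def select_segments(
    segments: Iterable[Segment],
    min_dur: float = 1.0,   # assumed lower bound, in seconds
    max_dur: float = 10.0,  # assumed upper bound, in seconds
) -> List[Segment]:
    """Keep segments with non-empty text and a duration inside the bounds."""
    kept = []
    for seg in segments:
        text = " ".join(seg.text.split())  # collapse runs of whitespace
        duration = seg.end - seg.start
        if not text:
            continue
        if not (min_dur <= duration <= max_dur):
            continue
        kept.append(Segment(seg.start, seg.end, text))
    return kept


def to_manifest(video_id: str, segments: Iterable[Segment]) -> List[Dict]:
    """Describe one audio clip per segment; cutting the audio is out of scope."""
    return [
        {
            "audio": f"{video_id}_{i:04d}.wav",  # hypothetical naming scheme
            "start": seg.start,
            "end": seg.end,
            "text": seg.text,
        }
        for i, seg in enumerate(segments)
    ]


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "hello   world"),
        Segment(2.5, 2.8, "uh"),         # too short, dropped
        Segment(3.0, 20.0, "too long"),  # too long, dropped
    ]
    for row in to_manifest("vid", select_segments(demo)):
        print(row)
```

Filtering by duration and empty text is only one plausible set of checks; a real crawler would likely add more, and the resulting manifest would then drive the actual audio cutting.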

157 stars. No commits in the last 6 months.

Use this if you need to create a custom, large-scale dataset of spoken audio and its text transcription from YouTube videos to train or fine-tune speech recognition models.

Not ideal if you need perfectly human-curated and validated transcripts without any errors, or if your source audio is not available on YouTube.

speech-recognition ASR-dataset-creation audio-transcription machine-learning-engineering AI-research

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 21 / 25

Stars

157

Forks

Language

Python

License

MIT

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
