EgorLakomkin/KTSpeechCrawler

Automatically constructing a corpus for automatic speech recognition from YouTube videos

47 / 100 (Emerging)

This tool helps researchers and engineers quickly build large datasets of audio and transcribed speech for training automatic speech recognition (ASR) systems. It takes a list of YouTube video URLs as input and extracts the audio, transcribes it using existing ASR models, and outputs a collection of audio clips paired with their corresponding text. This is ideal for those working on improving speech recognition models for various accents, languages, or specialized domains.
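The output described above, audio clips paired with their text, can be pictured as a simple manifest. The sketch below is only an illustration under stated assumptions: the `Clip` fields, the JSON-lines layout, the file paths, and the duration thresholds are all hypothetical and are not taken from KTSpeechCrawler itself.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Clip:
    audio_path: str  # hypothetical path to an extracted audio segment
    text: str        # transcript paired with that segment
    start_s: float   # segment start within the source video, in seconds
    end_s: float     # segment end within the source video, in seconds


def to_manifest_lines(clips, min_s=1.0, max_s=10.0):
    """Return one JSON line per clip that has text and a duration in [min_s, max_s].

    The thresholds are illustrative defaults, not values used by the tool.
    """
    lines = []
    for clip in clips:
        duration = clip.end_s - clip.start_s
        if clip.text.strip() and min_s <= duration <= max_s:
            lines.append(json.dumps(asdict(clip)))
    return lines


clips = [
    Clip("clips/0001.wav", "hello world", 0.0, 2.5),
    Clip("clips/0002.wav", "", 2.5, 4.0),            # empty transcript: dropped
    Clip("clips/0003.wav", "too long", 4.0, 30.0),   # 26 s long: dropped
]

for line in to_manifest_lines(clips):
    print(line)
```

A line-per-clip manifest like this is a common way to feed ASR training code, since each record can be read and filtered independently.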

157 stars. No commits in the last 6 months.

Use this if you need to create a custom, large-scale dataset of spoken audio and its text transcription from YouTube videos to train or fine-tune speech recognition models.

Not ideal if you need human-curated, validated transcripts that are free of errors, or if your source audio is not available on YouTube.

speech-recognition ASR-dataset-creation audio-transcription machine-learning-engineering AI-research
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25
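
The four sub-scores listed here add up to the overall score shown on this page. The short check below only verifies that arithmetic. Treating the overall score as a plain sum of the four components is an observation from these numbers, not a documented formula.

```python
# Sub-scores as listed on this page, each out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 21,
}

total = sum(subscores.values())
maximum = 25 * len(subscores)

print(f"{total} / {maximum}")  # prints "47 / 100"
```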


Stars 157
Forks 38
Language Python
License MIT
Last pushed Feb 15, 2020
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/EgorLakomkin/KTSpeechCrawler"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
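
The same request can be made from Python with only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are made up for illustration, the `category/owner/repo` path pattern is inferred from the single URL above, and the assumption that the response body is JSON is unverified.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path pattern is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("voice-ai", "EgorLakomkin", "KTSpeechCrawler"))
```

Keeping URL construction separate from the network call makes it easy to stay within the daily request limit while testing.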