khuangaf/ITRI-speech-recognition-dataset-generation
Automatic Speech Recognition Dataset Generation
This tool helps researchers, linguists, and educators build custom speech recognition datasets for Mandarin, especially Mandarin speech mixed with Taiwanese or English. Starting from YouTube videos, it extracts the audio and subtitles and processes them into a dataset suitable for training speech recognition models. It is aimed at users who need specialized Mandarin speech data that isn't readily available.
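The core idea of pairing subtitles with audio is that each timed subtitle cue gives a (start, end, transcript) triple that can be used to cut a training clip. The repository's own notebooks are not reproduced here. The sketch below is a minimal illustration that assumes the subtitles are in SRT format. `parse_srt` and `Segment` are hypothetical names, not part of the project.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Segment:
    start: float  # clip start, in seconds
    end: float    # clip end, in seconds
    text: str     # subtitle transcript for this span


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Segment]:
    """Hypothetical helper: turn SRT cues into (start, end, text) segments."""
    segments = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue's transcript.
                text = " ".join(lines[i + 1:])
                if text:  # skip cues that carry no transcript
                    segments.append(
                        Segment(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text)
                    )
                break
    return segments
```

Each segment's `start` and `end` could then be handed to an audio cutter (for example, ffmpeg's `-ss` and `-to` options) to produce one clip per transcript line. Multi-line cues are joined with a space here. For Mandarin text you may prefer to join them without a separator.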
No commits in the last 6 months.
Use this if you need to build a specialized Mandarin speech recognition dataset from YouTube video content, particularly if it includes Taiwanese or English speech.
Not ideal if you need an English-only speech recognition dataset, or if you prefer to manually annotate data.
Stars: 37
Forks: 20
Language: Jupyter Notebook
License: none listed
Category: none listed
Last pushed: Aug 26, 2018
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/khuangaf/ITRI-speech-recognition-dataset-generation"
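The same request can be made from Python using only the standard library. The following is a minimal sketch. `quality_url` and `fetch_quality` are hypothetical helper names. The response is assumed to be JSON, but its fields are not documented on this page, so the sketch returns the decoded value without interpreting it.

```python
import json
import urllib.request
from typing import Any

# Base path taken from the curl command; the owner/repo suffix varies per project.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch and decode the endpoint's response, assumed to be JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("khuangaf", "ITRI-speech-recognition-dataset-generation")` requests the same URL as the curl command and is subject to the same unauthenticated rate limit.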
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ynop/audiomate: Python library for handling audio datasets.
reazon-research/ReazonSpeech: Massive open Japanese speech corpus
common-voice/cv-dataset: Metadata and versioning details for the Common Voice dataset
davidmartinrius/speech-dataset-generator: 🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...
EgorLakomkin/KTSpeechCrawler: Automatically constructing corpus for automatic speech recognition from YouTube videos