khuangaf/ITRI-speech-recognition-dataset-generation

Automatic Speech Recognition Dataset Generation

Emerging

This tool helps researchers, linguists, or educators create new, custom speech recognition datasets for Mandarin, especially those including Taiwanese or English speech. It takes YouTube videos, extracts relevant audio and subtitles, and processes them into a dataset suitable for training speech recognition models. The primary users are those needing specialized Mandarin speech data that isn't readily available.
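The subtitle-processing step described above can be illustrated with a minimal sketch. It is not the repository's actual code: it assumes the subtitles have already been downloaded in SRT format, and the names `Segment`, `to_seconds`, and `parse_srt` are hypothetical. It turns subtitle text into timed transcript segments, which could later be paired with audio clips cut at the same timestamps.

```python
import re
from dataclasses import dataclass
from typing import List

# SRT timestamps look like 00:01:02,500 (some files use a dot before the milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Segment:
    """One subtitle cue: start and end time in seconds, plus its transcript."""
    start: float
    end: float
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp string to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> List[Segment]:
    """Split SRT text into cues and keep those that carry transcript text."""
    segments = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # The timing line is the one containing the "-->" arrow.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start_raw, end_raw = lines[timing].split("-->", 1)
        end_fields = end_raw.split()
        if not end_fields:
            continue
        text = " ".join(lines[timing + 1:])
        if text:
            # Only the first field after the arrow is the end time;
            # anything after it (positioning hints) is ignored.
            segments.append(
                Segment(to_seconds(start_raw), to_seconds(end_fields[0]), text)
            )
    return segments


sample = (
    "1\n00:00:01,000 --> 00:00:03,500\n你好 hello\n\n"
    "2\n00:00:04,000 --> 00:00:06,250\n謝謝\n"
)
for segment in parse_srt(sample):
    print(segment.start, segment.end, segment.text)
```

Each resulting segment gives the time span to cut from the audio track and the text to use as its label. A real pipeline would also need to download the video, extract the audio, and filter out cues whose text does not match the speech.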

No commits in the last 6 months.

Use this if you need to build a specialized Mandarin speech recognition dataset from YouTube video content, particularly if it includes Taiwanese or English speech.

Not ideal if you need an English-only speech recognition dataset, or if you prefer to manually annotate data.

speech-recognition mandarin-linguistics data-generation linguistic-research educational-technology

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 8 / 25

Community 19 / 25

Language: Jupyter Notebook

License: none

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos
