yc9701/pansori

Tools for ASR Corpus Generation from Online Video

/ 100

Emerging

This tool helps researchers, linguists, or educators create high-quality datasets for training Automatic Speech Recognition (ASR) models. It takes online videos with existing audio and subtitle tracks, processes them to accurately align spoken words with their text, and then cleans the data. The output is a refined collection of audio clips paired with their corresponding text, ready for ASR model training.
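As a rough illustration of the subtitle-to-segment step described above, the sketch below parses SRT-style subtitle text into timed segments and filters out unusable ones. It is not pansori's actual code or API: the names `Segment`, `parse_srt`, and `clean_segments`, the SRT input format, and the duration thresholds are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Turn SRT subtitle text into a list of timed segments (hypothetical helper)."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1:])
                segments.append(Segment(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return segments


def clean_segments(segments, min_dur=1.0, max_dur=15.0):
    """Drop empty, too-short, or too-long segments and strip markup tags.

    The duration thresholds are illustrative assumptions, not values from the tool.
    """
    kept = []
    for seg in segments:
        text = re.sub(r"<[^>]+>", "", seg.text)  # remove tags like <i>
        text = re.sub(r"\s+", " ", text).strip()
        duration = seg.end - seg.start
        if text and min_dur <= duration <= max_dur:
            kept.append(Segment(seg.start, seg.end, text))
    return kept
```

Each surviving segment gives a start time, an end time, and a transcript, which is the information needed to cut an audio clip and pair it with its text. A real pipeline would additionally verify the alignment, for example by comparing the subtitle text against an ASR transcript of the clip.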

140 stars. No commits in the last 6 months.

Use this if you need to build a specialized speech corpus from online video content, especially for languages where existing ASR training data is scarce.

Not ideal if you need to transcribe live audio, process audio without any accompanying subtitle data, or if you prefer not to use cloud-based ASR services for validation.

speech-research language-technology linguistics machine-learning-datasets educational-content

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 20 / 25


Stars: 140

Forks

Language: Python

License: MIT

Higher-rated alternatives

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos
