yc9701/pansori
Tools for ASR Corpus Generation from Online Video
This tool helps researchers, linguists, and educators build datasets for training Automatic Speech Recognition (ASR) models. It takes online videos that already have audio and subtitle tracks, aligns the spoken words with the subtitle text, and then cleans the data. The output is a refined collection of audio clips paired with their corresponding text, ready for ASR model training.
140 stars. No commits in the last 6 months.
Use this if you need to build a specialized speech corpus from online video content, especially for languages where existing ASR training data is scarce.
Not ideal if you need to transcribe live audio, process audio without any accompanying subtitle data, or if you prefer not to use cloud-based ASR services for validation.
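The subtitle-driven workflow described above can be sketched in a few lines of Python. This is an illustrative outline only, not pansori's actual code: it assumes SRT-style subtitles, and the helper names (`parse_cues`, `similarity`, `keep_validated`) and the 0.8 threshold are hypothetical. The ASR hypotheses are assumed to come from whichever recognition service you use for validation.

```python
import re
from difflib import SequenceMatcher

# Matches an SRT-style timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert timestamp fields (as strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_cues(srt_text):
    """Return a list of (start_sec, end_sec, text) tuples, one per subtitle cue."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = " ".join(lines[i + 1:])
                if text:  # skip cues that carry no text
                    cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), text))
                break
    return cues


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def similarity(subtitle, hypothesis):
    """Character-level similarity in [0, 1] between subtitle and ASR output."""
    return SequenceMatcher(None, normalize(subtitle), normalize(hypothesis)).ratio()


def keep_validated(cues, hypotheses, threshold=0.8):
    """Keep cues whose subtitle text roughly agrees with the ASR hypothesis.

    `hypotheses` is assumed to hold one ASR transcript per cue, in order.
    """
    return [
        cue for cue, hyp in zip(cues, hypotheses)
        if similarity(cue[2], hyp) >= threshold
    ]
```

Each retained tuple gives the time span to cut from the audio track and the text to pair with that clip. A character-level ratio is a simple stand-in here; a word error rate would be a reasonable alternative for the agreement check.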
Stars: 140
Forks: 27
Language: Python
License: MIT
Last pushed: Feb 10, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/yc9701/pansori"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
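The same request can be made from Python using only the standard library. This is a minimal sketch under two assumptions the page does not confirm: that the endpoint returns JSON, and that the path segments are a category, an owner, and a repository name (inferred from the curl URL above). The function and parameter names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the segment meanings are inferred from the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL] + parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming (unverified) that the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs a network request and counts against the daily limit):
# data = fetch_quality("voice-ai", "yc9701", "pansori")
```

Because unauthenticated access is limited to 100 requests/day, caching the decoded response locally is worth considering when polling many repositories.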
Higher-rated alternatives
ynop/audiomate
Python library for handling audio datasets.
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset...
EgorLakomkin/KTSpeechCrawler
Automatically constructing corpus for automatic speech recognition from YouTube videos