yc9701/pansori-tedxkr-corpus

Korean ASR Corpus generated from TEDx talks

Overall score: 35 / 100 (Emerging)

This is a collection of Korean speech audio clips and their corresponding text transcripts, sourced from TEDx talks given in Korea between 2010 and 2014. It provides about 3 hours of high-quality Korean speech from 41 speakers, packaged as pairs of FLAC audio files and text transcripts. Language researchers, AI developers, and speech technology engineers can use it to train or evaluate Korean speech recognition systems.
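Because the corpus is distributed as audio/transcript pairs, a typical first step is to load it as a list of (audio path, text) examples. The sketch below is a minimal illustration only: it assumes a hypothetical layout in which each `clip.flac` sits next to a `clip.txt` transcript with the same stem. The repository's actual directory structure is not described on this page, so check it and adjust the matching rule before use. The helper name `pair_clips` is likewise hypothetical.

```python
from pathlib import Path


def pair_clips(root):
    """Pair each FLAC clip with a same-named .txt transcript.

    Hypothetical layout: clip.flac and clip.txt share a directory and stem.
    Clips without a matching transcript are skipped.
    """
    pairs = []
    # rglob walks all subdirectories; sorted() makes the order deterministic
    for audio in sorted(Path(root).rglob("*.flac")):
        transcript = audio.with_suffix(".txt")
        if transcript.exists():
            # Korean text, so read explicitly as UTF-8
            text = transcript.read_text(encoding="utf-8").strip()
            pairs.append((audio, text))
    return pairs
```

Calling `pair_clips("path/to/corpus")` returns a list of tuples that can be fed to whatever ASR training or evaluation pipeline you use; skipping unmatched clips keeps a missing transcript from silently shifting audio and text out of alignment.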

No commits in the last 6 months.

Use this if you need a ready-made, high-quality dataset of spoken Korean with matching transcripts for developing or testing speech recognition models.

Not ideal if you need a large-scale corpus (this one contains only about 3 hours of speech) or speech from domains or time periods other than TEDx talks from 2010 to 2014.

Tags: Korean speech recognition, ASR data, linguistic research, voice technology development, machine learning datasets
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 12 / 25
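The four sub-scores above add up exactly to the overall 35 / 100 (0 + 7 + 16 + 12 = 35, with 4 × 25 = 100 possible). The short sketch below reproduces that arithmetic. Treating the overall score as a plain sum of equally weighted sub-scores is an assumption inferred from these numbers, not a documented formula.

```python
# Sub-scores as listed on this page; each category is out of 25.
subscores = {"Maintenance": 0, "Adoption": 7, "Maturity": 16, "Community": 12}


def overall(scores, per_category_max=25):
    """Assumed aggregation: overall score = plain sum of the sub-scores."""
    for name, value in scores.items():
        if not 0 <= value <= per_category_max:
            raise ValueError(f"{name} out of range: {value}")
    # Return the total alongside the maximum achievable score
    return sum(scores.values()), per_category_max * len(scores)


total, maximum = overall(subscores)
print(f"{total} / {maximum}")  # prints "35 / 100"
```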


Stars: 27
Forks: 4
Language: not listed
License: not listed
Last pushed: Jan 11, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/yc9701/pansori-tedxkr-corpus"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
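The curl call above can also be made from Python using only the standard library. This is a minimal sketch under several assumptions: that the endpoint returns a JSON body (its fields are not documented here), and that other repositories follow the same `/quality/<category>/<owner>/<repo>` path pattern as the example URL. It uses only the anonymous, no-key tier, since this page does not say how a key is passed. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, mirroring the path in the curl example (assumed pattern)."""
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """GET the endpoint anonymously and decode the body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this dataset, `fetch_quality("voice-ai", "yc9701", "pansori-tedxkr-corpus")` requests the same URL as the curl command; keep the 100 requests/day anonymous limit in mind if you call it in a loop.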