Multilingual Speech Datasets Voice AI Tools
Curated speech corpora and audio datasets across multiple languages for training ASR and speech processing models. Does NOT include text-to-speech synthesis, voice cloning, or speech recognition inference tools.
17 multilingual speech dataset tools are tracked. The highest-rated is qianchang/zici at 43/100 with 31 stars.
Get all 17 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=voice-ai&subcategory=multilingual-speech-datasets&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
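The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented on this page, so the sketch returns the parsed payload without assuming any fields. It makes the keyless request only, since the way to pass a key is not described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="voice-ai",
              subcategory="multilingual-speech-datasets",
              limit=20):
    """Build the query URL (hypothetical helper, not part of any official client)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10.0):
    """Fetch the URL and return the parsed JSON payload as-is.

    The response structure is an unknown here, so no keys are accessed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the network request; inspect the returned object before relying on particular fields, and keep the keyless limit of 100 requests/day in mind when polling.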
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | qianchang/zici - Zici: a collection of resources on Chinese classical studies and the pinyin of Chinese characters and words | | Emerging |
| 2 | gheyret/UQSpeechDataset - Uyghur Single Speaker Speech Dataset | | Emerging |
| 3 | speechio/BigCiDian - Pronunciation lexicon covering both English and Chinese languages for... | | Emerging |
| 4 | apluka34/Bud500 - Bud500: A Comprehensive Vietnamese ASR Dataset | | Emerging |
| 5 | harisbinzia/PronouncUR - PronouncUR: An Urdu Pronunciation Lexicon Generator | | Emerging |
| 6 | jonsafari/buckeye_dict - Buckeye Pronunciation Dictionary | | Emerging |
| 7 | gheyret/thuyg20_scripts - Script files of THUYG-20(A free Uyghur speech database Released by... | | Experimental |
| 8 | skit-ai/phone-number-entity-dataset - Dataset Release for Phone Number Entity capture task | | Experimental |
| 9 | Nexdata-AI/100-Hours-Thai-Children-Spontaneous-Speech-Data - Thai Child's Spontaneous Speech Data | | Experimental |
| 10 | Dragon745/urdu-roman-dictionary - A growing open-source Urdu → Roman Urdu dictionary and lexicon for... | | Experimental |
| 11 | Nexdata-AI/650-Hours-Uyghur-Spontaneous-Speech-Data - 650-Hours-Uyghur-Spontaneous-Speech-Data | | Experimental |
| 12 | Nexdata-AI/347-Hours-Italian-Speech-Data-Collected-by-Mobile-Phone - Italian Speech Dataset | | Experimental |
| 13 | Nexdata-AI/310-Hours-Turkish-Scripted-Monologue-Smartphone-Speech-Dataset - 310-Hours-Turkish-Scripted-Monologue-Smartphone-Speech-Dataset | | Experimental |
| 14 | nakhunchumpolsathien/Thai-ASR-OutOfTheBox-Test-Set - Out-of-the-box test sets for validating Thai automatic speech recognition system | | Experimental |
| 15 | xx205/switchboard_training_in_minutes - PyTorch with horovod setup for distributed training of Switchboard-1 Phase 1... | | Experimental |
| 16 | Nexdata-AI/233-Hours-Finnish-Spontaneous-Speech-Data - Finnish Spontaneous Speech Data | | Experimental |
| 17 | Nexdata-AI/225-Hours-Swedish-Spontaneous-Speech-Data - Swedish Spontaneous Speech Data | | Experimental |