Speech Recognition Datasets ML Frameworks

Multilingual audio corpora for training speech recognition, synthesis, and conversational AI models. Does NOT include general audio processing tools, music datasets, or non-speech audio collections.

This category tracks 18 speech recognition dataset projects. The highest-rated is hstsethi/in-mob-prefix, at 38/100 with 5 stars.

Get all 18 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=speech-recognition-datasets&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
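
The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl command shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the parsed payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="speech-recognition-datasets",
              limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the parsed JSON payload.

    The response schema is an unknown here, so the payload is returned
    unmodified for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2)[:500])
```

Printing the first part of the payload is a reasonable first step for discovering the field names before writing any code that depends on them. Unauthenticated requests count against the 100/day allowance, so it may be worth caching the result locally.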

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | hstsethi/in-mob-prefix | Dataset, charts, models of 4 digit mobile number prefixes in India by state,... | 38 | Emerging |
| 2 | apple/ml-spatial-librispeech | A large synthetic dataset of spatial audio with multiple labels | 36 | Emerging |
| 3 | Nexdata-AI/359-Hours-Indonesian-Speech-Data-by-Mobile-Phone_Reading | Indonesian Speech Dataset | 25 | Experimental |
| 4 | Nexdata-AI/207-Hours-Japanese-Speaking-English-Speech-Data-by-Mobile-Phone | Japanese Speaking English Speech Dataset | 23 | Experimental |
| 5 | Nexdata-AI/98-Hours-Taiwan-Mandarin-Speech-Data-by-Mobile-Phone_Reading | Taiwan Speech Dataset | 21 | Experimental |
| 6 | Nexdata-AI/607-Hours-Cantonese-Conversational-Speech-Data-by-Mobile-Phone-and-Voice-Recorder | Cantonese Conversational Speech Dataset | 11 | Experimental |
| 7 | Nexdata-AI/490-People-Thai-Speech-Data-by-Mobile-Phone_Guiding | Thai Speech Dataset | 11 | Experimental |
| 8 | Nexdata-AI/Conversational_Speech_Dataset | Mega Conversational Speech Datasets for Speech Recognition | 11 | Experimental |
| 9 | Nexdata-AI/292-Hours-Thai-Speech-Data-by-Mobile-Phone_Reading | Thai Speech Dataset | 10 | Experimental |
| 10 | Nexdata-AI/240-Hours-Hindi-Speech-Data-by-Mobile-Phone_Reading | Hindi Speech Dataset | 10 | Experimental |
| 11 | Nexdata-AI/261-Hours-Japanese-Speech-Data-by-Mobile-Phone | Japanese Speech Dataset | 10 | Experimental |
| 12 | Nexdata-AI/1000-Hours-American-English-Conversational-Speech-Data-by-Mobile-Phone | American English Conversational Speech Dataset | 10 | Experimental |
| 13 | Nexdata-AI/10.4-Hours-Japanese-Synthesis-Corpus-Female | Japanese Synthesis Corpus | 10 | Experimental |
| 14 | Nexdata-AI/474-Hours-Japanese-Speech-Data-By-Mobile-Phone | Japanese Speech Dataset | 10 | Experimental |
| 15 | Nexdata-AI/1297-Hours-Scene-Noise-Data-by-Voice-Recorder | Scene Noise Dataset | 10 | Experimental |
| 16 | Nexdata-AI/800-Hours-American-English-Speech-Data-by-Mobile-Phone | American English Speech Dataset | 10 | Experimental |
| 17 | Nexdata-AI/1796-Hours-German-Speech-Data-by-Mobile-Phone | German Speech Dataset | 10 | Experimental |
| 18 | Nexdata-AI/759-Hours-Hindi-Speech-Data-by-Mobile-Phone | Hindi Speech Dataset | 10 | Experimental |