Speech Recognition Datasets Voice AI Tools

This directory tracks 17 speech recognition dataset tools. One scores above 50 (the Established tier): the highest-rated, double22a/speech_dataset, at 54/100 with 453 stars.

Get all 17 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=voice-ai&subcategory=speech-recognition-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
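The same request can be assembled programmatically. The following is a minimal Python sketch that assumes only the endpoint and query parameters shown in the curl command above. The response schema is not documented here, so the sketch returns the decoded JSON as-is without assuming any field names. The helper names `build_url` and `fetch_projects` are illustrative, not part of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="voice-ai",
              subcategory="speech-recognition-datasets",
              limit=20):
    """Build the query URL from the three parameters used in the curl example."""
    query = urlencode({"domain": domain,
                       "subcategory": subcategory,
                       "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and decode the JSON body.

    The shape of the returned object is an unknown here, so it is
    handed back unmodified for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Printing the URL makes no network request and uses no daily quota.
    print(build_url())
```

Calling `fetch_projects(build_url())` performs the actual request and counts against the 100 requests/day anonymous limit.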

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | double22a/speech_dataset | The dataset of Speech Recognition | 54 | Established |
| 2 | Jakobovski/free-spoken-digit-dataset | A free audio dataset of spoken digits. An audio version of MNIST. | 43 | Emerging |
| 3 | Ijwi-ry-Ikirundi-AI/Kirundi_Dataset | 🇧🇮 The first large-scale, open-source speech and text dataset for Kirundi... | 40 | Emerging |
| 4 | lottev1991/Project-AIdol-Public-English-Dataset | Public female English corpus used for Project AI❤dol | 37 | Emerging |
| 5 | Jahangirbd23/WenetSpeech-Yue | 📑 Explore WenetSpeech-Yue, a comprehensive Cantonese speech corpus with rich... | 22 | Experimental |
| 6 | Nexdata-AI/338-Hours-Spanish-Speech-Data-by-Mobile-Phone | Spanish Speech Dataset | 16 | Experimental |
| 7 | Nexdata-AI/1000-Hours-Filipino-Speaking-English-Speech-Data-by-Mobile-Phone | Filipino English Speech Dataset | 11 | Experimental |
| 8 | Nexdata-AI/20-Hours-American-English-Speech-Synthesis-Corpus-Male | American English Speech Synthesis Corpus | 11 | Experimental |
| 9 | Nexdata-AI/50-Hours-American-Children-Speech-Data-by-Microphone | American Children Speech Dataset | 11 | Experimental |
| 10 | Nexdata-AI/548-Hours-Taiwanese-Accent-Mandarin-Spontaneous-Speech-Data | Taiwanese Accent Mandarin Spontaneous Speech Data | 11 | Experimental |
| 11 | Nexdata-AI/155-People-Malay-Speech-Data-by-Mobile-Phone_Guiding | Malay Speech Dataset | 11 | Experimental |
| 12 | Nexdata-AI/760-Hours-Hindi-Conversational-Speech-Data-by-Telephone | 760 Hours - Hindi Conversational Speech Data by Telephone | 11 | Experimental |
| 13 | Nexdata-AI/357-Hours-Korean-Speech-Data-by-Mobile-Phone | Korean Speech Dataset | 11 | Experimental |
| 14 | Nexdata-AI/10-Hours-Far-filed-Noise-Speech-Data-in-Home-Environment-by-Mic-Array | Far-filed Noise Speech Dataset | 11 | Experimental |
| 15 | Nexdata-AI/201-People-Infant-Cry-Speech-Data-by-Mobile-Phone | Infant Cry Speech Dataset | 11 | Experimental |
| 16 | Nexdata-AI/500-Hours-Korean-Conversational-Speech-Data-by-Mobile-Phone | The dataset of Korean conversational speech | 11 | Experimental |
| 17 | Nexdata-AI/520-Hours-French-Speaking-English-Speech-Data-by-Mobile-Phone | French Speech Dataset | 11 | Experimental |