TTS Dataset Creation Voice AI Tools

Tools and workflows for preparing, recording, processing, and organizing audio datasets specifically for training text-to-speech models. This category does not include pre-built TTS datasets, TTS model training frameworks, or general speech datasets for ASR or voice cloning.

36 TTS dataset creation tools are tracked. The highest-rated is hetpandya/youtube_tts_data_generator at 49/100 with 37 stars.

Get all 36 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=voice-ai&subcategory=tts-dataset-creation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
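The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the response schema is not documented here (so the decoded JSON is returned as-is), and whether raising `limit` above 20 returns all 36 projects is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="voice-ai", subcategory="tts-dataset-creation", limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON.

    The structure of the result is whatever the API returns; no
    particular schema is assumed here.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs one request, which counts against the daily request limit described above.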

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | hetpandya/youtube_tts_data_generator | A python library to generate speech dataset from Youtube videos | 49 | Emerging |
| 2 | IS2AI/Kazakh_TTS | An expanded version of the previously released Kazakh text-to-speech... | 47 | Emerging |
| 3 | taresh18/TTSizer | 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific... | 43 | Emerging |
| 4 | Hecate2/sukasuka-vocal-dataset-builder | Sukasuka anime vocal dataset. 1st anime vocal dataset. Extract audio (vocal) files from... | 43 | Emerging |
| 5 | youmebangbang/TTS-dataset-tools | Automatically generates TTS dataset using audio and associated text. Make... | 43 | Emerging |
| 6 | souvikg544/TTS_Data_Maker | Text to speech is an emerging zone of AI. This repository helps to create a... | 39 | Emerging |
| 7 | stefantaubert/pronunciation-dictionary-utils | Utils to modify pronunciation dictionaries. | 39 | Emerging |
| 8 | keonlee9420/DailyTalk | Official repository of DailyTalk: Spoken Dialogue Dataset for Conversational... | 39 | Emerging |
| 9 | gokhaneraslan/tts-dataset-generator | With this tool you can create custom TTS dataset from video or audio. | 38 | Emerging |
| 10 | revsic/speechset | Numpy-librosa implementation of Speech dataset pipeline | 37 | Emerging |
| 11 | GuangChen2333/FindUrVoicesPJSK | One-click tool for obtaining single-character voice datasets from Project Sekai: Colorful Stage; no manual labeling needed; uncompressed WAV; A simple tool for obtaining... | 36 | Emerging |
| 12 | FS-17/SpeechDataBuilder | Browser-based open-source tool for creating high-quality TTS/STT datasets.... | 35 | Emerging |
| 13 | ShawnPi233/SynParaSpeech | Official Repository of Paper: "SynParaSpeech: Automated Synthesis of... | 33 | Emerging |
| 14 | danklabs/tts_dataset_maker | A gui to help make a text to speech dataset. | 31 | Emerging |
| 15 | hollygrimm/voice-dataset-creation | Tools to create your own voice dataset for TTS training | 30 | Emerging |
| 16 | MiniXC/phones | A collection of utilities for handling IPA phones. | 30 | Emerging |
| 17 | soukhova/TTS2016R | A data-package including the 2016 TTS origins, TTS destinations, number of... | 30 | Emerging |
| 18 | IS2AI/TurkicTTS | A multilingual text-to-speech synthesis system for ten lower-resourced... | 29 | Experimental |
| 19 | babua/TTSDatasetRecorder | A simple app for recording speech datasets. | 28 | Experimental |
| 20 | pilot7747/VoxDIY | This repository provides data and code for "Vox Populi, Vox DIY: Benchmark... | 27 | Experimental |
| 21 | hecko-yes/tts-dataset-prompts | Finally, some decent sample sentences | 26 | Experimental |
| 22 | nonverbalspeech38k/nonverspeech38k | The official repository for the paper “NonVerbalSpeech-38K: A Scalable... | 26 | Experimental |
| 23 | wkdrns202/TTSDataSetCleanser | TTSDataSetCleanser. This program can do the labeling work for the Raw Speech... | 22 | Experimental |
| 24 | egorsmkv/qirimtatar-tts-datasets | Open Source Crimean Tatar Text-to-Speech datasets | 21 | Experimental |
| 25 | Lostenergydrink/styletts2-dataset-toolkit | Complete Windows-optimized workflow for voice cloning with StyleTTS2.... | 21 | Experimental |
| 26 | kdorichev/text2speech | Text-To-Speech Dataset Preparation and Architecture | 20 | Experimental |
| 27 | ItsJamin/another-tts | A program to easily create datasets for training own tts models. | 20 | Experimental |
| 28 | iuliiakr/TTS-Project-Framework | Architecture framework for building production-grade text-to-speech systems,... | 19 | Experimental |
| 29 | clayton14/tts_dataset_recorder | All you have to do is ramble to make a dataset for your voice | 17 | Experimental |
| 30 | hclivess/speech-splitter | Turn any audio file into a TTS training dataset | 16 | Experimental |
| 31 | willwade/TTS-Dataset | A workflow to create a dataset of all TTS voices/languages available on... | 14 | Experimental |
| 32 | taresh18/AnimeVox | 🎧 11K High Quality Anime Audio Clips, Transcriptions & Speaker Labels for... | 14 | Experimental |
| 33 | MendoLeo/tts-dataset-pipeline | Democratizing speech technology: the simplest way to create custom TTS and... | 14 | Experimental |
| 34 | quochuy242/VNAVC | Data Pipeline for Text to Speech Project | 13 | Experimental |
| 35 | deeplearningcafe/animespeechdataset | Dataset Generation for Language Model Training and Text-to-Speech Synthesis... | 13 | Experimental |
| 36 | kuan2jiu99/GenderBias-TTS-Dataset | The official Github page of the paper "Gender Bias in Instruction-Guided... | 11 | Experimental |