gokhaneraslan/tts-dataset-generator

With this tool, you can create a custom TTS dataset from video or audio.

38 / 100 (Emerging)

This tool turns long audio or video recordings into organized datasets for training custom text-to-speech (TTS) voices. You supply raw audio or video files; it automatically splits them into speech segments, transcribes each segment using AI, and outputs formatted audio clips along with a text file of aligned transcripts. It suits voice actors, linguists, or educators who need to create custom voice models from their own recordings.
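The segment-then-label flow described above can be illustrated with a minimal sketch. It is not the repository's implementation: the amplitude-threshold silence detection, the function names `split_on_silence` and `metadata_line`, the default parameter values, and the pipe-delimited transcript format (a common LJSpeech-style convention) are all assumptions made for illustration. The transcription step is left out because it depends on a speech-recognition model.

```python
from typing import List, Tuple


def split_on_silence(
    samples: List[float],
    rate: int,
    threshold: float = 0.02,      # hypothetical loudness cutoff
    min_silence_s: float = 0.3,   # hypothetical minimum pause length
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index pairs for runs of speech.

    A run ends once the signal stays below `threshold` for at least
    `min_silence_s` seconds. `end` is exclusive.
    """
    min_gap = int(min_silence_s * rate)
    segments: List[Tuple[int, int]] = []
    start = None       # index where the current speech run began
    last_loud = None   # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            # Enough silence has passed: close the current run.
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:
        # The recording ended while a speech run was still open.
        segments.append((start, last_loud + 1))
    return segments


def metadata_line(clip_name: str, transcript: str) -> str:
    """Format one aligned transcript entry as 'clip_name|transcript'."""
    return f"{clip_name}|{transcript.strip()}"


if __name__ == "__main__":
    # Toy signal at 10 samples/second: silence, speech, silence, speech, silence.
    signal = [0.0] * 5 + [0.5] * 10 + [0.0] * 5 + [0.5] * 4 + [0.0] * 2
    for n, (a, b) in enumerate(split_on_silence(signal, rate=10), start=1):
        print(metadata_line(f"clip_{n:04d}.wav", f"samples {a} to {b}"))
```

A real pipeline would read samples from decoded audio and pass each segment to a transcription model before writing the metadata file; the sketch only shows how segment boundaries and transcript lines relate.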

No commits in the last 6 months.

Use this if you need to create a high-quality, segmented, and transcribed dataset from audio or video files to train a custom text-to-speech voice or for large-scale transcription.

Not ideal if you only need a quick transcription of a short audio file without the need for segmentation or dataset formatting for voice model training.

voice-synthesis speech-recognition audio-transcription voice-cloning e-learning-content
Stale 6m No Package No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 15 / 25

Stars: 13
Forks: 5
Language: Python
License: Apache-2.0
Last pushed: Jun 07, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/gokhaneraslan/tts-dataset-generator"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
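The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern is inferred from the single curl example above, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example; the path layout
# <category>/<owner>/<repo> is inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for a repository.

    Assumes the endpoint returns a JSON body; adjust the parsing
    if the real response format differs.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    data = fetch_quality("voice-ai", "gokhaneraslan", "tts-dataset-generator")
    print(data)
```

Without a key, the 100 requests/day limit applies, so caching the result locally is sensible for repeated lookups.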