youmebangbang/TTS-dataset-tools

Automatically generates TTS datasets from audio and associated text. Cuts audio into clips under a custom length. Uses the Google Speech-to-Text API to perform diarization and transcription, or aeneas to force-align text to audio.

43 / 100 (Emerging)

This tool helps podcasters, educators, or content creators automatically prepare audio for text-to-speech (TTS) models. You provide audio files and their corresponding text, and it generates segmented audio clips with precise text alignments. The primary users are individuals who need to create custom, high-quality TTS datasets efficiently.

No commits in the last 6 months.

Use this if you have long audio recordings and want to automatically split them into short, transcribed segments perfect for training custom text-to-speech voices.

Not ideal if you need to process extremely large batches of files without any manual review or if you prefer not to use cloud transcription services.
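The splitting step described above (long recordings cut into short, transcribed clips under a custom length) can be sketched roughly as follows. This is an illustrative sketch only, not the tool's actual implementation: the `Fragment` type and `group_fragments` function are hypothetical names, and the input is assumed to be a list of time-aligned text fragments such as a forced aligner or a transcription service might produce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical aligned fragment: start/end in seconds plus its text."""
    start: float
    end: float
    text: str


def group_fragments(fragments: List[Fragment], max_len: float) -> List[Fragment]:
    """Greedily merge consecutive fragments into clips no longer than max_len.

    A single fragment that already exceeds max_len is kept as its own clip,
    so it can be flagged for manual review rather than silently dropped.
    """
    clips: List[Fragment] = []
    current = None
    for frag in fragments:
        if current is not None and frag.end - current.start <= max_len:
            # Extending the current clip keeps it within the length limit.
            current = Fragment(current.start, frag.end, current.text + " " + frag.text)
        else:
            # Close the current clip (if any) and start a new one.
            if current is not None:
                clips.append(current)
            current = frag
    if current is not None:
        clips.append(current)
    return clips


if __name__ == "__main__":
    aligned = [
        Fragment(0.0, 4.0, "Hello and welcome."),
        Fragment(4.0, 7.0, "Today we talk about speech."),
        Fragment(7.0, 12.0, "First, some background."),
    ]
    for clip in group_fragments(aligned, max_len=8.0):
        print(f"{clip.start:.1f}-{clip.end:.1f}: {clip.text}")
```

Each resulting clip carries the start and end times needed to cut the audio and the text needed for the matching transcript line. The greedy strategy is one simple choice; a real pipeline might instead prefer to cut at silences or sentence boundaries.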

podcast-production e-learning-content audio-transcription speech-synthesis content-creation
Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 19 / 25

Stars: 52
Forks: 17
Language: Python
License: MIT
Last pushed: Apr 17, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/youmebangbang/TTS-dataset-tools"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
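For scripted use, the same request can be made from Python with the standard library. This is a minimal sketch under stated assumptions: the response is assumed to be JSON (the page does not document the response format), and the names `build_quality_url`, `fetch_quality`, and the `category`/`owner`/`repo` labels for the path segments are illustrative, not part of any documented client.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL shown in the curl example above.

    The segment names (category, owner, repo) are assumed labels.
    """
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(url: str, opener=urllib.request.urlopen, timeout: float = 10.0):
    """Fetch the URL and decode the body, assuming it is JSON."""
    with opener(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = build_quality_url("voice-ai", "youmebangbang", "TTS-dataset-tools")
    print(url)
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality(url))
```

The `opener` parameter exists only so the network call can be swapped out in tests; by default it is `urllib.request.urlopen`.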