freds0/kabooks

KABooks is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Given audiobooks, KABooks generates a dataset of segmented audio clips with aligned text.

Score: 28 / 100 (Experimental)

This tool helps researchers and AI practitioners create high-quality datasets for training speech recognition and text-to-speech models. It takes an audiobook's full audio file and its corresponding text as input, then outputs segmented audio clips aligned with their exact textual transcriptions. This is ideal for those working on voice AI.

No commits in the last 6 months.

Use this if you need to quickly generate large, accurately aligned audio-text datasets from audiobooks for your AI model training.

Not ideal if your source material isn't a long-form audiobook or if you don't need highly precise audio-to-text alignments.

speech-recognition text-to-speech AI-dataset-creation natural-language-processing audio-processing
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 15 / 25
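
The four category scores listed above add up to the overall 28 / 100 shown on this page. The short check below is only a sketch of that arithmetic: it assumes the overall score is a plain, unweighted sum of the four categories, which matches the numbers here but is not confirmed by the page. The function name `overall_score` is hypothetical.

```python
# Category scores as listed on this page; each is out of 25.
subscores = {"Maintenance": 0, "Adoption": 5, "Maturity": 8, "Community": 15}


def overall_score(parts, per_category_max=25):
    """Return (total, maximum), ASSUMING the overall score is a plain sum."""
    total = sum(parts.values())
    maximum = per_category_max * len(parts)
    return total, maximum


total, maximum = overall_score(subscores)
print(f"{total} / {maximum}")  # prints "28 / 100"
```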

Stars: 12
Forks: 4
Language: Python
License: none listed
Last pushed: Mar 24, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/freds0/kabooks"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
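
The same request can be made from Python. This is a minimal sketch using only the standard library, under a few assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern (category, owner, repository) is inferred from the single curl example above, and the body is returned as raw text because the response format is not documented on this page.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path layout is inferred from one example."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (response format is an unknown)."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (performs a network request, counts against the daily limit):
# print(fetch_quality("voice-ai", "freds0", "kabooks"))
```

Keeping URL construction separate from the network call makes it easy to check the address before spending any of the 100 keyless requests per day.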