davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

50 / 100
Established

This tool helps researchers and content creators build high-quality audio datasets for training speech AI models. You can input raw audio from your own files, YouTube videos, LibriVox audiobooks, or TED Talks. It processes these inputs to remove silences, enhance sound quality, segment audio, identify and name speakers, transcribe speech into text, and generate organized datasets complete with speaker details, timestamps, and metrics like words-per-minute.
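Words-per-minute is one of the metrics mentioned above. Below is a minimal sketch of how it can be derived from timestamped, speaker-labelled transcript segments. The `Segment` record and the functions `words_per_minute` and `speaker_wpm` are hypothetical illustrations, not the tool's actual API or output format. Words are counted by whitespace splitting, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Segment:
    # Hypothetical segment record: speaker label, start/end time in seconds, transcript text.
    speaker: str
    start: float
    end: float
    text: str


def words_per_minute(segment: Segment) -> float:
    """Words spoken per minute within a single segment."""
    duration = segment.end - segment.start
    if duration <= 0:
        raise ValueError("segment must have a positive duration")
    # Whitespace word count, scaled from seconds to minutes.
    return len(segment.text.split()) * 60.0 / duration


def speaker_wpm(segments: Iterable[Segment]) -> Dict[str, float]:
    """Per-speaker WPM: total words divided by total speaking time."""
    words: Dict[str, int] = {}
    seconds: Dict[str, float] = {}
    for seg in segments:
        duration = seg.end - seg.start
        if duration <= 0:
            continue  # skip empty or malformed segments
        words[seg.speaker] = words.get(seg.speaker, 0) + len(seg.text.split())
        seconds[seg.speaker] = seconds.get(seg.speaker, 0.0) + duration
    return {spk: words[spk] * 60.0 / seconds[spk] for spk in words}
```

Aggregating total words over total time per speaker, rather than averaging per-segment rates, keeps very short segments from skewing the result.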

257 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to create clean, labeled audio datasets from diverse sources for developing speech-to-text or text-to-speech applications.

Not ideal if you primarily need a simple audio editor for personal use or a transcription service without the need for structured dataset generation and speaker analysis.

speech-recognition audio-analysis voice-AI content-creation language-modeling
Stale: no commits in 6 months
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 15 / 25
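The four subscores above each have a maximum of 25, and their values (0 + 10 + 25 + 15) add up to the overall 50 / 100. This small sketch only checks that the listed numbers are consistent with a plain sum. The page does not state the scoring method here, so treat the summation as an assumption. `overall_score` is a hypothetical helper.

```python
from typing import Dict, Tuple

MAX_PER_COMPONENT = 25

# Subscores as listed on this page.
SUBSCORES: Dict[str, int] = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 25,
    "Community": 15,
}


def overall_score(subscores: Dict[str, int]) -> Tuple[int, int]:
    """Return (total, maximum), assuming the overall score is a plain sum."""
    for name, value in subscores.items():
        if not 0 <= value <= MAX_PER_COMPONENT:
            raise ValueError(f"{name} out of range: {value}")
    return sum(subscores.values()), MAX_PER_COMPONENT * len(subscores)
```

`overall_score(SUBSCORES)` returns `(50, 100)`, matching the headline score.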


Stars: 257
Forks: 25
Language: Python
License: MIT
Last pushed: Jun 10, 2024
Commits (30d): 0
Dependencies: 12

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/davidmartinrius/speech-dataset-generator"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
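The same request can be made from Python with only the standard library. This is a minimal sketch that reuses the exact URL from the curl command above. It assumes the endpoint returns a JSON body; the response fields are not documented here, so the code does not rely on any particular key. How a free API key is passed is also not described, so it is left out. `fetch_report` and `parse_report` are hypothetical helpers.

```python
import json
import urllib.request

# URL copied from the curl example above.
API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/voice-ai/"
    "davidmartinrius/speech-dataset-generator"
)


def parse_report(raw: bytes) -> dict:
    """Decode a response body, assuming it is UTF-8 encoded JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_report(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Fetch the quality report (anonymous access, subject to the daily limit)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_report(response.read())
```

Calling `fetch_report()` performs one network request and counts toward the anonymous daily quota, so cache the result if you need it repeatedly.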