davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

Established

This tool helps researchers and content creators build high-quality audio datasets for training speech AI models. You can input raw audio from your own files, YouTube videos, LibriVox audiobooks, or TED Talks. It processes these inputs to remove silences, enhance sound quality, segment audio, identify and name speakers, transcribe speech into text, and generate organized datasets complete with speaker details, timestamps, and metrics like words-per-minute.
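As a rough illustration of the words-per-minute metric mentioned above, the sketch below derives a speaking rate from one transcribed segment's timestamps. The `Segment` record and `words_per_minute` function are hypothetical names invented for this example. They are not taken from the tool's own API or output format, which may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed audio segment (hypothetical layout, not the tool's schema)."""
    speaker: str
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcript of the segment


def words_per_minute(segment: Segment) -> float:
    """Return the speaking rate of a segment in words per minute."""
    duration = segment.end - segment.start
    if duration <= 0:
        raise ValueError("segment must have a positive duration")
    # Whitespace tokenization is a simplifying assumption for this sketch.
    word_count = len(segment.text.split())
    return word_count * 60.0 / duration


segment = Segment(
    speaker="speaker_1",
    start=12.0,
    end=18.0,
    text="this is a short example of transcribed speech",
)
print(f"{segment.speaker}: {words_per_minute(segment):.1f} wpm")
```

Splitting on whitespace is the simplest possible word count. Languages without space-delimited words would need a proper tokenizer.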

257 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to create clean, labeled audio datasets from diverse sources for developing speech-to-text or text-to-speech applications.

Not ideal if you primarily need a simple audio editor for personal use or a transcription service without the need for structured dataset generation and speaker analysis.

speech-recognition audio-analysis voice-AI content-creation language-modeling

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 15 / 25

Stars: 257

Forks

Language: Python

License: MIT

Related tools

ynop/audiomate

Python library for handling audio datasets.

reazon-research/ReazonSpeech

Massive open Japanese speech corpus

common-voice/cv-dataset

Metadata and versioning details for the Common Voice dataset

EgorLakomkin/KTSpeechCrawler

Automatically constructing corpus for automatic speech recognition from YouTube videos

coqui-ai/open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies