davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
This tool helps researchers and content creators build high-quality audio datasets for training speech AI models. You can input raw audio from your own files, YouTube videos, LibriVox audiobooks, or TED Talks. It processes these inputs to remove silences, enhance sound quality, segment audio, identify and name speakers, transcribe speech into text, and generate organized datasets complete with speaker details, timestamps, and metrics like words-per-minute.
257 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to create clean, labeled audio datasets from diverse sources for developing speech-to-text or text-to-speech applications.
Not ideal if you primarily need a simple audio editor for personal use or a transcription service without the need for structured dataset generation and speaker analysis.
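The words-per-minute metric mentioned in the description can be illustrated with a short calculation. This is a minimal sketch, not the tool's own implementation: the function name `words_per_minute`, the segment fields `text`, `start`, and `end`, and the whitespace-based word count are all assumptions made for illustration.

```python
def words_per_minute(text: str, start_s: float, end_s: float) -> float:
    """Speaking rate of one transcribed segment, in words per minute."""
    duration_s = end_s - start_s
    if duration_s <= 0:
        raise ValueError("end_s must be greater than start_s")
    # Assumption: words are counted by splitting on whitespace.
    word_count = len(text.split())
    return word_count * 60.0 / duration_s


# Hypothetical segment: 9 words spoken between 12.0 s and 15.0 s.
segment = {
    "text": "the quick brown fox jumps over the lazy dog",
    "start": 12.0,
    "end": 15.0,
}
wpm = words_per_minute(segment["text"], segment["start"], segment["end"])
print(round(wpm, 1))  # 180.0
```

A real pipeline would likely take the timestamps from the transcription step and might count words differently for languages that do not separate words with spaces.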
Stars: 257
Forks: 25
Language: Python
License: MIT
Category:
Last pushed: Jun 10, 2024
Commits (30d): 0
Dependencies: 12
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/davidmartinrius/speech-dataset-generator"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
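The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the page does not document the response format, so decoding it as JSON is a guess, and reading the path as `<category>/<owner>/<repo>` is inferred from the single URL shown above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout is an assumption)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Building the URL needs no network access.
print(quality_url("voice-ai", "davidmartinrius", "speech-dataset-generator"))
```

Calling `fetch_quality("voice-ai", "davidmartinrius", "speech-dataset-generator")` performs the actual request and counts against the daily limit.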
Related tools
ynop/audiomate
Python library for handling audio datasets.
reazon-research/ReazonSpeech
Massive open Japanese speech corpus
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
EgorLakomkin/KTSpeechCrawler
Automatically constructing corpus for automatic speech recognition from YouTube videos
coqui-ai/open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies