i4Ds/whisper-prep
Data preparation utility for fine-tuning OpenAI's Whisper model.
This tool helps AI engineers, machine learning practitioners, and data scientists prepare audio and transcript data for training speech-to-text models such as OpenAI's Whisper. It takes raw audio files, sentence-level datasets, or existing SRT/VTT transcripts and outputs segmented, cleaned, and formatted datasets ready for model fine-tuning. Cleaner, consistently formatted training data generally leads to better-performing transcription models.
Use this if you need to create a high-quality, normalized dataset of audio and text for training an automatic speech recognition (ASR) model, especially when working with varied raw audio sources or sentence-level datasets.
Not ideal if you simply need to transcribe a single audio file or process a small number of transcripts without the intent of training or fine-tuning a machine learning model.
Stars: 11
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Mar 03, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/i4Ds/whisper-prep"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
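The same request can be made from Python using only the standard library. The sketch below is a minimal illustration, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path pattern is inferred from the single curl URL above, and a JSON response body is assumed, since the response schema is not described here. No API key is sent, so the call falls under the keyless 100 requests/day limit.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path follows category/owner/repo, as inferred
    from the one curl example; other layouts may exist.
    """
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the endpoint returns JSON; the field names are not
    documented here, so the parsed object is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL that would be requested; call fetch_quality(...)
    # with the same arguments to perform the actual network request.
    print(quality_url("voice-ai", "i4Ds", "whisper-prep"))
```

Keeping URL construction separate from the network call makes it easy to check the request target without spending any of the daily quota.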
Higher-rated alternatives
AbdullahHendy/live-translation
Real-time speech-to-text translation over WebSocket. Streams Opus or raw PCM audio from client...
i4Ds/whisper-finetune
This repository contains code for fine-tuning the Whisper speech-to-text model.
512z/podlens
Free Podwise: AI Podcast & YouTube Transcription & Understanding Agent | Podcast + YouTube transcription, learning, and visualization AI tool
Gr122lyBr/voicetag
Speaker identification powered by pyannote and resemblyzer
aws-solutions/content-localization-on-aws
Automatically generate multi-language subtitles using AWS AI/ML services. Machine generated...