whisper-timestamped and Whisper-Finetune

These two tools are complements rather than competitors: whisper-timestamped extends Whisper's transcription output with word-level timing and confidence at inference time, while Whisper-Finetune adapts the Whisper model itself through custom training on domain-specific data.

Metric          whisper-timestamped    Whisper-Finetune
Score           58 (Established)       56 (Established)
Maintenance     2/25                   6/25
Adoption        12/25                  10/25
Maturity        25/25                  16/25
Community       19/25                  24/25
Stars           2,778                  1,200
Forks           209                    213
Downloads       (not shown)            (not shown)
Commits (30d)   0                      0
Language        Python                 C
License         AGPL-3.0               Apache-2.0
Stale 6m
No Package No Dependents

About whisper-timestamped

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

This tool helps transcription professionals, researchers, or content creators accurately transcribe audio or video recordings. It takes an audio or video file as input and produces a detailed transcript with precise timestamps for each word, along with a confidence score for both individual words and speech segments. This is ideal for anyone who needs highly accurate, word-level timing in their transcriptions.

transcription audio-analysis video-editing linguistics content-creation
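
To make the word-level output described above concrete, here is a minimal sketch of how such a result might be post-processed. It assumes the transcription result is a dictionary with a `segments` list, where each segment carries a `words` list of entries with `text`, `start`, `end`, and `confidence` keys. Those field names, the `flatten_words` helper, and the sample data are illustrative assumptions, so check them against the output of the version you have installed.

```python
from typing import Any


def flatten_words(result: dict[str, Any], min_confidence: float = 0.0) -> list[dict[str, Any]]:
    """Collect word entries from every segment, keeping those at or above min_confidence."""
    words = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Words without a confidence value are treated as confidence 0.0.
            if word.get("confidence", 0.0) >= min_confidence:
                words.append(word)
    return words


# Hypothetical sample shaped like the assumed output format (not real model output).
sample_result = {
    "text": "Bonjour tout le monde",
    "segments": [
        {
            "start": 0.5,
            "end": 2.1,
            "text": "Bonjour tout le monde",
            "confidence": 0.72,
            "words": [
                {"text": "Bonjour", "start": 0.5, "end": 1.0, "confidence": 0.92},
                {"text": "tout", "start": 1.1, "end": 1.3, "confidence": 0.41},
                {"text": "le", "start": 1.3, "end": 1.5, "confidence": 0.77},
                {"text": "monde", "start": 1.5, "end": 2.1, "confidence": 0.78},
            ],
        }
    ],
}

# Print only the words the model was reasonably sure about.
for w in flatten_words(sample_result, min_confidence=0.5):
    print(f"{w['start']:6.2f} - {w['end']:6.2f}  {w['text']}  ({w['confidence']:.2f})")
```

Filtering on per-word confidence is one plausible use of the scores, for example to flag uncertain words for manual review in a subtitle editor.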

About Whisper-Finetune

yeyupiaoling/Whisper-Finetune

Fine-tune the Whisper speech recognition model, with support for training without timestamp data, training with timestamp data, and training without speech data. It also accelerates inference and supports Web, Windows desktop, and Android deployment.

This project helps you improve the accuracy and speed of transcribing audio into text with the Whisper speech recognition system. You can customize the model with your own audio recordings and their corresponding text, even if your data doesn't include exact timing information. The fine-tuned model can then convert new audio files into written transcripts, and can be deployed in web applications, in desktop programs, or on Android devices. It is aimed at professionals such as journalists, researchers, or content creators who need accurate and fast audio transcription tailored to specific languages or accents.

speech-to-text audio-transcription voice-recognition language-processing content-creation
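
As an illustration of preparing your own recordings and transcripts without timing information, the sketch below writes a JSON Lines manifest that pairs each audio path with its text. The record layout (`audio.path` and `sentence`), the `build_manifest` helper, and the file names are assumptions made for this example, not a confirmed specification of the project's data format. Consult the repository's documentation for the fields it actually expects.

```python
import json
import tempfile
from pathlib import Path


def build_manifest(pairs, out_path):
    """Write one JSON object per line for each (audio_path, transcript) pair.

    Pairs with an empty transcript are skipped. Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for audio_path, transcript in pairs:
            transcript = transcript.strip()
            if not transcript:
                continue  # nothing to train on for this clip
            # Assumed record layout: no timestamps, just audio path and text.
            record = {"audio": {"path": str(audio_path)}, "sentence": transcript}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical example data; the paths do not need to exist for this sketch.
pairs = [
    ("dataset/0001.wav", "The quarterly report is due on Friday."),
    ("dataset/0002.wav", "   "),
    ("dataset/0003.wav", "Please restart the centrifuge."),
]

with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "train.jsonl"
    written = build_manifest(pairs, manifest)
    lines = manifest.read_text(encoding="utf-8").splitlines()

print(written, "records written")
```

Keeping one record per line makes the manifest easy to shuffle, split into training and test sets, or extend later with optional fields such as per-sentence timestamps.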

Scores updated daily from GitHub, PyPI, and npm data.