m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
This tool transcribes audio recordings and attaches a timestamp to each word. It can also identify who is speaking at any given time and split the transcript by speaker (diarization). It suits anyone who needs accurate transcripts for audio analysis, subtitling, or content review, such as researchers, journalists, and content creators.
20,758 stars. Used by 5 other packages. Actively maintained with 11 commits in the last 30 days. Available on PyPI.
Use this if you need to turn audio into text with exact word timings and speaker identification, especially for long recordings or multi-speaker conversations.
Not ideal if you only need a basic transcript without precise word-level timings or speaker separation, or if you prefer a service with a graphical user interface.
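To make the word-timing and speaker-separation use case concrete, here is a minimal sketch of turning word-level results into a speaker-labelled transcript. It does not call WhisperX. The input shape (a list of dicts with `word`, `start`, `end`, and `speaker` keys) and the sample values are assumptions made for illustration, and the helper names `group_words_by_speaker` and `format_turn` are hypothetical.

```python
from typing import Dict, List


def group_words_by_speaker(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words spoken by the same speaker into one turn.

    Each input dict is ASSUMED to carry "word", "start", "end" (seconds)
    and optionally "speaker"; this shape is illustrative only.
    """
    turns: List[Dict] = []
    for w in words:
        speaker = w.get("speaker", "UNKNOWN")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = w["end"]
            turns[-1]["text"] += " " + w["word"]
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(
                {"speaker": speaker, "start": w["start"], "end": w["end"], "text": w["word"]}
            )
    return turns


def format_turn(turn: Dict) -> str:
    """Render one turn as '[start-end] SPEAKER: text'."""
    return f"[{turn['start']:.2f}-{turn['end']:.2f}] {turn['speaker']}: {turn['text']}"


# Hypothetical sample data, not real tool output.
words = [
    {"word": "Hello", "start": 0.00, "end": 0.42, "speaker": "SPEAKER_00"},
    {"word": "there", "start": 0.45, "end": 0.80, "speaker": "SPEAKER_00"},
    {"word": "Hi", "start": 1.10, "end": 1.30, "speaker": "SPEAKER_01"},
]

turns = group_words_by_speaker(words)
for turn in turns:
    print(format_turn(turn))
# [0.00-0.80] SPEAKER_00: Hello there
# [1.10-1.30] SPEAKER_01: Hi
```

Grouping by speaker change rather than by fixed time windows keeps each line of the transcript tied to a single voice, which is usually what subtitling and interview review need.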
Stars: 20,758
Forks: 2,188
Language: Python
License: BSD-2-Clause
Category:
Last pushed: Mar 17, 2026
Commits (30d): 11
Dependencies: 12
Reverse dependents: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/m-bain/whisperX"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
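The same request can be made from Python using only the standard library. This is a rough sketch. The path layout (category, then owner, then repository) is inferred from the curl URL above, and the assumption that the endpoint returns JSON is not confirmed by this page. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; ASSUMES the response body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network call and counts against the daily limit):
#   data = fetch_quality("voice-ai", "m-bain", "whisperX")
```

Because keyless access is capped at 100 requests/day, it is probably worth caching the result locally instead of fetching on every run.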
Related tools
tsmdt/whisply
💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and...
jim60105/docker-whisperX
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker...
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
linto-ai/linto-stt
An automatic speech recognition API
linto-ai/linto-studio
Transcription and annotation interface for recorded audio or video files