m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Score: 80 / 100 (Verified)

WhisperX transcribes audio recordings and attaches a timestamp to every word, not just to each sentence or segment. It can also perform speaker diarization, labeling who is speaking at each point so that a conversation can be split by speaker. It suits anyone who needs word-timed transcripts for audio analysis, subtitling, or content review, such as researchers, journalists, and content creators.

20,758 stars. Used by 5 other packages. Actively maintained with 11 commits in the last 30 days. Available on PyPI.

Use this if you need to turn audio into text with exact word timings and speaker identification, especially for long recordings or multi-speaker conversations.

Not ideal if you only need a basic transcript without precise word-level timings or speaker separation, or if you prefer a service with a graphical user interface.
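
For readers weighing the points above, here is a minimal sketch of what word-level output looks like in practice. It assumes the `whisperx` package from PyPI is installed and uses the `load_model`, `load_audio`, `load_align_model`, and `align` calls shown in the project's README. The model size, compute type, and batch size are illustrative choices, and `format_words` is a hypothetical helper that assumes each aligned word is a dict with `word`, `start`, `end`, and (after diarization) `speaker` keys.

```python
from typing import Any


def transcribe_with_word_times(audio_path: str, device: str = "cpu") -> dict[str, Any]:
    """Transcribe an audio file, then align it to obtain per-word timestamps."""
    # Imported lazily so the formatting helper below works without whisperx installed.
    import whisperx

    # "small" / "int8" are illustrative settings for a CPU run, not recommendations.
    model = whisperx.load_model("small", device, compute_type="int8")
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=8)

    # Alignment refines segment-level timings down to individual words.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    return whisperx.align(result["segments"], align_model, metadata, audio, device)


def format_words(result: dict[str, Any]) -> list[str]:
    """Render aligned words as '[start-end] SPEAKER: word' lines (hypothetical helper)."""
    lines = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            start, end = word.get("start"), word.get("end")
            if start is None or end is None:
                # Assumption: some tokens can come back without timings; skip them.
                continue
            speaker = word.get("speaker", "UNKNOWN")
            lines.append(f"[{start:.2f}-{end:.2f}] {speaker}: {word['word']}")
    return lines
```

Calling `format_words(transcribe_with_word_times("interview.mp3"))` would list one line per timed word; the file name is a placeholder. Speaker labels only appear if a diarization step has been run on the result first, which typically requires a Hugging Face access token.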

audio-transcription speech-to-text speaker-diarization subtitling qualitative-research
Maintenance 20 / 25
Adoption 15 / 25
Maturity 25 / 25
Community 20 / 25


Stars: 20,758
Forks: 2,188
Language: Python
License: BSD-2-Clause
Last pushed: Mar 17, 2026
Commits (30d): 11
Dependencies: 12
Reverse dependents: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/m-bain/whisperX"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
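
The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the `category/owner/repo` path layout is inferred from the single URL above, the response is assumed to be JSON (its fields are not documented here, so none are accessed), and `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response (assumed to be JSON) for one repository."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Each call counts against the daily request limit.
    print(fetch_quality("voice-ai", "m-bain", "whisperX"))
```

Given the unauthenticated daily limit, caching the response locally is a sensible choice when checking many repositories.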