whisperX and whisperVideo
WhisperX provides the core diarization and word-level timestamp functionality that WhisperVideo builds on to attribute speech segments to speakers in video files, so the two tools are complementary rather than competing.
About whisperX
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
This tool helps you accurately transcribe audio recordings, providing not just the words but also precise timestamps for each word. It can also identify who is speaking at any given time, separating conversations by speaker. It is useful for anyone who needs highly accurate transcripts for audio analysis, subtitling, or content review, such as researchers, journalists, and content creators.
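To make the idea of combining word-level timestamps with speaker identification concrete, here is a minimal Python sketch. It is not WhisperX's own code: the `Word` and `SpeakerTurn` types, the `assign_speakers` function, and the sample data are all hypothetical, and the rule used (give each word to the speaker turn it overlaps most in time) is an assumption about one simple way such an attribution can work.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class SpeakerTurn:
    """A diarization segment: one speaker talking from start to end (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    words: List[Word], turns: List[SpeakerTurn]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labelled None.
    """
    labelled = []
    for word in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((word.text, best_speaker))
    return labelled


# Illustrative data only; real timestamps and turns would come from the tool's output.
words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
turns = [SpeakerTurn("SPEAKER_00", 0.0, 1.0), SpeakerTurn("SPEAKER_01", 1.0, 2.0)]
result = assign_speakers(words, turns)
```

In this sketch the first two words fall inside the first turn and the third inside the second, so each word ends up paired with a speaker label. A real pipeline would also need to handle overlapping speech and words with missing timestamps, which this example ignores.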
About whisperVideo
showlab/whisperVideo
Find out who said what in the video.
This tool helps content creators, educators, or researchers automatically identify who is speaking in a video and what they are saying. You provide a video file, and it generates a new video with on-screen speaker panels and subtitles, clearly linking each spoken word to the person who said it. This is ideal for anyone needing to quickly review conversations or generate accurate, speaker-attributed transcripts from long-form videos.
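As a rough illustration of what a speaker-attributed subtitle track can look like, the Python sketch below merges consecutive words from the same speaker into one cue and prints the cues in SubRip (SRT) layout. It is not taken from whisperVideo: the function names, the `(text, speaker, start, end)` tuple layout, and the sample data are hypothetical, and the on-screen speaker panels the tool renders are outside the scope of this example.

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(words: List[Tuple[str, str, float, float]]) -> str:
    """Build SRT text from (text, speaker, start, end) tuples (hypothetical layout).

    Consecutive words by the same speaker are merged into a single cue.
    """
    cues = []
    for text, speaker, start, end in words:
        if cues and cues[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current cue.
            cues[-1]["text"] += " " + text
            cues[-1]["end"] = end
        else:
            # Speaker changed (or first word): start a new cue.
            cues.append({"speaker": speaker, "text": text, "start": start, "end": end})

    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_line = f"{format_timestamp(cue['start'])} --> {format_timestamp(cue['end'])}"
        blocks.append(f"{index}\n{time_line}\n[{cue['speaker']}] {cue['text']}\n")
    return "\n".join(blocks)


# Illustrative data only.
attributed_words = [
    ("hello", "SPEAKER_00", 0.0, 0.4),
    ("there", "SPEAKER_00", 0.5, 0.9),
    ("hi", "SPEAKER_01", 1.2, 1.5),
]
srt_text = to_srt(attributed_words)
print(srt_text)
```

Merging by speaker keeps each cue tied to one person, which is the property the description above emphasizes. Splitting long cues by duration or character count would be a natural next step but is left out here.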