whisperX-FastAPI and whisper.api
These are competing API wrappers around Whisper-based speech-to-text models. whisperX-FastAPI is built on WhisperX, which adds speaker diarization and text-to-audio alignment; whisper.api is built on a fine-tuned Whisper variant. Both serve the same use case: exposing ASR functionality via HTTP endpoints.
About whisperX-FastAPI
pavelzbornik/whisperX-FastAPI
FastAPI service on top of WhisperX
This tool helps convert audio and video recordings into text transcripts, identifying different speakers and aligning the text with the audio. You provide an audio or video file (like an interview, meeting recording, or lecture) and receive a detailed text output, making it easier to analyze spoken content. Anyone who needs to extract written information from spoken content, such as journalists, researchers, or content creators, would find this useful.
About whisper.api
innovatorved/whisper.api
This project provides an API with user-level access support to transcribe speech to text using a fine-tuned and processed Whisper ASR model.
This is a self-hosted API for converting spoken audio into written text. You feed it audio files or live audio streams, and it produces a transcript in formats like JSON, SRT, or VTT. It's designed for developers and technical teams who need to integrate high-performance speech-to-text capabilities directly into their applications or workflows, while keeping full control over their data.
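To make the subtitle formats mentioned above concrete, here is a minimal sketch of how timed transcript segments can be rendered as SRT. The `Segment` shape (`start`, `end`, `text`) and both function names are hypothetical, chosen for illustration only; they are not taken from whisper.api, whose actual response schema may differ.

```python
from typing import Iterable, TypedDict


class Segment(TypedDict):
    """Hypothetical transcript segment: times in seconds plus the spoken text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello "},
        {"start": 2.5, "end": 4.0, "text": "World"},
    ]
    print(segments_to_srt(demo))
```

VTT output would follow the same pattern with a `WEBVTT` header line and a period instead of a comma before the milliseconds.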
Scores updated daily from GitHub, PyPI, and npm data.