taeyoun811/Whisfusion
Whisfusion: Parallel ASR Decoding via a Diffusion Transformer
This tool converts spoken audio into written text: you input an audio file, and it outputs a text transcript. As the title indicates, it decodes in parallel with a diffusion transformer, an approach the project presents as faster and more accurate than existing methods. It is aimed at researchers and developers building applications that require high-speed, accurate speech-to-text transcription.
No commits in the last 6 months.
Use this if you need to transcribe long audio files quickly and accurately for applications like meeting minutes, voice assistants, or content moderation.
Not ideal if you are an end-user simply looking for a ready-to-use transcription service, as this requires technical setup and programming.
Stars: 22
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Aug 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/taeyoun811/Whisfusion"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
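The same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the three path segments after `/quality/` are a category, a repository owner, and a repository name (inferred from the curl URL above), and that the response body is JSON, which this page does not confirm. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path layout is /quality/<category>/<owner>/<repo>,
    inferred from the single example URL on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Request the data and decode it.

    Assumption: the body is JSON; the response schema is not documented here.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "taeyoun811", "Whisfusion")` issues the same GET request as the curl command and counts against the daily request limit.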
Higher-rated alternatives
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with...
k2-fsa/sherpa
Speech-to-text server framework with next-gen Kaldi
Picovoice/cheetah
On-device streaming speech-to-text engine powered by deep learning
yeyupiaoling/YeAudio
Audio tools for Python
zaigie/FunSpeech
Out-of-the-box speech service for local, private deployment; quickly set up FunASR and CosyVoice2/3 backends