kaushiknishchay/ComfyUI-Qwen3-ASR
ComfyUI nodes for Qwen3-ASR (0.6B/1.7B) and ForcedAligner. Supports high-accuracy ASR and language identification for 52 languages/dialects, including 22 Chinese dialects and various English accents. Features word-level timestamps, long audio transcription, and VRAM-optimized inference.
This tool helps you convert spoken audio into written text with high accuracy, automatically identifying the language from 52 options, including various Chinese dialects and English accents. You feed in an audio file, and it outputs a precise transcription, optionally with timestamps for each word. It's ideal for anyone who needs to quickly and accurately transcribe audio content.
Use this if you need to transcribe audio recordings, interviews, podcasts, or lectures into text, especially if they contain multiple languages or require precise timing information for individual words.
Not ideal if you work mainly with text and have no audio that needs transcribing.
Stars: 11
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Mar 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/kaushiknishchay/ComfyUI-Qwen3-ASR"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
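The same request can be made from Python. The following is a minimal sketch, not an official client: it assumes the endpoint returns JSON, and it assumes the three trailing path segments in the curl example are a category, a repository owner, and a repository name. The function names `build_quality_url` and `fetch_quality` are hypothetical and chosen for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper).

    Assumption: the path layout is <category>/<owner>/<repo>,
    inferred from the single curl example on this page.
    """
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(segments)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data without an API key.

    Assumption: the response body is JSON. Its fields are not
    documented here, so the parsed object is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "kaushiknishchay", "ComfyUI-Qwen3-ASR")` would issue the same GET request as the curl command. Keyless access is limited to 100 requests per day, so cache the result if you poll repeatedly.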
Higher-rated alternatives
BoltzmannEntropy/MimikaStudio
MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support
aahl/qwen-asr2api
🎤 Qwen 3 ASR to OpenAI API, a free STT speech recognition model
gabriele-mastrapasqua/qwen3-tts
Pure C inference engine for Qwen3-TTS text-to-speech. No Python, no PyTorch — just C and BLAS....
zhao-kun/VibeVoiceFusion
VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA...
shijincai/VibeVoice
Archive of the official Microsoft VibeVoice repository (7B & 1.5B). Backup of the deleted source...