TTS-Audio-Suite and ComfyUI-VibeVoice

These are **competitors**: both provide TTS capabilities for ComfyUI. TTS-Audio-Suite offers broader multi-engine support (RVC, Echo-TTS, Qwen3-TTS, and others, including VibeVoice itself), while ComfyUI-VibeVoice specializes in expressive, long-form conversational audio. The choice depends on whether you need many engines in one suite or a dedicated VibeVoice node.

| | TTS-Audio-Suite | ComfyUI-VibeVoice |
|---|---|---|
| Status | Established | Established |
| Maintenance | 25/25 | 2/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 15/25 | 15/25 |
| Community | 18/25 | 23/25 |

| | TTS-Audio-Suite | ComfyUI-VibeVoice |
|---|---|---|
| Stars | 774 | 563 |
| Forks | 71 | 105 |
| Downloads | — | — |
| Commits (30d) | 79 | 0 |
| Language | Python | Python |
| License | — | MIT |
| Flags | No Package, No Dependents | Stale 6m, No Package, No Dependents |

About TTS-Audio-Suite

diodiogod/TTS-Audio-Suite

A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools

This suite helps video producers, content creators, and educators quickly turn written scripts into natural-sounding speech across many languages and voices. You input your text, choose from various AI voices, and the system generates audio, complete with precise timing for subtitles. It's designed for anyone needing professional-grade voiceovers or narrated content without hiring voice actors.

video-production, content-creation, localization, e-learning, audio-narration

About ComfyUI-VibeVoice

wildminder/ComfyUI-VibeVoice

ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio

This tool helps content creators, podcasters, and educators generate natural-sounding, multi-speaker audio conversations from a written script. You provide a text dialogue and optionally some reference audio clips for specific voices, and it produces a single audio file with up to four distinct, expressive speakers. It's designed for anyone who needs high-quality, long-form conversational audio without recording multiple people.

podcasting, audiobook creation, content generation, e-learning development, dialogue synthesis

Related comparisons

TTS-Audio-Suite and VibeVoice-ComfyUI

TTS-Audio-Suite and ComfyUI-VoxCPM

TTS-Audio-Suite and ComfyUI-EdgeTTS

TTS-Audio-Suite and ComfyUI-XTTS

TTS-Audio-Suite and ComfyUI-Maya1_TTS

TTS-Audio-Suite and ComfyUI-SparkTTS

Scores updated daily from GitHub, PyPI, and npm data.