voicebox and vox-box
These are **competitors**: both are self-hosted TTS tools built on open-source backends. voicebox offers a broader studio-style interface, while vox-box emphasizes OpenAI API compatibility across multiple speech engines.
About voicebox
jamiepine/voicebox
The open-source voice synthesis studio
Voicebox is an open-source voice synthesis studio that allows you to clone voices from short audio samples and generate speech in multiple languages with various effects. You can input text and existing voice recordings to create high-quality, expressive spoken audio. This tool is ideal for content creators, podcasters, game developers, or anyone needing realistic, customizable voiceovers.
About vox-box
gpustack/vox-box
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
This tool lets developers quickly set up a server that converts spoken audio into written text or turns written text into natural-sounding speech. You send audio files or text, and it returns the corresponding transcription or synthesized audio. It is designed for developers building applications that need speech recognition or text-to-speech, such as voice assistants or content creation tools.
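Because vox-box is described as compatible with the OpenAI API, a text-to-speech call should look like a request to the OpenAI-style `/v1/audio/speech` endpoint. The following is a minimal sketch under that assumption, not taken from the vox-box documentation: the base URL, model name, and voice name are hypothetical placeholders, and the helper names `build_speech_request` and `synthesize` are introduced here only for illustration.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, model: str, voice: str) -> urllib.request.Request:
    """Build a POST request in the shape of the OpenAI speech endpoint.

    Assumption: the server exposes the OpenAI-style path /v1/audio/speech
    and accepts the JSON fields "model", "input", and "voice".
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url: str, text: str, model: str, voice: str) -> bytes:
    """Send the request and return the raw audio bytes from the response."""
    request = build_speech_request(base_url, text, model, voice)
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical values: replace with the address, model, and voice
    # that your own deployment actually serves.
    audio = synthesize(
        base_url="http://localhost:8000",
        text="Hello from a self-hosted speech server.",
        model="example-tts-model",
        voice="example-voice",
    )
    with open("speech_output.bin", "wb") as output_file:
        output_file.write(audio)
```

Keeping request construction separate from the network call makes it easy to inspect the payload before pointing it at a real server.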