gpustack/vox-box

A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.

Score: 50 / 100 (Established)

This tool allows developers to quickly set up a server for converting spoken audio into written text or turning written text into natural-sounding speech. You input audio files or written text, and it outputs the corresponding text transcriptions or audio narration. It's designed for developers building applications that need robust speech recognition or text-to-speech capabilities, such as voice assistants or content creation tools.
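As a rough illustration of the text-in/audio-out and audio-in/text-out flow described above, here is a minimal sketch that builds requests for the standard OpenAI-style audio endpoints (`/v1/audio/speech` and `/v1/audio/transcriptions`). The base URL, port, model names, and voice name are placeholders assumed for illustration, not values taken from the vox-box documentation; check the project's README for the ones your deployment uses.

```python
import json
import urllib.request
import uuid

# Assumption: a vox-box server is reachable here; adjust host and port to your setup.
BASE_URL = "http://localhost:8080"


def build_speech_request(text, model, voice, base_url=BASE_URL):
    """Build a POST to the OpenAI-style /v1/audio/speech endpoint (text in, audio out)."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_transcription_request(audio_bytes, filename, model, base_url=BASE_URL):
    """Build a multipart POST to /v1/audio/transcriptions (audio in, text out)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field carrying the model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode("utf-8"),
        # Form field carrying the audio file itself.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the raw response bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

With a server running, `send(build_speech_request("Hello", "my-tts-model", "default"))` would return audio bytes to write to a file, and `send(build_transcription_request(data, "clip.wav", "my-stt-model"))` would return the transcription response. Both model names and the voice name here are hypothetical.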

200 stars.

Use this if you are a developer integrating text-to-speech or speech-to-text functionality into an application and need a local server solution.

Not ideal if you are an end-user looking for a ready-to-use application with a graphical interface for transcribing audio or generating speech.

application-development voice-technology AI-integration speech-recognition audio-narration
No package. No dependents.
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25

Stars: 200
Forks: 32
Language: Python
License: Apache-2.0
Last pushed: Dec 23, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/gpustack/vox-box"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
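The same lookup can be done from Python. This is a minimal sketch that assumes the endpoint returns JSON and that the path follows a category/owner/repo pattern, which is inferred only from the single URL shown above; the response fields are not documented here, so the result is returned as parsed without interpreting it.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the lookup URL; the category/owner/repo layout is an assumption."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and parse it, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("voice-ai", "gpustack", "vox-box")` issues the same request as the curl command and counts against the daily request limit.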