PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Score: 74 / 100 (Verified)

This toolkit helps you work with spoken language, allowing you to convert audio into written text, translate spoken English to Chinese, and generate natural-sounding speech from written text. It takes audio files or text as input and produces transcribed text, translated text, or synthetic speech. Anyone who needs to process or create speech, such as content creators, linguists, or call center managers, would find this useful.

12,556 stars. Actively maintained with 3 commits in the last 30 days. Available on PyPI.

Use this if you need to quickly transcribe audio, translate spoken content, or create realistic voiceovers from text.

Not ideal if your primary need is advanced audio editing, music production, or highly specialized sound analysis beyond speech processing.

speech-to-text text-to-speech audio-translation voice-generation language-processing
Maintenance 16 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 23 / 25

Stars: 12,556
Forks: 1,956
Language: Python
License: Apache-2.0
Last pushed: Mar 16, 2026
Commits (30d): 3
Dependencies: 50

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/PaddlePaddle/PaddleSpeech"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
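
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON (its fields are not documented here, so the code returns the decoded payload without interpreting it).

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, as in the one
    example URL on this page. Each segment is percent-encoded.
    """
    segments = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data and decode it.

    Assumption: the endpoint responds with a JSON body. No API key is
    sent, so the anonymous daily request limit applies.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "PaddlePaddle", "PaddleSpeech")` issues the same GET request as the curl command. Because anonymous access is capped per day, it is worth caching the result rather than fetching on every run.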