microsoft/SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Score: 46 / 100 (Emerging)

This project provides pre-trained models for spoken language processing that work across speech and text. Given spoken audio or text as input, the models can produce transcribed text, synthesized speech, or translations into other languages. It is aimed at developers building speech-driven applications such as virtual assistants or transcription services.

1,435 stars. No commits in the last 6 months.

Use this if you are a developer building applications that require high-quality speech recognition, text-to-speech, or speech translation capabilities.

Not ideal if you are looking for an out-of-the-box, no-code solution for end-user tasks.

speech-recognition text-to-speech speech-translation natural-language-processing audio-processing
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25

Stars: 1,435
Forks: 135
Language: Python
License: MIT
Last pushed: Apr 24, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/microsoft/SpeechT5"

Open to everyone: 100 requests/day with no key, or 1,000 requests/day with a free key.
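
The same request can be made from Python. This is a minimal sketch using only the standard library. It assumes the endpoint returns JSON and that the path follows a category/owner/repo pattern, which is inferred from the single URL shown above and not from any documented schema. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL.

    Assumption: the path layout is <category>/<owner>/<repo>,
    inferred from the one example URL on this page.
    """
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. No field names are assumed,
    so the decoded object is returned as-is for the caller to inspect.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the URL needs no network access; call fetch_quality(...)
    # to perform the actual request.
    print(quality_url("voice-ai", "microsoft", "SpeechT5"))
```

Calling `fetch_quality("voice-ai", "microsoft", "SpeechT5")` performs the request. Keyless access is limited to 100 requests/day, so caching the result locally is worth considering when polling many repositories.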