FireRedTeam/FireRedASR2S

A SOTA industrial-grade all-in-one automatic speech recognition (ASR) system with ASR, voice activity detection (VAD), language identification (LID), and punctuation (Punc) modules. FireRedASR2 supports Chinese (Mandarin and 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech, singing, and music in 100+ languages. FireRedLID supports 100+ languages and 20+ Chinese dialects. FireRedPunc supports Chinese and English.

43 / 100 (Emerging)

This system helps professionals accurately transcribe spoken audio, including both speech and singing, into text. It takes audio files in various languages and Chinese dialects and outputs precise text transcriptions, often with punctuation, language identification, and speech/music segmentation. This is ideal for content creators, researchers, and anyone needing detailed, accurate text from audio recordings.

365 stars.

Use this if you need highly accurate, industrial-grade transcriptions from diverse audio, especially for Chinese (Mandarin and dialects), English, or code-switching, including word-level timestamps and confidence scores.

Not ideal if you only need to process a single, common language and do not need advanced features such as dialect identification or singing transcription.

audio-transcription voice-to-text multilingual-audio content-localization call-center-analytics
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 11 / 25
Community 12 / 25
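
The four category scores above add up to the overall score shown earlier (10 + 10 + 11 + 12 = 43 out of 4 × 25 = 100). The page does not state its scoring formula, so the short sketch below only checks that arithmetic, assuming the total is a plain sum of the categories:

```python
# Category scores as listed on this page; each is out of 25.
category_scores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 11,
    "Community": 12,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
max_total = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {max_total}")  # prints "43 / 100"
```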


Stars: 365
Forks: 20
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FireRedTeam/FireRedASR2S"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
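
The same request can be made from Python with only the standard library. This is a minimal sketch: the URL pattern is taken from the curl command above, while the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a repository (hypothetical helper)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption: the response body is JSON. The page does not document
    the response schema, so the parsed value is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "FireRedTeam", "FireRedASR2S")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is a reasonable choice for repeated lookups.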