mubingshen/MLC-SLM-Baseline
The project is associated with the recently launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Models (MLC-SLM) and provides participants with baseline systems for speech recognition and speaker diarization in multilingual conversational scenarios.
This project offers baseline models for automatic speech recognition (ASR) and speaker diarization in complex, multilingual conversations. It takes raw audio recordings of natural, multi-speaker dialogue and produces transcriptions along with speaker attribution (who spoke when). Researchers and developers building AI systems for spoken human-computer interaction would find this useful.
No commits in the last 6 months.
Use this if you are developing or benchmarking systems that need to accurately transcribe and identify speakers in multilingual, real-world conversational audio, including overlaps and interruptions.
Not ideal if your primary need is for simple, single-speaker speech-to-text without the complexities of diarization or multilingual conversational nuances.
Stars: 50
Forks: 6
Language: Python
License: —
Category: —
Last pushed: May 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mubingshen/MLC-SLM-Baseline"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
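For programmatic use, the curl command above can be wrapped in a short Python client. This is a minimal sketch using only the standard library: the endpoint URL comes from the example above, but the response schema is not documented on this page, so the code simply pretty-prints whatever JSON comes back. The helper names (`build_url`, `fetch_repo_data`) are illustrative, not part of any official client.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def build_url(section: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (path layout taken from the curl example)."""
    return f"{BASE}/{section}/{owner}/{repo}"

def fetch_repo_data(url: str) -> dict:
    """GET the endpoint and decode the JSON payload (no key needed up to 100 requests/day)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    url = build_url("transformers", "mubingshen", "MLC-SLM-Baseline")
    data = fetch_repo_data(url)
    # Response fields are not documented here, so just show the raw payload.
    print(json.dumps(data, indent=2))
```

With a free API key the daily limit rises to 1,000 requests; how the key is passed (header vs. query parameter) is not specified on this page, so check the API docs before adding authentication.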
Higher-rated alternatives
jncraton/languagemodels
Explore large language models in 512MB of RAM
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haizelabs/verdict
Inference-time scaling for LLMs-as-a-judge.
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
bytedance/Sa2VA
Official Repo For Pixel-LLM Codebase