mubingshen/MLC-SLM-Baseline
The project is associated with the recently launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Models (MLC-SLM) and provides participants with baseline systems for speech recognition and speaker diarization in multilingual conversational scenarios.
This project offers baseline models for automatic speech recognition (ASR) and speaker diarization in complex, multilingual conversations. It takes raw audio recordings of natural, multi-speaker dialogue and produces transcriptions along with speaker attribution (who spoke when). Researchers and developers building AI systems for spoken human-computer interaction would find this useful.
No commits in the last 6 months.
Use this if you are developing or benchmarking systems that need to accurately transcribe and identify speakers in multilingual, real-world conversational audio, including overlaps and interruptions.
Not ideal if your primary need is for simple, single-speaker speech-to-text without the complexities of diarization or multilingual conversational nuances.
Stars: 50
Forks: 6
Language: Python
License: —
Category: —
Last pushed: May 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mubingshen/MLC-SLM-Baseline"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
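For programmatic use, the curl command above can be wrapped in a short Python client. This is a minimal sketch using only the standard library: the endpoint URL comes from the example above, but the response schema is not documented on this page, so the code simply pretty-prints whatever JSON comes back. The helper names (`build_url`, `fetch_repo_data`) are illustrative, not part of any official client.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def build_url(section: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (path layout taken from the curl example)."""
    return f"{BASE}/{section}/{owner}/{repo}"

def fetch_repo_data(url: str) -> dict:
    """GET the endpoint and decode the JSON payload (no key needed up to 100 requests/day)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    url = build_url("transformers", "mubingshen", "MLC-SLM-Baseline")
    data = fetch_repo_data(url)
    # Response fields are not documented here, so just show the raw payload.
    print(json.dumps(data, indent=2))
```

With a free API key the daily limit rises to 1,000 requests; how the key is passed (header vs. query parameter) is not specified on this page, so check the API docs before adding authentication.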
Higher-rated alternatives
jncraton/languagemodels
Explore large language models in 512MB of RAM
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haizelabs/verdict
Inference-time scaling for LLMs-as-a-judge.
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
bytedance/Sa2VA
Official Repo For Pixel-LLM Codebase