FreedomIntelligence/MTalk-Bench
MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols
This benchmark helps researchers and developers assess how well speech-to-speech AI models handle realistic, multi-turn conversations. You provide your model's audio responses to a set of conversational prompts, and the benchmark scores them across three dimensions: semantic understanding, paralinguistic cues such as tone, and interaction with ambient sound. Evaluation uses both arena-style pairwise comparisons and rubric-based scoring. It's aimed at AI researchers and engineers building and refining conversational AI and large language models.
Use this if you are developing or evaluating speech-to-speech AI models and need a comprehensive way to benchmark their performance in dynamic, multi-turn dialogue scenarios.
Not ideal if you are looking for a tool to simply transcribe audio or translate speech, as its primary purpose is advanced model evaluation.
Stars: 18
Forks: 1
Language: JavaScript
License: Apache-2.0
Last pushed: Nov 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/FreedomIntelligence/MTalk-Bench"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
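For programmatic use, here is a minimal JavaScript sketch that issues the same GET request as the curl command above and prints the JSON response. It assumes Node 18+ (for the global fetch API) and that the endpoint returns JSON; the response schema and the mechanism for passing an API key are not documented on this page, so the code simply prints whatever comes back.

// Minimal sketch: fetch MTalk-Bench quality data from the public endpoint.
// Assumes Node 18+ (global fetch) and a JSON response; the response
// schema is not documented here, so we just pretty-print it.
const ENDPOINT =
  "https://pt-edge.onrender.com/api/v1/quality/llm-tools/FreedomIntelligence/MTalk-Bench";

async function main() {
  const res = await fetch(ENDPOINT);
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  const data = await res.json();
  console.log(JSON.stringify(data, null, 2));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Save this as, e.g., fetch-mtalk-bench.mjs (a hypothetical filename) and run it with node fetch-mtalk-bench.mjs.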
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems