sarulab-speech/UTMOSv2
UTokyo-SaruLab MOS Prediction System
This project evaluates the naturalness and quality of synthetic speech by predicting its Mean Opinion Score (MOS). It takes a .wav audio file, or a directory of .wav files, as input and outputs a numerical MOS indicating how natural the speech sounds. Voice AI engineers, researchers, and product managers working with text-to-speech or voice synthesis technologies may find it useful for quality assurance and model evaluation.
Use this if you need to objectively assess the perceived naturalness and quality of computer-generated speech without needing human listeners for every evaluation.
Not ideal if you are looking to generate synthetic speech or analyze speech content (like transcription or speaker recognition) rather than evaluate its perceived quality.
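The file-or-directory workflow described above can be sketched in Python as follows. This is a minimal illustration, not the project's own interface: `collect_wav_files`, `score_path`, and `summarize` are hypothetical helpers, and `scorer` is a placeholder for whatever MOS prediction call the library actually provides, which this page does not show.

```python
from pathlib import Path
from statistics import mean
from typing import Callable, Dict, List, Optional


def collect_wav_files(path: str) -> List[Path]:
    """Return the .wav files for a path that is either one file or a directory."""
    p = Path(path)
    if p.is_file():
        return [p] if p.suffix.lower() == ".wav" else []
    if p.is_dir():
        # Non-recursive scan; the suffix check is case-insensitive.
        return sorted(f for f in p.iterdir()
                      if f.is_file() and f.suffix.lower() == ".wav")
    return []


def score_path(path: str, scorer: Callable[[Path], float]) -> Dict[str, float]:
    """Apply a MOS scorer to every .wav file found under `path`.

    `scorer` is an assumption: any callable mapping a file path to a float.
    """
    return {f.name: float(scorer(f)) for f in collect_wav_files(path)}


def summarize(scores: Dict[str, float]) -> Optional[float]:
    """Average the per-file scores, or return None when nothing was scored."""
    if not scores:
        return None
    return mean(list(scores.values()))
```

Passing the scorer in as a callable keeps the file handling independent of the prediction model, so the same loop can be exercised with a stub before a real model is plugged in.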
Stars: 301
Forks: 29
Language: Python
License: MIT
Category:
Last pushed: Feb 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/sarulab-speech/UTMOSv2"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
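The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions not confirmed by this page: that the endpoint returns JSON, and that the path follows a category/owner/repository pattern generalized from the single URL shown above. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout is inferred from one example)."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call and counts against the daily request limit.
    print(fetch_quality("ml-frameworks", "sarulab-speech", "UTMOSv2"))
```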
Related frameworks
voicepaw/so-vits-svc-fork: so-vits-svc fork with realtime support, improved interface and more features.
ssmall256/mlx-audio-io: Native audio I/O for MLX on macOS and Linux.
ssmall256/mlx-spectro: High-performance STFT/iSTFT for Apple MLX with fused Metal kernels and autograd support.
daniilrobnikov/vits: VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech.
MWM-io/SpecTNT-pytorch: Unofficial implementation of SpecTNT in PyTorch.