sarulab-speech/UTMOSv2

UTokyo-SaruLab MOS Prediction System


Established

This project evaluates the naturalness and quality of synthetic speech by predicting its Mean Opinion Score (MOS). Given a .wav audio file (or a directory of them), it outputs a numerical MOS indicating how natural the speech sounds. Voice AI engineers, researchers, and product managers working with text-to-speech or voice synthesis technologies can use it for quality assurance and model evaluation.
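The file-or-directory workflow described above can be sketched as a small batch-scoring helper. This is only an illustration under stated assumptions: `collect_wavs`, `score_wavs`, `mean_mos`, and the `predict_mos` callable are hypothetical names, not part of UTMOSv2. In practice `predict_mos` would wrap whatever prediction call the project exposes; check its README for the actual interface.

```python
from pathlib import Path
from statistics import mean
from typing import Callable, Dict, List, Union


def collect_wavs(target: Union[str, Path]) -> List[Path]:
    """Return the .wav files for a single file path or a directory (non-recursive)."""
    target = Path(target)
    if target.is_dir():
        # Keep only files whose extension is .wav, case-insensitively.
        return sorted(p for p in target.iterdir() if p.suffix.lower() == ".wav")
    if target.suffix.lower() == ".wav":
        return [target]
    raise ValueError(f"Expected a .wav file or a directory, got: {target}")


def score_wavs(
    target: Union[str, Path],
    predict_mos: Callable[[Path], float],  # hypothetical: wraps the real MOS predictor
) -> Dict[str, float]:
    """Map each .wav file name to the MOS returned by predict_mos."""
    return {p.name: float(predict_mos(p)) for p in collect_wavs(target)}


def mean_mos(scores: Dict[str, float]) -> float:
    """Average the per-file scores, e.g. to compare two synthesis models."""
    if not scores:
        raise ValueError("No scores to average")
    return mean(scores.values())
```

Passing the predictor in as a callable keeps the file handling independent of any particular model, so the same helper can be exercised with a stub before the real predictor is wired in.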

301 stars.

Use this if you need to objectively assess the perceived naturalness and quality of computer-generated speech without needing human listeners for every evaluation.

Not ideal if you are looking to generate synthetic speech or analyze speech content (like transcription or speaker recognition) rather than evaluate its perceived quality.

speech synthesis, voice AI, audio quality evaluation, text-to-speech, voice user interface

No Package, No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 15 / 25


Stars: 301

Forks

Language: Python

License: MIT

Related frameworks

voicepaw/so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

ssmall256/mlx-audio-io

Native audio I/O for MLX on macOS and Linux

ssmall256/mlx-spectro

High-performance STFT/iSTFT for Apple MLX with fused Metal kernels and autograd support

daniilrobnikov/vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

MWM-io/SpecTNT-pytorch

Unofficial implementation of SpecTNT in pytorch
