Text-to-Speech Frameworks

End-to-end TTS architectures, models, and toolkits for synthesizing speech from text. Includes transformer-based, diffusion-based, and flow-matching approaches with various duration modeling techniques. Does NOT include voice cloning, speech recognition, speech evaluation metrics, or TTS paper collections.

9 text-to-speech frameworks are tracked. One scores above 70 (verified tier). The highest-rated is voicepaw/so-vits-svc-fork at 78/100 with 9,281 stars. One of the 9 is actively maintained.

Get all 9 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=text-to-speech-frameworks&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
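The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the payload is returned as parsed JSON without assuming any field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="text-to-speech-frameworks",
              limit=20):
    """Build the query URL (hypothetical helper; parameters mirror the curl example)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON payload.

    The structure of the payload is an assumption left to the caller:
    no keys or nesting are assumed here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(build_url())` performs one request, which counts against the daily request limit stated above.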

#  Framework                       Score  Tier
1  voicepaw/so-vits-svc-fork       78     Verified
   so-vits-svc fork with realtime support, improved interface and more features.
2  sarulab-speech/UTMOSv2          51     Established
   UTokyo-SaruLab MOS Prediction System
3  ssmall256/mlx-audio-io          44     Emerging
   Native audio I/O for MLX on macOS and Linux
4  ssmall256/mlx-spectro           43     Emerging
   High-performance STFT/iSTFT for Apple MLX with fused Metal kernels and...
5  daniilrobnikov/vits             35     Emerging
   VITS: Conditional Variational Autoencoder with Adversarial Learning for...
6  MWM-io/SpecTNT-pytorch          33     Emerging
   Unofficial implementation of SpecTNT in pytorch
7  nipponjo/arabic-vocalization    27     Experimental
   Arabic deep-learning based diacritization models (Shakkala, Shakkelha)...
8  NTIA/alignnet                   24     Experimental
   Train no-reference speech quality estimators with multiple datasets via...
9  kuntiniong/hk-insta-identifier  22     Experimental
   Hong Kong Instagram username identification with Romanized Cantonese linguistics