Text-to-Speech Frameworks

End-to-end TTS architectures, models, and toolkits for synthesizing speech from text. Includes transformer-based, diffusion-based, and flow-matching approaches with various duration modeling techniques. Does NOT include voice cloning, speech recognition, speech evaluation metrics, or TTS paper collections.

There are 9 text-to-speech frameworks tracked. 1 scores above 70 (Verified tier). The highest-rated is voicepaw/so-vits-svc-fork at 78/100 with 9,281 stars. Of the 9 tracked, 1 is actively maintained.

Get all 9 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=text-to-speech-frameworks&limit=20"
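
The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response schema is not documented on this page, so the parsed JSON is returned as-is without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="ml-frameworks",
                      subcategory="text-to-speech-frameworks",
                      limit=20):
    """Return the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=10.0):
    """Fetch the URL and parse the body as JSON.

    The shape of the returned object is an unknown here; inspect it
    before relying on any particular key.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality(build_quality_url())` performs the network request. Keeping URL construction separate from fetching makes it easy to change `subcategory` or `limit` without touching the request code.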

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.

#	Framework	Description	Score	Tier	Stars	Language
1	voicepaw/so-vits-svc-fork	so-vits-svc fork with realtime support, improved interface and more features.	78	Verified	9,281	Python
2	sarulab-speech/UTMOSv2	UTokyo-SaruLab MOS Prediction System	51	Established	301	Python
3	ssmall256/mlx-audio-io	Native audio I/O for MLX on macOS and Linux	44	Emerging	2	C++
4	ssmall256/mlx-spectro	High-performance STFT/iSTFT for Apple MLX with fused Metal kernels and...	43	Emerging	1	Python
5	daniilrobnikov/vits	VITS: Conditional Variational Autoencoder with Adversarial Learning for...	35	Emerging	6	Jupyter Notebook
6	MWM-io/SpecTNT-pytorch	Unofficial implementation of SpecTNT in pytorch	33	Emerging	50	Python
7	nipponjo/arabic-vocalization	Arabic deep-learning based diacritization models (Shakkala, Shakkelha)...	27	Experimental	14	Python
8	NTIA/alignnet	Train no-reference speech quality estimators with multiple datasets via...	24	Experimental	18	Python
9	kuntiniong/hk-insta-identifier	Hong Kong Instagram username identification with Romanized Cantonese linguistics	22	Experimental	17	Jupyter Notebook
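
The summary figures near the top of this page (total count, number scoring above 70, highest-rated project) can be recomputed from the table rows. The sketch below hard-codes the rows from the table. The `summarize` helper and the tuple layout are illustrative assumptions, not part of the API.

```python
from collections import Counter

# (repository, score, tier, stars) copied from the table above.
ROWS = [
    ("voicepaw/so-vits-svc-fork", 78, "Verified", 9281),
    ("sarulab-speech/UTMOSv2", 51, "Established", 301),
    ("ssmall256/mlx-audio-io", 44, "Emerging", 2),
    ("ssmall256/mlx-spectro", 43, "Emerging", 1),
    ("daniilrobnikov/vits", 35, "Emerging", 6),
    ("MWM-io/SpecTNT-pytorch", 33, "Emerging", 50),
    ("nipponjo/arabic-vocalization", 27, "Experimental", 14),
    ("NTIA/alignnet", 24, "Experimental", 18),
    ("kuntiniong/hk-insta-identifier", 22, "Experimental", 17),
]


def summarize(rows, threshold=70):
    """Recompute the headline numbers from (repo, score, tier, stars) rows."""
    top = max(rows, key=lambda row: row[1])  # row with the highest score
    return {
        "total": len(rows),
        "above_threshold": sum(1 for row in rows if row[1] > threshold),
        "top_repo": top[0],
        "top_score": top[1],
        "top_stars": top[3],
        "by_tier": dict(Counter(row[2] for row in rows)),
    }


summary = summarize(ROWS)
print(summary)
```

The `threshold` default of 70 mirrors the "above 70" cutoff stated in the summary. The cutoffs for the other tiers are not given on this page, so the sketch counts tiers from the table labels rather than deriving them from scores.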