primepake/learnable-speech
This repo implements text-to-speech with a learnable audio encoder that does not require alignment between the audio and its reference transcript.
This project helps developers and researchers working on advanced text-to-speech systems train highly accurate, natural-sounding voice models. It takes raw audio and text transcripts as input and learns audio representations from them, which are then used to generate high-quality 24kHz synthetic speech. It is aimed primarily at machine learning engineers and speech AI researchers building next-generation speech synthesis applications.
No commits in the last 6 months.
Use this if you need to train a custom text-to-speech model with learnable audio encoders for high-fidelity 24kHz speech generation without relying on traditional alignment methods.
Not ideal if you are looking for a ready-to-use text-to-speech system for direct audio generation without deep model customization or if you lack machine learning expertise.
Stars: 54
Forks: 7
Language: Python
License: not specified
Category: not specified
Last pushed: Sep 20, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/primepake/learnable-speech"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
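The curl call above can also be made from Python. The following is a minimal sketch using only the standard library. It rests on two assumptions: that the path always follows the `/quality/<category>/<owner>/<repo>` pattern seen in this single URL, and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical. This page does not say how a free key is sent, so the sketch only makes anonymous requests, which fall under the 100 requests/day limit.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; the per-repo layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the (assumed) endpoint URL, e.g. category='voice-ai', repo='owner/name'."""
    owner, name = repo.split("/", 1)
    # Percent-encode each path segment separately so the owner/name slash is preserved.
    segments = [category, owner, name]
    return BASE_URL + "/" + "/".join(urllib.parse.quote(s, safe="") for s in segments)


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch one record anonymously (no API key); assumes the response body is JSON."""
    request = urllib.request.Request(
        quality_url(category, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_quality("voice-ai", "primepake/learnable-speech")` requests the same record as the curl command. Keeping URL construction in its own function means it can be checked offline without spending any of the daily request quota.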
Higher-rated alternatives
canopyai/Orpheus-TTS: Towards Human-Sounding Speech
lifeiteng/vall-e: PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech), Reproduced Demo...
Plachtaa/VALL-E-X: An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in...
umbertocappellazzo/Omni-AVSR: Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition...
ExplainableML/ZerAuCap: [NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language...