MahtaFetrat/ManaTTS-Persian-Speech-Dataset
ManaTTS is the largest open Persian speech dataset, with 114+ hours of transcribed audio. It includes the data collection pipeline and tools, and is suitable for training Persian text-to-speech models.
This project offers the largest open Persian speech dataset, ManaTTS, containing over 114 hours of transcribed audio from Nasl-e-Mana magazine. It provides ready-to-use audio and text files, along with tools to collect and process similar data. This resource is ideal for speech synthesis researchers and developers building high-quality Persian text-to-speech models and assistive technologies.
No commits in the last 6 months.
Use this if you need extensive, high-quality Persian speech data to train text-to-speech models or develop applications like screen readers for the Iranian blind community.
Not ideal if you need speech data in a language other than Persian or require a multi-speaker dataset.
Stars: 49
Forks: 5
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jul 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/MahtaFetrat/ManaTTS-Persian-Speech-Dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
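For scripted use, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not the service's documented client: the base URL is copied from the curl example above, while the `category/owner/repo` path layout is inferred from that single example, and the JSON response format is an assumption. The names `build_quality_url` and `fetch_quality` are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl example; everything else is illustrative.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment.

    Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    """
    segments = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return f"{API_BASE}/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint returns a JSON document; its fields are not
    described on this page, so the result is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_quality_url("voice-ai", "MahtaFetrat", "ManaTTS-Persian-Speech-Dataset"))
```

Calling `fetch_quality("voice-ai", "MahtaFetrat", "ManaTTS-Persian-Speech-Dataset")` would issue the actual request and count against the daily limit of 100 keyless requests, so caching the result locally is a reasonable choice for repeated runs.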
Higher-rated alternatives
tihu-nlp/tihu
Persian Text-To-Speech
persiandataset/PersianSpeech
Persian ASR dataset
mmahdibarghi/finglish-dataset
Persian-to-Finglish dataset with voice recordings of every sentence, used as a TTS dataset to train Tacotron 2
MahtaFetrat/VirgoolInformal-Speech-Dataset
A dataset of informal Persian audio and text chunks, along with a fully open processing...
MahtaFetrat/GPTInformal-Persian-Speech-Dataset
A free licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject