MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset, with 114+ hours of transcribed audio. It includes the data collection pipeline and tools, and is suitable for training Persian text-to-speech models.

Emerging

This project offers the largest open Persian speech dataset, ManaTTS, containing over 114 hours of transcribed audio from Nasl-e-Mana magazine. It provides ready-to-use audio and text files, along with tools to collect and process similar data. This resource is ideal for speech synthesis researchers and developers building high-quality Persian text-to-speech models and assistive technologies.

No commits in the last 6 months.

Use this if you need extensive, high-quality Persian speech data to train text-to-speech models or develop applications like screen readers for the Iranian blind community.

Not ideal if you need speech data in a language other than Persian or require a multi-speaker dataset.

Persian-language-technology speech-synthesis assistive-technology natural-language-processing dataset-creation

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 10 / 25

Language

Jupyter Notebook

License

MIT

Higher-rated alternatives

tihu-nlp/tihu

Persian Text-To-Speech

persiandataset/PersianSpeech

Persian ASR dataset

mmahdibarghi/finglish-dataset

Persian-to-Finglish dataset with voice recordings of all sentences, intended as a TTS dataset and used to train Tacotron 2

MahtaFetrat/VirgoolInformal-Speech-Dataset

A dataset of informal Persian audio and text chunks, along with a fully open processing...

MahtaFetrat/GPTInformal-Persian-Speech-Dataset

A free licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject
