MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset, with 114+ hours of transcribed audio. The repository also includes the data collection pipeline and tools. The dataset is suited to training Persian text-to-speech models.

36 / 100 (Emerging)

This project offers the largest open Persian speech dataset, ManaTTS, containing over 114 hours of transcribed audio from Nasl-e-Mana magazine. It provides ready-to-use audio and text files, along with tools to collect and process similar data. This resource is ideal for speech synthesis researchers and developers building high-quality Persian text-to-speech models and assistive technologies.

No commits in the last 6 months.

Use this if you need extensive, high-quality Persian speech data to train text-to-speech models or develop applications like screen readers for the Iranian blind community.

Not ideal if you need speech data in a language other than Persian or require a multi-speaker dataset.

Persian-language-technology speech-synthesis assistive-technology natural-language-processing dataset-creation
Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 10 / 25
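
The four category scores add up to the overall 36 / 100 listed above. The short sketch below checks that arithmetic. Treating the overall score as a plain sum of the categories is an assumption inferred from the numbers on this page, not a documented formula.

```python
# Category scores as listed on this page; each is out of 25.
category_scores = {
    "Maintenance": 2,
    "Adoption": 8,
    "Maturity": 16,
    "Community": 10,
}

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
max_total = 25 * len(category_scores)

print(f"{total} / {max_total}")
```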


Stars: 49
Forks: 5
Language: Jupyter Notebook
License: MIT
Last pushed: Jul 12, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/MahtaFetrat/ManaTTS-Persian-Speech-Dataset"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
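
For callers who prefer a script to curl, the same request can be sketched in Python using only the standard library. The function names `build_quality_url` and `fetch_quality` are hypothetical helpers, and the sketch assumes the endpoint returns a JSON body; the page does not document the response schema, so no field names are relied on here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Request the quality record and decode it.

    Assumption: the response body is JSON. The schema is not documented
    on this page, so the decoded object is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "MahtaFetrat", "ManaTTS-Persian-Speech-Dataset")` would issue the same request as the curl command above. At the keyless limit of 100 requests/day, caching the result locally is likely worthwhile for repeated lookups.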