miras-tech/MirasText
MirasText
MirasText is a large collection of Persian text for Natural Language Processing (NLP) researchers and practitioners. It supports training and evaluating language models and other NLP systems, and is aimed at researchers and data scientists working on Persian language understanding.
No commits in the last 6 months.
Use this if you need a foundational dataset for developing or testing NLP models specifically for the Persian language.
Not ideal if your project involves languages other than Persian, or if you need datasets with task-specific annotations such as sentiment or irony labels.
Stars: 75
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Aug 12, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/miras-tech/MirasText"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
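The curl command above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repository is inferred from the single example URL, and the response format is not documented here, so the code assumes JSON and falls back to raw text.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed to be
# <category>/<owner>/<repo>, inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL following the pattern of the curl example."""
    # Percent-encode each path segment so unusual characters stay inside it.
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + path


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and return the decoded response.

    Assumption: the API returns JSON. If the body is not valid JSON,
    the raw text is returned instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("nlp", "miras-tech", "MirasText")` would request the same URL as the curl command. Keyless access is limited to 100 requests/day, so a script that polls many repositories should cache results.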
Higher-rated alternatives
amirshnll/Persian-Swear-Words
Persian Swear Dataset - you can use it in production to filter unwanted content. Dataset of words...
sajjjadayobi/PersianQA
Persian (Farsi) Question Answering Dataset (+ Models)
aghasemi/ChronologicalPersianPoetryDataset
A chronological (by the century in which the poet lived) dataset of Persian poetry, extracted...
farbodbj/persian-gender-by-name
A comprehensive dataset for determining gender based on Persian names, enriched with English...
dml-qom/FarsTail
FarsTail: a Persian natural language inference dataset