MohammadrezaAmani/JameJamCorpus
Official repository of the Jam-e Jam News Dataset and NLP model.
This project offers a dataset of over 1.4 million news articles from Jam-e Jam Online, including titles, tags, summaries, and full content. It also provides a ready-to-use NLP model to automatically categorize these articles by type and tags. Data scientists, researchers, and media analysts can use this for various text analysis and classification tasks.
No commits in the last 6 months.
Use this if you need a large, categorized Persian news corpus for research, language model training, or developing applications that classify news content.
Not ideal if your work doesn't involve Persian news content, or if you require real-time news scraping rather than a static dataset.
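The description above says each article comes with a title, tags, a summary, and full content, but it does not give the file layout. The sketch below is a minimal illustration of tag-based exploration for a classification task. It assumes the corpus is exported as JSON Lines (one article object per line) with the hypothetical keys "title", "tags", "summary", and "content". It is not the project's own loader.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List

# ASSUMPTION: one JSON object per line with hypothetical keys
# "title", "tags", "summary", "content"; the real export may differ.


def iter_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one article dict per non-blank JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def tag_counts(articles: Iterable[dict]) -> Counter:
    """Count tag occurrences across articles (a missing "tags" key adds nothing)."""
    counts: Counter = Counter()
    for article in articles:
        counts.update(article.get("tags", []))
    return counts


def filter_by_tag(articles: Iterable[dict], tag: str) -> List[dict]:
    """Return the articles whose tag list contains the given tag."""
    return [a for a in articles if tag in a.get("tags", [])]


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real corpus file.
    sample = [
        '{"title": "خبر اول", "tags": ["ورزش", "فوتبال"], "summary": "", "content": ""}',
        "",
        '{"title": "خبر دوم", "tags": ["ورزش"], "summary": "", "content": ""}',
    ]
    articles = list(iter_articles(sample))
    print(tag_counts(articles).most_common())  # → [('ورزش', 2), ('فوتبال', 1)]
    print(len(filter_by_tag(articles, "فوتبال")))  # → 1
```

To read an actual export, you can pass an open file handle such as open(path, encoding="utf-8") to iter_articles. A generator keeps memory flat across 1.4 million records. Check the repository for the real format and field names before you rely on this.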
Stars: 8
Forks: 1
Language: Python
License: GPL-3.0
Last pushed: Apr 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/MohammadrezaAmani/JameJamCorpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
amirshnll/Persian-Swear-Words
Persian Swear Dataset - you can use it in production to filter unwanted content. (In Persian: "Dataset of words...")
sajjjadayobi/PersianQA
Persian (Farsi) Question Answering Dataset (+ Models)
aghasemi/ChronologicalPersianPoetryDataset
A chronological (up to the century in which the poet has lived) of Persian poetry, extracted...
miras-tech/MirasText
MirasText
BodduSriPavan-111/chandassu
Chandassu: First Python Library for Global Metrical Poetry