MohammadrezaAmani/JameJamCorpus

Official repository of the Jam-e Jam News Dataset and NLP model.

Quality score: 28 / 100 (Experimental)

This project offers a dataset of over 1.4 million news articles from Jam-e Jam Online, including titles, tags, summaries, and full content. It also provides a ready-to-use NLP model to automatically categorize these articles by type and tags. Data scientists, researchers, and media analysts can use this for various text analysis and classification tasks.

No commits in the last 6 months.

Use this if you need a large, categorized Persian news corpus for research, language model training, or developing applications that classify news content.

Not ideal if your work doesn't involve Persian news content, or if you require real-time news scraping rather than a static dataset.

news-analysis media-research text-classification natural-language-processing data-science
Status: stale (6 months), no package, no dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 16 / 25
Community 8 / 25


Stars: 8
Forks: 1
Language: Python
License: GPL-3.0
Last pushed: Apr 22, 2024
Commits (last 30 days): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/MohammadrezaAmani/JameJamCorpus"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
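If you query this endpoint for more than one repository, a small wrapper around the curl call above can help. The sketch below is an illustration under assumptions, not documented behavior: it assumes the path pattern `/api/v1/quality/<category>/<owner>/<repo>` generalizes from the single example shown (with `nlp` being the only category this page confirms), and the function names `quality_url` and `fetch_quality` are hypothetical. It does not send an API key, because this page does not say how a key is passed.

```shell
#!/bin/sh
# Hypothetical helper: build the quality-API URL for a repository.
# Assumption: the path pattern below generalizes from the one example on this page.
quality_url() {
  category="$1"   # e.g. "nlp" (the only category shown on this page)
  repo="$2"       # "owner/name", e.g. "MohammadrezaAmani/JameJamCorpus"
  printf 'https://pt-edge.onrender.com/api/v1/quality/%s/%s\n' "$category" "$repo"
}

# Hypothetical helper: fetch the data anonymously (subject to the 100 requests/day limit).
# -f makes curl exit non-zero on HTTP errors; -sS hides the progress bar but still prints errors.
fetch_quality() {
  curl -fsS "$(quality_url "$1" "$2")"
}
```

For example, `fetch_quality nlp MohammadrezaAmani/JameJamCorpus` should issue the same request as the curl command above; the response format is not specified on this page, so inspect it before parsing.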