feryandi/Dataset-Artikel

A raw dataset of news articles collected from various online media outlets in Indonesia.

Score: 34 / 100 (Emerging)

This project provides thousands of raw Indonesian news articles collected from major online media outlets such as Detik, Kompas, and CNN Indonesia. The data comes in two forms: clean JSON files containing only the article content, or the original HTML files. It is intended for researchers, linguists, and students who need a large collection of Indonesian text for natural language processing, linguistic analysis, or machine learning model training.
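To check which of the two formats (JSON or HTML) a local copy contains, you can count files by extension. The sketch below assumes the repository has already been downloaded to a local directory and that the files use .json and .html extensions; the directory layout is not documented here, and the helper name count_by_ext is hypothetical.

```shell
# Hypothetical helper: count regular files with a given extension under a directory.
# Assumes a local copy of the dataset; its folder layout is an assumption.
count_by_ext() {
    dir="$1"
    ext="$2"
    # -type f: regular files only; -name matches the extension case-sensitively.
    # tr strips the leading spaces some wc implementations print.
    find "$dir" -type f -name "*.$ext" | wc -l | tr -d ' '
}

# Example usage (path is an assumption):
# count_by_ext ./Dataset-Artikel json
# count_by_ext ./Dataset-Artikel html
```

Using find rather than a shell glob means files in nested folders (for example, one folder per outlet, if the data is organized that way) are counted too.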

No commits in the last 6 months.

Use this if you need a free, raw, and unlabelled dataset of Indonesian news articles for academic or research purposes.

Not ideal if you require pre-labelled data for specific tasks like sentiment analysis, or if you need articles published outside the January-August 2018 range.

natural-language-processing linguistics-research text-mining computational-linguistics machine-learning-datasets
Stale 6m · No package · No dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 10 / 25


Stars: 42
Forks: 4
Language: not listed
License: CC-BY-SA-4.0
Last pushed: Mar 24, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/feryandi/Dataset-Artikel"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
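The endpoint above appears to follow a /api/v1/quality/nlp/<owner>/<repo> pattern. The sketch below builds that URL for an owner/repo slug and fetches it anonymously with curl. Generalizing the pattern to other repositories is an assumption inferred from the single example URL, and the helper names quality_url and fetch_quality are hypothetical. No API key is passed, because how a key would be supplied is not documented here.

```shell
# Hypothetical helpers; the URL pattern is inferred from the one example endpoint.
API_BASE="https://pt-edge.onrender.com/api/v1/quality/nlp"

# Build the quality-score URL for an "owner/repo" slug; reject slugs without a slash.
quality_url() {
    slug="$1"
    case "$slug" in
        */*) printf '%s/%s\n' "$API_BASE" "$slug" ;;
        *) echo "expected owner/repo, got: $slug" >&2; return 1 ;;
    esac
}

# Fetch the score anonymously (counts against the 100 requests/day limit).
# -f: fail on HTTP errors; -s: no progress output; -S: still print errors.
fetch_quality() {
    url=$(quality_url "$1") || return 1
    curl -fsS "$url"
}

# Example usage:
# fetch_quality feryandi/Dataset-Artikel
```

With -f, an HTTP error such as a rate-limit response makes curl exit non-zero instead of printing the error body as if it were data.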