feryandi/Dataset-Artikel
This repository contains raw data in the form of articles from various online media outlets in Indonesia (a raw dataset of Indonesian news articles).
This project provides thousands of raw Indonesian news articles, collected from major online media outlets like Detik, Kompas, and CNN Indonesia. You can get either clean JSON files containing only the article content or the original HTML files. This dataset is for researchers, linguists, or students who need a large collection of Indonesian text for natural language processing, linguistic analysis, or machine learning model training.
No commits in the last 6 months.
Use this if you need a free, raw, and unlabelled dataset of Indonesian news articles for academic or research purposes.
Not ideal if you require pre-labelled data for specific tasks like sentiment analysis, or if you need articles published outside the January-August 2018 range.
Stars: 42
Forks: 4
Language: —
License: CC-BY-SA-4.0
Category:
Last pushed: Mar 24, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/feryandi/Dataset-Artikel"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
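The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page. The URL pattern (category, then the owner/name slug) is inferred from the single curl example above.

```python
import json
import urllib.request

# Base path taken from the curl example; the rest of the API is not documented here.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, repo: str) -> str:
    """Join base URL, category, and owner/name slug, mirroring the curl example."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads raises
    a ValueError and the raw body should be inspected instead.
    """
    with urllib.request.urlopen(build_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "feryandi/Dataset-Artikel")` issues the same GET request as the curl command. Without a key, the limit of 100 requests per day stated above applies.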
Higher-rated alternatives
malaysia-ai/malaya
Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/
IndoNLP/indonlu
The first-ever vast natural language processing benchmark for Indonesian Language. We provide...
louisowen6/NLP_bahasa_resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
kirralabs/indonesian-NLP-resources
Data resources for Indonesian-language NLP
wongnai/wongnai-corpus
Collection of Wongnai's datasets