kmkurn/id-nlp-resource
A list of Indonesian NLP resources.
This is a curated list of publicly available language data for Indonesian, including collections of news articles, social media posts, and transcribed speech. It gathers in one place the Indonesian text and audio needed to train or evaluate language models, analyze sentiment, or build translation systems. It is aimed at researchers, data scientists, and language technology developers working with Indonesian.
290 stars. No commits in the last 6 months.
Use this if you need pre-existing Indonesian text or speech datasets for developing or evaluating language-related applications and research.
Not ideal if you need a tool to process Indonesian text or speech directly, as this resource only provides the raw data.
Stars: 290
Forks: 48
Language: not specified
License: not specified
Category:
Last pushed: Jan 18, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/kmkurn/id-nlp-resource"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
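The same request can be made from a script. The following is a minimal Python sketch, not a documented client: it assumes the endpoint returns JSON and that the `kmkurn/id-nlp-resource` path segment generalizes to other `owner/repo` pairs, neither of which this page confirms. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository.

    Assumption: the owner/repo pattern seen in the single curl example
    applies to other repositories as well.
    """
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body.

    Assumption: the response body is JSON; its schema is not described
    on this page, so the parsed object is returned as-is.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("kmkurn", "id-nlp-resource")` issues one request, which counts against the 100 requests/day keyless limit mentioned above.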
Compare
Higher-rated alternatives
malaysia-ai/malaya
Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/
louisowen6/NLP_bahasa_resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
IndoNLP/indonlu
The first-ever vast natural language processing benchmark for Indonesian Language. We provide...
kirralabs/indonesian-NLP-resources
Data resources for Indonesian NLP
wongnai/wongnai-corpus
Collection of Wongnai's datasets