Wikidepia/indonesian_datasets
NLP Datasets for Indonesian
If you work with Indonesian text or speech and need large corpora for analysis or tool building, this collection provides ready-to-use resources. It spans news articles, social media posts, dictionaries, and translated academic datasets, and suits researchers, linguists, and data analysts working on Indonesian language processing.
126 stars. No commits in the last 6 months.
Use this if you need a comprehensive collection of pre-processed Indonesian text, speech, or parallel translation data for research, model training, or linguistic analysis.
Not ideal if you need real-time data streams, or niche datasets beyond the general text, speech, and translated multimodal content it covers.
Stars: 126
Forks: 17
Language: Python
License: not listed
Category: not listed
Last pushed: Feb 11, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Wikidepia/indonesian_datasets"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
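The curl request above can also be issued from a script. The following is a minimal Python sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the `/quality/<category>/<owner>/<repo>` pattern is inferred from the single URL shown here, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository given as 'owner/name'.

    The URL layout is an assumption generalized from one example.
    """
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(name)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository without an API key.

    Assumes a JSON response body; json.loads raises ValueError otherwise.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "Wikidepia/indonesian_datasets")` would request the same URL as the curl command. Each call counts against the daily request limit, so caching the result locally is worth considering.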
Higher-rated alternatives
malaysia-ai/malaya
Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/
louisowen6/NLP_bahasa_resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
IndoNLP/indonlu
The first-ever vast natural language processing benchmark for Indonesian Language. We provide...
kirralabs/indonesian-NLP-resources
Data resources for Indonesian NLP
wongnai/wongnai-corpus
Collection of Wongnai's datasets