Wikidepia/indonesian_datasets

NLP Datasets for Indonesian

Score: 34 / 100 (Emerging)

If you work with Indonesian text or speech and need large amounts of data for analysis or tool building, this collection provides ready-to-use resources. It covers a wide range of content, from news articles and social media posts to dictionaries and translated academic datasets. It is aimed at researchers, linguists, and data analysts working on Indonesian language processing.

126 stars. No commits in the last 6 months.

Use this if you need a comprehensive collection of pre-processed Indonesian text, speech, or parallel translation data for research, model training, or linguistic analysis.

Not ideal if you need real-time data streams, or niche datasets beyond the general text, speech, and translated multimodal content collected here.

Topics: Indonesian-language-research, text-analysis, speech-recognition, content-localization, linguistic-studies
Badges: No License, Stale 6m, No Package, No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 8 / 25
Community: 16 / 25

Stars: 126
Forks: 17
Language: Python
License: none
Last pushed: Feb 11, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Wikidepia/indonesian_datasets"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
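
The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions that this page does not confirm: that the endpoint path follows the pattern /quality/<category>/<owner>/<repo> (inferred from the single curl example above) and that the response body is JSON. The function names quality_url and fetch_quality are hypothetical helpers, not part of any published client.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the API answers with a JSON body. If it does not,
    json.loads raises a ValueError.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_quality("nlp", "Wikidepia", "indonesian_datasets") issues the same GET request as the curl command. Without a key, the 100 requests/day limit applies, so it is worth caching the result rather than polling.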