Indonesian NLP Resources NLP Tools

Curated collections, datasets, and resource lists specifically for Indonesian/Malay language NLP. Includes benchmark datasets, resource compilations, and toolkit libraries for Bahasa Indonesia. Does NOT include general NLP courses, application-specific projects (like sentiment analysis tools), or non-Indonesian language resources.

26 Indonesian NLP resource tools are tracked. One scores above 70 (Verified tier). The highest-rated is malaysia-ai/malaya at 71/100 with 521 stars. One of the top 10 is actively maintained.

Get all 26 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=indonesian-nlp-resources&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
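The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url` and `fetch` are illustrative and not part of the API. The response schema is not documented on this page, so the sketch only decodes the body as JSON and makes no assumption about its fields. Whether the endpoint accepts a `limit` above 20 or supports paging is also not stated here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="indonesian-nlp-resources", limit=20):
    """Build the same URL as the curl command (helper name is hypothetical)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch(url, timeout=30):
    """GET the URL and decode the body as JSON.

    The shape of the returned object is an unknown here; inspect it
    before relying on any particular key.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch(build_url())` performs one request, which counts against the daily quota mentioned above.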

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | malaysia-ai/malaya | Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/ | 71 | Verified |
| 2 | louisowen6/NLP_bahasa_resources | A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia | 51 | Established |
| 3 | IndoNLP/indonlu | The first-ever vast natural language processing benchmark for Indonesian... | 51 | Established |
| 4 | kirralabs/indonesian-NLP-resources | Data resources for Indonesian-language NLP | 48 | Emerging |
| 5 | wongnai/wongnai-corpus | Collection of Wongnai's datasets | 45 | Emerging |
| 6 | rizalespe/Dataset-Sentimen-Analisis-Bahasa-Indonesia | This repository is a collection of datasets related to sentiment analysis... | 42 | Emerging |
| 7 | kmkurn/id-pos-tagging | Indonesian part-of-speech (POS) tagging | 39 | Emerging |
| 8 | IndoNLP/nusa-catalogue | Dataset Catalogue Homepage for Indonesian Languages | 38 | Emerging |
| 9 | kmkurn/id-nlp-resource | A list of Indonesian NLP resources. | 38 | Emerging |
| 10 | ariya/tebakmasa | Infer the date and time from the general description in Bahasa Indonesia | 37 | Emerging |
| 11 | IndoNLP/nusax | High-quality parallel resource on sentiment analysis for 10 low-resource... | 37 | Emerging |
| 12 | yohanesgultom/nlp-experiments | Indonesian NLP experiments | 35 | Emerging |
| 13 | Wikidepia/indonesian_datasets | NLP Datasets for Indonesian | 34 | Emerging |
| 14 | feryandi/Dataset-Artikel | This repository contains a collection of raw data in the form of articles from various... | 34 | Emerging |
| 15 | Hyuto/indo-nlp | A simple Python library with no additional dependencies that aims to... | 33 | Emerging |
| 16 | datascienceid/nlp-resources | A curated list of natural language processing courses, video lectures,... | 32 | Emerging |
| 17 | LazarusNLP/indonesian-sentence-embeddings | Embedding Representation for Indonesian Sentences! | 32 | Emerging |
| 18 | ailabtelkom/id-NLP-resources | A collection of resources for natural language processing in Bahasa Indonesia. All... | 32 | Emerging |
| 19 | danieldanuega/spacyndo | Dependency Parser and NER model for Bahasa Indonesia Spacy 2.1 | 27 | Experimental |
| 20 | rrayhka/indonesian-ner-spacy | Fine-tuning SpaCy for Indonesian Named Entity Recognition (NER) with custom dataset. | 26 | Experimental |
| 21 | irfandythalib/python-indonesia-stopwords-remover | This code is used to remove stopwords using Tala stopwords library for... | 24 | Experimental |
| 22 | nandanovenia/resource-nlp-indonesia | Natural Language Processing Resource for Bahasa Indonesia | 22 | Experimental |
| 23 | novay/frasa | Frasa is a collection of modules which provides various functions for... | 22 | Experimental |
| 24 | matbahasa/MALINDO_BLiMP | MALINDO BLiMP (Malay/Indonesian Benchmark of Linguistic Minimal Pairs) | 14 | Experimental |
| 25 | HantuGur/NUSANTAARA-LEARN-LANGUAGE | 🌿 NusaLingua is an educational web platform for Indonesia's regional languages, based on... | 11 | Experimental |
| 26 | Cortana-Coders/NutriSense | NutriSense: Nutrition Measurement Platform with Natural Language Processing | 11 | Experimental |
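
The summary figures at the top of the page can be checked against the listing above. The following Python sketch hard-codes the 26 (score, tier) pairs from the listing and counts projects per tier along with the lowest and highest score seen in each. The `summarize` helper is illustrative, and the ranges it reports are only what this listing happens to contain; they are not the site's official tier cutoffs, which are not stated here.

```python
from collections import defaultdict

# (score, tier) pairs copied from the 26 entries listed above, in rank order.
ROWS = [
    (71, "Verified"),
    (51, "Established"), (51, "Established"),
    (48, "Emerging"), (45, "Emerging"), (42, "Emerging"), (39, "Emerging"),
    (38, "Emerging"), (38, "Emerging"), (37, "Emerging"), (37, "Emerging"),
    (35, "Emerging"), (34, "Emerging"), (34, "Emerging"), (33, "Emerging"),
    (32, "Emerging"), (32, "Emerging"), (32, "Emerging"),
    (27, "Experimental"), (26, "Experimental"), (24, "Experimental"),
    (22, "Experimental"), (22, "Experimental"), (14, "Experimental"),
    (11, "Experimental"), (11, "Experimental"),
]


def summarize(rows):
    """Map each tier to (count, lowest score, highest score) in the rows."""
    by_tier = defaultdict(list)
    for score, tier in rows:
        by_tier[tier].append(score)
    return {tier: (len(s), min(s), max(s)) for tier, s in by_tier.items()}


# Number of projects scoring above 70, matching the summary claim.
above_70 = sum(1 for score, _ in ROWS if score > 70)
```

On this data, `above_70` is 1 and the Verified tier holds a single project, which agrees with the summary at the top of the page.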