Southeast Asian NLP Tools

NLP tools and resources specifically for Southeast Asian languages (Khmer, Burmese/Myanmar, Thai, Rakhine). Includes text segmentation, transliteration, OCR, and language-specific preprocessing. Does NOT include general multilingual NLP tools, datasets for non-Southeast Asian languages, or language-agnostic NLP frameworks.

There are 45 Southeast Asian NLP tools tracked. Three score 50 or higher (Established tier). The highest-rated is VietHoang1512/khmer-nltk at 57/100 with 81 stars.

Get all 45 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=southeast-asian-nlp-tools&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
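The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It rebuilds the URL from the curl example above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed payload without assuming any field names. The sample URL passes `limit=20`, and whether a larger value returns all 45 projects is an assumption to verify against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL; `limit` mirrors the value in the curl example."""
    params = {
        "domain": "nlp",
        "subcategory": "southeast-asian-nlp-tools",
        "limit": limit,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON payload (response schema is not assumed)."""
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one request, which counts against the daily quota mentioned above.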

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | VietHoang1512/khmer-nltk | Khmer language processing toolkit | 57 | Established |
| 2 | PyThaiNLP/attacut | A Fast and Accurate Neural Thai Word Segmenter | 54 | Established |
| 3 | UlugbekSalaev/UzTransliterator | UzTransliterator \| State-of-the-art machine transliteration tool for Uzbek language | 50 | Established |
| 4 | seanghay/KhmerOCR | A Fast Khmer Optical Character Recognition (KhmerOCR) | 47 | Emerging |
| 5 | seanghay/khmerphonemizer | A Free, Standalone and Open-Source Khmer Grapheme-to-Phonemes. | 45 | Emerging |
| 6 | ionite34/Aquila-Resolve | Augmented Recurrent Neural Grapheme-to-Phoneme conversion with Inflectional... | 44 | Emerging |
| 7 | seanghay/khmernormalizer | A missing toolkit for Khmer Natural Language Processing. | 43 | Emerging |
| 8 | AI4Bharat/IndicNLP-Transliteration | Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with... | 42 | Emerging |
| 9 | koomri/text-segmentation | Implementation of the paper: Text Segmentation as a Supervised Learning Task | 40 | Emerging |
| 10 | mdoumbouya/detransliterator | detransliteration library and tools | 40 | Emerging |
| 11 | eimg/myanmar-text-breaker | Syllable and word, breaker/boundary-segmentation for Myanmar text in JavaScript | 40 | Emerging |
| 12 | Sovichea/khmer_segmenter | A zero-dependency, high-performance Khmer word segmenter using the Viterbi... | 37 | Emerging |
| 13 | ionite34/h2p-parser | Heteronym to Phoneme Parser | 37 | Emerging |
| 14 | YerevaNN/translit-rnn | Automatic transliteration with LSTM | 36 | Emerging |
| 15 | Khmer-NLP/khmer-nlp | Khmer Natural Language Processing (KHNLP) | 34 | Emerging |
| 16 | Koziev/StressModel | Neural model for prediction of stress position in Russian words | 34 | Emerging |
| 17 | MinSiThu/Rakhine-Proverbs-Dataset | Proverbs in Rakhine/Arakan Language | 34 | Emerging |
| 18 | josephjojoe/syllabification | GRU-based neural network with Inception modules and an optional Linear Chain... | 33 | Emerging |
| 19 | swanhtet1992/ReSegment | Burmese (Myanmar) syllable level segmentation with regex. | 32 | Emerging |
| 20 | khmerlang/elasticsearch-analysis-khmerlang | Khmer Analysis Plugin for Elasticsearch | 32 | Emerging |
| 21 | SaPhyoThuHtet/myanmar-nlp-tool | Natural Language Processing Tool | 31 | Emerging |
| 22 | Koziev/transcriber | Model to convert text to phonetic transcription and vice versa | 31 | Emerging |
| 23 | chanmratekoko/Awesome-Myanmar-Wordlists-Dictionary-Collection | Myanmar (Burmese) Wordlists Dictionary Collection for word segmentation,... | 30 | Emerging |
| 24 | netra-ai-lab/Khmer-OCR-CNN-Transformer | A Squeeze-and-Excitation Transformer Network for Khmer Optical Character Recognition | 29 | Experimental |
| 25 | sagorbrur/bntranslit | Bangla Transliteration Package | 29 | Experimental |
| 26 | NDarayut/english-khmer-transliteration | An English–Khmer transliteration system built on an Attention-Based... | 28 | Experimental |
| 27 | sagorbrur/itranslit | transliteration for indic language | 28 | Experimental |
| 28 | seanghay/khmer-neural-segmenter | Khmer Neural Segmenter | 26 | Experimental |
| 29 | alvations/myth | Myanmar and Thai Language Resources | 26 | Experimental |
| 30 | Michael95-m/myanmar_names | Burmese name conversion with rule-based method (Burmese to English and... | 23 | Experimental |
| 31 | thomas-chauvet/names_transliteration | Neural Machine Translation (NMT) applied to transliterate names in arabic... | 21 | Experimental |
| 32 | dmitry-rvn/ru-svo-triplets | Subject-verb-object triplets extraction for russian language. | 21 | Experimental |
| 33 | Socret360/joint-khmer-word-segmentation-and-pos-tagging | A Keras implementation of a deep learning network to simultaneously perform... | 20 | Experimental |
| 34 | ye-kyaw-thu/MSL4Emergency | Myanmar Sign Language Corpus for Emergency Domain | 20 | Experimental |
| 35 | shayneobrien/text-segmentation | Neural and nonneural text segmentation methods. | 20 | Experimental |
| 36 | suralmasha/RuTranscript | Russian phonetical transcription | 19 | Experimental |
| 37 | SaPhyoThuHtet/myanmar-part-of-speech-tagging-based-on-machine-translation | POS Tagging Based on Machine Translation (UTYCC Class Final Project) | 18 | Experimental |
| 38 | eemberda/Cebuano-Syllable-Decoder | Accepts a Cebuano word and breaks it down into syllables | 17 | Experimental |
| 39 | ThuraAung1601/myTypo | myTypo : Typographic Error Simulator for Myanmar Language | 17 | Experimental |
| 40 | papamusa/Three-word-sentences | 🔤 Master three-word sentences for clear English communication through simple... | 14 | Experimental |
| 41 | MinSiThu/Myanmar-Agriculture-1K | Agriculture Dataset in Burmese Language | 12 | Experimental |
| 42 | Anas1108/Transliteration-RomantoUrdu-And-ViceVersa | This project aims to develop a program that can perform transliteration... | 12 | Experimental |
| 43 | juletx/writing-systems | Comparing Writing Systems with Multilingual Grapheme-to-Phoneme and... | 11 | Experimental |
| 44 | Michael95-m/mya-sent-break | Sentence segmentation for burmese language by rule-based method | 10 | Experimental |
| 45 | ljvmiranda921/ud-tagalog-spacy | Training a POS Tagger and Dependency Parser for a Low-Resource Language (Tagalog) | 10 | Experimental |
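The tier labels in the listing appear to follow fixed score bands. The sketch below encodes cutoffs inferred only from the rows above (50 is Established, 30 is Emerging, 29 is Experimental). The site's actual scoring rules are not stated here, so the thresholds and the function name `tier_for` are assumptions. The score list is copied from the listing in rank order.

```python
from collections import Counter

# Scores of the 45 listed projects, in rank order, copied from the listing.
SCORES = [
    57, 54, 50, 47, 45, 44, 43, 42, 40, 40, 40, 37, 37, 36, 34,
    34, 34, 33, 32, 32, 31, 31, 30, 29, 29, 28, 28, 26, 26, 23,
    21, 21, 20, 20, 20, 19, 18, 17, 17, 14, 12, 12, 11, 10, 10,
]


def tier_for(score: int) -> str:
    """Map a score to a tier using cutoffs inferred from the listing (assumed)."""
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Tally how many listed projects fall into each inferred tier.
tier_counts = Counter(tier_for(s) for s in SCORES)
print(dict(tier_counts))
# {'Established': 3, 'Emerging': 20, 'Experimental': 22}
```

Under these inferred cutoffs, every row's computed tier matches the tier shown in the listing, which gives 3 Established, 20 Emerging, and 22 Experimental projects.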

Comparisons in this category