seanghay/khmernormalizer

A missing toolkit for Khmer Natural Language Processing.

/ 100

Emerging

This tool prepares Khmer text for analysis or display by cleaning up common problems such as duplicate spaces, malformed Unicode character sequences, emojis, and common misspellings. It takes raw, unedited Khmer text and returns a corrected, standardized version. It is useful to anyone working with Khmer language data, for example in research, content management, or language education.
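To make this kind of cleanup concrete, here is a minimal, hypothetical sketch of a few generic normalization steps in plain Python. It does not call khmernormalizer and is not its implementation. The function name `normalize_khmer_sketch` and the emoji ranges are assumptions chosen for illustration. Khmer-specific fixes, such as reordering misplaced vowel signs or correcting misspellings, are deliberately left out.

```python
import re
import unicodedata

# Approximate emoji coverage (an assumption, not the full Unicode emoji set):
# pictographs/emoticons, misc symbols and dingbats, regional-indicator flags,
# and the emoji variation selector U+FE0F.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF\uFE0F]"
)


def normalize_khmer_sketch(text: str) -> str:
    """Hypothetical helper: generic cleanup steps, not khmernormalizer's API."""
    # 1. Canonical composition (NFC) so equivalent code point sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    # 2. Strip characters matched by the approximate emoji pattern above.
    text = EMOJI_RE.sub("", text)
    # 3. Collapse every run of whitespace (spaces, tabs, newlines) into one space,
    #    then trim leading and trailing whitespace.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_khmer_sketch("  សួស្តី   ពិភពលោក 😀 "))  # prints: សួស្តី ពិភពលោក
```

Collapsing all whitespace with `\s+` also flattens line breaks. That suits single-line fields, but a pipeline for multi-paragraph documents would likely handle newlines separately.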

Available on PyPI.

Use this if you need to reliably clean and standardize raw Khmer text data before using it for any computational task or publication.

Not ideal if you need to perform complex linguistic analysis such as part-of-speech tagging or machine translation, as this tool focuses solely on text normalization.

Khmer-language-processing text-preparation data-cleaning content-moderation digital-humanities

Maintenance 6 / 25

Adoption 5 / 25

Maturity 25 / 25

Community 7 / 25



Language

Python

License

MIT

Higher-rated alternatives

VietHoang1512/khmer-nltk

Khmer language processing toolkit

PyThaiNLP/attacut

A Fast and Accurate Neural Thai Word Segmenter

UlugbekSalaev/UzTransliterator

UzTransliterator | State-of-the-art machine transliteration tool for Uzbek language

seanghay/KhmerOCR

A Fast Khmer Optical Character Recognition (KhmerOCR)

seanghay/khmerphonemizer

A Free, Standalone and Open-Source Khmer Grapheme-to-Phoneme Converter.
