seanghay/khmernormalizer
A missing toolkit for Khmer Natural Language Processing.
When preparing Khmer text for analysis or display, this tool cleans up common issues such as duplicate spaces, broken Unicode characters, emojis, and misspellings. It takes raw, unedited Khmer text and outputs a corrected, standardized version. It is useful for anyone working with Khmer language data, whether for research, content management, or language education.
Available on PyPI.
Use this if you need to reliably clean and standardize raw Khmer text data before using it for any computational task or publication.
Not ideal if you need to perform complex linguistic analysis such as part-of-speech tagging or machine translation, as this tool focuses solely on text normalization.
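This page does not show khmernormalizer's own API, so the following is only a hypothetical sketch of the kind of cleanup described above, written with the Python standard library. The function name `normalize_khmer`, the set of invisible characters removed, and the emoji code-point ranges are all assumptions, not the library's actual behavior. The emoji ranges are approximate, and the sketch does not attempt spelling correction or Khmer-specific character reordering (NFC only handles canonical equivalence).

```python
import re
import unicodedata

# Assumed set of invisible characters that often leak into scraped Khmer text:
# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER, BOM / ZERO WIDTH NO-BREAK SPACE.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")

# Approximate emoji coverage by code-point block (not the full Unicode emoji spec).
_EMOJI = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u20E3"           # emoji variation selector, combining keycap
    "]"
)

# Runs of horizontal whitespace (spaces, tabs, NBSP); line breaks are left alone.
_SPACES = re.compile(r"[^\S\r\n]+")


def normalize_khmer(text: str) -> str:
    """Hypothetical cleanup pass for illustration; not the khmernormalizer API."""
    text = unicodedata.normalize("NFC", text)  # canonical composition only
    text = _ZERO_WIDTH.sub("", text)           # drop invisible characters
    text = _EMOJI.sub("", text)                # drop emoji before collapsing spaces
    text = _SPACES.sub(" ", text)              # collapse duplicate spaces to one
    return text.strip()                        # trim leading/trailing whitespace
```

Removing U+200B is a judgment call: Khmer is written without spaces between words, and zero-width spaces are sometimes inserted deliberately to mark word boundaries, so a pipeline that later needs word segmentation may want to keep them.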
Stars: 11
Forks: 1
Language: Python
License: MIT
Category: (none listed)
Last pushed: Nov 18, 2025
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/seanghay/khmernormalizer"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
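For use from Python rather than the shell, a minimal sketch of the same keyless request with the standard library follows. The URL is copied from the curl command; the function names are hypothetical, and the assumption that the response body is UTF-8 JSON is unverified, since the page does not describe the response format.

```python
import json
import urllib.request

# Endpoint copied verbatim from the curl example.
QUALITY_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/seanghay/khmernormalizer"


def parse_quality(body: bytes) -> object:
    """Decode a response body, assuming (unverified) that it is UTF-8 JSON."""
    return json.loads(body.decode("utf-8"))


def fetch_quality(url: str = QUALITY_URL, timeout: float = 10.0) -> object:
    """GET the record anonymously, i.e. on the keyless 100 requests/day tier."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_quality(resp.read())
```

Calling `fetch_quality()` makes one network request. How a free key is supplied for the 1,000/day tier is not described on this page, so the sketch does not attempt it.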
Higher-rated alternatives
VietHoang1512/khmer-nltk
Khmer language processing toolkit
PyThaiNLP/attacut
A Fast and Accurate Neural Thai Word Segmenter
UlugbekSalaev/UzTransliterator
UzTransliterator | State-of-the-art machine transliteration tool for the Uzbek language
seanghay/KhmerOCR
A Fast Khmer Optical Character Recognition (KhmerOCR)
seanghay/khmerphonemizer
A free, standalone, and open-source Khmer grapheme-to-phoneme converter.