seanghay/khmernormalizer

A missing toolkit for Khmer Natural Language Processing.

Overall score 43 / 100 (Emerging)

This tool cleans up common problems in Khmer text before it is analyzed or displayed: duplicate spaces, malformed Unicode characters, emojis, and common misspellings. It takes raw, unedited Khmer text as input and returns a corrected, standardized version. It is useful to anyone working with Khmer language data, for example in research, content management, or language education.
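This page does not document the library's own API, so the snippet below does not call it. It is a minimal, standard-library-only sketch of the kinds of clean-up steps listed above, under stated assumptions: `normalize_khmer`, the emoji ranges, and the corrections table are hypothetical illustrations, not khmernormalizer's implementation. NFC is a generic Unicode step and does not by itself repair Khmer-specific issues such as misordered vowel signs.

```python
import re
import unicodedata

# Rough, incomplete emoji coverage (assumption): pictographs, misc symbols,
# dingbats, regional-indicator flags, and the emoji variation selector.
_EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF\uFE0F]"
)
# Runs of spaces and tabs (newlines are left alone).
_SPACES_RE = re.compile(r"[ \t]+")
# Two or more consecutive zero-width spaces (U+200B), often used between Khmer words.
_ZWSP_RE = re.compile("\u200b{2,}")


def normalize_khmer(text, corrections=None):
    """Hypothetical normalizer sketch; not khmernormalizer's API."""
    # 1. Canonical Unicode composition (NFC).
    text = unicodedata.normalize("NFC", text)
    # 2. Drop emojis before touching spaces, so "a <emoji> b" does not keep a double space.
    text = _EMOJI_RE.sub("", text)
    # 3. Collapse repeated zero-width spaces to a single one.
    text = _ZWSP_RE.sub("\u200b", text)
    # 4. Collapse runs of spaces/tabs to a single space.
    text = _SPACES_RE.sub(" ", text)
    # 5. Apply a caller-supplied misspelling table (plain substring replacement).
    for wrong, right in (corrections or {}).items():
        text = text.replace(wrong, right)
    # Trim leading/trailing whitespace (str.strip does not remove U+200B).
    return text.strip()
```

The order of the steps is a deliberate choice in this sketch: removing emojis before collapsing spaces means an input like "a 😀 b" ends up as "a b" rather than keeping the gap the emoji left behind.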

Available on PyPI.

Use this if you need to reliably clean and standardize raw Khmer text data before using it for any computational task or publication.

Not ideal if you need to perform complex linguistic analysis such as part-of-speech tagging or machine translation, as this tool focuses solely on text normalization.

Khmer-language-processing text-preparation data-cleaning content-moderation digital-humanities
Maintenance 6 / 25
Adoption 5 / 25
Maturity 25 / 25
Community 7 / 25


Stars 11
Forks 1
Language Python
License MIT
Last pushed Nov 18, 2025
Commits (30d) 0
Dependencies 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/seanghay/khmernormalizer"
