Sovichea/khmer_segmenter

A zero-dependency, high-performance Khmer word segmenter using the Viterbi algorithm. Optimized for dictionary accuracy, ultra-low memory footprint, and edge deployment.

37 / 100 (Emerging)

This project segments Khmer text into individual words. You supply raw Khmer text, and it returns the text split into its constituent words, flagging any unknown terms. It suits linguists, content creators, and data analysts who need precise word boundaries for further analysis or application development.
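To make the idea of dictionary-based Viterbi segmentation concrete, here is a minimal Python sketch. It is not the project's code or API: the function name `segment`, the `word_costs` dictionary, the `unknown_cost` penalty, and the three-word toy dictionary are all assumptions made for illustration. It also falls back to single code points for unknown text, which a real Khmer segmenter would likely replace with cluster-aware handling.

```python
import math


def segment(text, word_costs, unknown_cost=10.0):
    """Split `text` into the lowest-cost sequence of dictionary words.

    `word_costs` maps each known word to a cost (lower = preferred).
    Any character not covered by a dictionary word becomes a
    one-character token charged `unknown_cost` (an assumed penalty).
    """
    n = len(text)
    max_len = max((len(w) for w in word_costs), default=1)
    best = [math.inf] * (n + 1)  # best[i]: minimum cost to segment text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last token
    best[0] = 0.0

    for end in range(1, n + 1):
        # Fallback: treat the previous character as an unknown token.
        best[end] = best[end - 1] + unknown_cost
        back[end] = end - 1
        # Try every dictionary word that ends at position `end`.
        for start in range(max(0, end - max_len), end):
            cost = word_costs.get(text[start:end])
            if cost is not None and best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start

    # Walk the back-pointers to recover the tokens.
    tokens = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    tokens.reverse()
    return tokens


# Hypothetical toy dictionary; real costs would come from word frequencies.
toy_costs = {"ខ្ញុំ": 1.0, "ស្រឡាញ់": 1.0, "កម្ពុជា": 1.0}
sentence = "".join(toy_costs)  # the three words written without spaces
for token in segment(sentence, toy_costs):
    print(token, "(known)" if token in toy_costs else "(unknown)")
```

Because the result depends only on the dictionary and its costs, the same input always yields the same segmentation, which is the deterministic behavior the description emphasizes.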

Use this if you need a reliable, deterministic, and fast way to segment Khmer text into words without relying on inconsistent manual annotations or complex machine learning setups.

Not ideal if your primary need is for a system that learns word boundaries from highly diverse, uncurated, and inconsistent text data, as this tool prioritizes dictionary accuracy.

Khmer-language-processing text-analysis linguistics content-localization data-preparation
No Package, No Dependents
Maintenance 6 / 25
Adoption 7 / 25
Maturity 13 / 25
Community 11 / 25

Stars: 34
Forks: 4
Language: Python
License: MIT
Last pushed: Jan 08, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Sovichea/khmer_segmenter"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.