clipperhouse/uax29

A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.

Score: 44 / 100 (Emerging)

This library segments text into graphemes, words, and sentences according to the Unicode text segmentation standard (UAX #29). It takes raw text and outputs a stream of these segments, a common first step in natural language processing. It is aimed at developers building multilingual search engines, text analysis tools, or language understanding models.

101 stars.

Use this if you need a reliable, multilingual way to segment text into words, sentences, or graphemes (user-perceived characters) for tasks like building an inverted index or performing text analysis.

Not ideal if your application doesn't require precise, Unicode-conformant text segmentation and a simple split by spaces is sufficient.

text-segmentation natural-language-processing full-text-search text-analysis multilingual-text
No Package, No Dependents
Maintenance 10 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 9 / 25

Stars: 101
Forks: 6
Language: Go
License: MIT
Category: go-nlp-libraries
Last pushed: Feb 16, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/clipperhouse/uax29"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.