wittawatj/jtcc

Java library to tokenize Thai text into a list of TCCs

Score: 37 / 100 (Emerging)

This tool prepares Thai text for natural language processing by splitting it into Thai Character Clusters (TCCs), which are groups of Thai characters that cannot be separated. It accepts raw Thai text from the command line or from a file and outputs the corresponding sequence of TCCs. It is aimed mainly at developers building larger Thai NLP systems.
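
To make the idea of a character cluster concrete, here is a minimal sketch in Java. It is not jtcc's code or API: the class `TccSketch` and its methods are hypothetical names, and the rules are a deliberate simplification of TCC segmentation (a consonant joins a directly preceding leading vowel, and non-spacing marks plus the following vowels U+0E30, U+0E32, and U+0E33 join the preceding cluster). Real TCC rule sets cover many more patterns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of jtcc and not its API.
class TccSketch {

    // Leading vowels (U+0E40..U+0E44) are written before their consonant.
    static boolean isLeadingVowel(int cp) {
        return cp >= 0x0E40 && cp <= 0x0E44;
    }

    // Thai consonants occupy U+0E01..U+0E2E.
    static boolean isConsonant(int cp) {
        return cp >= 0x0E01 && cp <= 0x0E2E;
    }

    // Assumption of this sketch: non-spacing marks (upper/lower vowels,
    // tone marks) and the following vowels U+0E30, U+0E32, U+0E33
    // always stay with the preceding cluster.
    static boolean attachesToPrevious(int cp) {
        return Character.getType(cp) == Character.NON_SPACING_MARK
                || cp == 0x0E30 || cp == 0x0E32 || cp == 0x0E33;
    }

    // Splits text into simplified clusters, iterating by code point.
    static List<String> cluster(String text) {
        List<String> clusters = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int prev = -1; // no previous code point yet
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean joins = attachesToPrevious(cp)
                    || (isConsonant(cp) && isLeadingVowel(prev));
            // Close the running cluster when this code point starts a new one.
            if (!joins && current.length() > 0) {
                clusters.add(current.toString());
                current.setLength(0);
            }
            current.appendCodePoint(cp);
            prev = cp;
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            clusters.add(current.toString());
        }
        return clusters;
    }

    public static void main(String[] args) {
        System.out.println(cluster("ประเทศไทย"));
    }
}
```

Under these simplified rules, the input "ประเทศไทย" yields the six clusters ป, ระ, เท, ศ, ไท, ย. A word segmenter or syllable tokenizer would then work on top of clusters like these rather than on individual characters.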

No commits in the last 6 months.

Use this if you are developing a Thai natural language processing application and need a foundational step to segment Thai text into character clusters.

Not ideal if you need a full word segmenter, syllable tokenizer, or a tool that considers grammatical context for text analysis.

Thai-NLP text-tokenization computational-linguistics software-development
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 15 / 25

Stars: 19
Forks: 5
Language: Java
License: GPL-3.0
Last pushed: May 30, 2017
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/wittawatj/jtcc"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
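
The same request can be sketched in Java with the standard `java.net.http` client (Java 11 or later). The class name `QualityApiSketch` is hypothetical, and the response format is not documented here, so the sketch only prints the status code and the raw body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; only the endpoint URL comes from the curl example.
class QualityApiSketch {

    static final String ENDPOINT =
            "https://pt-edge.onrender.com/api/v1/quality/nlp/wittawatj/jtcc";

    // Builds a plain GET request; no API key header is set (keyless tier).
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create(ENDPOINT)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(), HttpResponse.BodyHandlers.ofString());
        // The body format is an unknown here, so it is printed unparsed.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```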