PyThaiNLP/attacut
A Fast and Accurate Neural Thai Word Segmenter
When working with Thai text, a significant challenge is breaking sentences into individual words because Thai doesn't use spaces between words. This tool takes raw Thai text as input and produces text where words are correctly separated, making it ready for analysis. It's ideal for linguists, data analysts, or anyone processing Thai language data.
Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly and accurately segment Thai text into individual words for natural language processing tasks.
Not ideal if you need the highest possible accuracy and are willing to trade processing speed for it.
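As a rough illustration of the input and output described above, the sketch below passes a raw Thai string to the package's top-level `tokenize` function and gets back a list of words. The wrapper name `segment_thai` and the sample sentence are illustrative choices, not part of the project; the wrapper returns `None` when the package is not installed so that the snippet stays runnable either way.

```python
from typing import List, Optional


def segment_thai(text: str) -> Optional[List[str]]:
    """Segment Thai text with AttaCut; return None if attacut is unavailable.

    `segment_thai` is a hypothetical helper written for this example.
    """
    try:
        # Installed from PyPI with: pip install attacut
        from attacut import tokenize
    except ImportError:
        return None
    # tokenize() takes one unsegmented string and returns a list of word strings.
    return tokenize(text)


if __name__ == "__main__":
    words = segment_thai("ฉันรักภาษาไทย")
    if words is None:
        print("attacut is not installed")
    else:
        # Joining with a visible separator shows where word boundaries were placed.
        print("|".join(words))
```

For repeated calls, the project also provides a `Tokenizer` class that can be constructed once and reused, which avoids reloading the model for each string.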
Stars: 94
Forks: 18
Language: Python
License: MIT
Category:
Last pushed: Jan 14, 2025
Commits (30d): 0
Dependencies: 8
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/PyThaiNLP/attacut"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
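The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the path follows a `category/owner/repo` pattern generalized from the single URL shown in the curl command. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is the
# category, repository owner, and repository name.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (path pattern is an assumption from one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record and decode it, assuming the response body is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("nlp", "PyThaiNLP", "attacut"))
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the 100 unauthenticated requests per day.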
Related tools
VietHoang1512/khmer-nltk: Khmer language processing toolkit
UlugbekSalaev/UzTransliterator: State-of-the-art machine transliteration tool for Uzbek language
seanghay/KhmerOCR: A Fast Khmer Optical Character Recognition (KhmerOCR)
seanghay/khmerphonemizer: A Free, Standalone and Open-Source Khmer Grapheme-to-Phonemes.
ionite34/Aquila-Resolve: Augmented Recurrent Neural Grapheme-to-Phoneme conversion with Inflectional Orthography.