rkcosmos/deepcut
A Thai word tokenization library using Deep Neural Network
This tool breaks raw Thai text into individual words, a necessary first step for analysis because Thai is written without spaces between words. You provide a block of Thai text, and it returns a list of separate words. It is useful to anyone working with Thai language data, such as linguists, researchers, and data analysts.
427 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if you need to accurately split Thai sentences and paragraphs into their constituent words for further processing or analysis.
Not ideal if your primary need is for languages other than Thai, as this tool is specifically designed for Thai word segmentation.
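As a rough illustration of the "text in, list of words out" workflow described above, here is a minimal Python sketch. It assumes the package is installed from PyPI and calls `deepcut.tokenize`, the library's tokenization function. The wrapper name `segment_thai`, its `tokenizer` parameter, and the whitespace filtering are hypothetical additions for this example, not part of the library.

```python
from typing import Callable, List, Optional


def segment_thai(
    text: str,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Split Thai text into words and drop whitespace-only tokens.

    `tokenizer` is a hypothetical hook for this sketch: any callable that
    maps a string to a list of tokens. When omitted, deepcut is used.
    """
    if tokenizer is None:
        # Imported lazily so the module loads even when deepcut and its
        # neural-network dependencies are not installed.
        import deepcut

        tokenizer = deepcut.tokenize
    # Keep only tokens that contain something other than whitespace.
    return [token for token in tokenizer(text) if token.strip()]


# Example call (requires deepcut to be installed):
# words = segment_thai("ตัดคำได้ดีมาก")
```

Passing the tokenizer as a parameter keeps downstream code testable without loading the model, and makes it easy to swap in another segmenter for comparison.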
Stars: 427
Forks: 98
Language: Python
License: MIT
Category: (none listed)
Last pushed: Oct 23, 2020
Commits (30d): 0
Dependencies: 6
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/rkcosmos/deepcut"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
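The same request can be made from Python with the standard library. This is a sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the path follows the pattern `/quality/<category>/<owner>/<repo>` seen in the single URL above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the trailing segments are assumed
# to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the request URL for a repository given as 'owner/name'."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumes a JSON response body; json.loads raises ValueError otherwise.
    """
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("nlp", "rkcosmos/deepcut")
```

Because the keyless tier allows only 100 requests per day, caching the result locally is worth considering when polling many repositories.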
Related tools
facebookresearch/stopes
A library for preparing data for machine translation research (monolingual preprocessing,...
Droidtown/ArticutAPI
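API of Articut Chinese word segmentation (with semantic part-of-speech tagging): word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it can achieve...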
fukuball/jieba-php
"結巴" (Jieba) Chinese word segmentation: built to be the best PHP component for Chinese word segmentation. / "Jieba" (Chinese for "to stutter") Chinese text segmentation:...
pytorch/text
Models, data loaders and abstractions for language processing, powered by PyTorch
jiesutd/NCRFpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER,...