koth/kcws
Deep Learning Chinese Word Segment
This project helps break down continuous Chinese text into individual words, a crucial first step for many language processing tasks. It takes raw Chinese sentences and outputs the same sentences with spaces inserted between identified words, optionally including part-of-speech tags. This is ideal for natural language processing engineers, data scientists, or researchers working with Chinese text data.
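To make the input and output described above concrete, here is a minimal sketch of the last step of a tagging-based segmenter: turning per-character labels into space-separated words. It assumes the common B/M/E/S scheme (begin, middle, end, single), which this page does not confirm kcws uses. The function names `tags_to_words` and `segment` are hypothetical, and the tags are hand-written stand-ins for what a trained model would predict.

```python
def tags_to_words(chars, tags):
    """Group characters into words from per-character B/M/E/S tags.

    Assumed scheme (not confirmed for kcws): B = begin, M = middle,
    E = end, S = single-character word.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        # E (end) and S (single) both close the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


def segment(sentence, tags):
    """Return the sentence with spaces inserted between words."""
    return " ".join(tags_to_words(sentence, tags))


# Hand-written tags standing in for model predictions.
sentence = "我爱北京天安门"
tags = ["S", "S", "B", "E", "B", "M", "E"]
print(segment(sentence, tags))  # 我 爱 北京 天安门
```

In a real pipeline the tags would come from the trained model rather than a literal list; only the grouping step is shown here.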
2,074 stars. No commits in the last 6 months.
Use this if you need to accurately segment Chinese sentences into words for further analysis, search, or machine learning applications.
Not ideal if you are looking for a pre-built, easy-to-integrate API without needing to train or manage machine learning models.
Stars: 2,074
Forks: 638
Language: C++
License: none listed
Category:
Last pushed: May 18, 2018
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/koth/kcws"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
facebookresearch/stopes
A library for preparing data for machine translation research (monolingual preprocessing,...
Droidtown/ArticutAPI
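API of Articut Chinese word segmentation (with semantic part-of-speech tagging): word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; using only the grammar rules of modern vernacular Chinese, it can achieve...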
rkcosmos/deepcut
A Thai word tokenization library using Deep Neural Network
fukuball/jieba-php
"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation component...
pytorch/text
Models, data loaders and abstractions for language processing, powered by PyTorch