koth/kcws

Deep learning Chinese word segmentation

/ 100

Emerging

This project helps break down continuous Chinese text into individual words, a crucial first step for many language processing tasks. It takes raw Chinese sentences and outputs the same sentences with spaces inserted between identified words, optionally including part-of-speech tags. This is ideal for natural language processing engineers, data scientists, or researchers working with Chinese text data.
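As a rough illustration of the input and output described above, the sketch below formats an already-segmented sentence as space-separated words, optionally with part-of-speech tags. It does not call kcws itself: the `render_segmented` helper, the `word/tag` output convention, and the sample tags are all assumptions made for illustration, not the project's documented interface.

```python
from typing import List, Optional, Tuple


def render_segmented(
    tokens: List[Tuple[str, Optional[str]]], with_pos: bool = False
) -> str:
    """Join segmented words with spaces, optionally appending a POS tag.

    Hypothetical helper: the "word/tag" format is an assumed convention,
    not taken from kcws.
    """
    parts = []
    for word, tag in tokens:
        if with_pos and tag is not None:
            parts.append(f"{word}/{tag}")
        else:
            parts.append(word)
    return " ".join(parts)


# Assumed segmenter output for the raw sentence "我爱北京天安门".
# The tags here are illustrative placeholders.
tokens = [("我", "r"), ("爱", "v"), ("北京", "ns"), ("天安门", "ns")]

print(render_segmented(tokens))
print(render_segmented(tokens, with_pos=True))
```

Concatenating the words without spaces reproduces the original sentence, which is a handy sanity check on any segmenter's output.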

2,074 stars. No commits in the last 6 months.

Use this if you need to accurately segment Chinese sentences into words for further analysis, search, or machine learning applications.

Not ideal if you want a pre-built, easy-to-integrate API that does not require you to train or manage machine learning models.

Chinese language processing, text segmentation, natural language understanding, text mining

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 25 / 25


Stars: 2,074

Forks: 638

Language: C++

License: none

Higher-rated alternatives

facebookresearch/stopes

A library for preparing data for machine translation research (monolingual preprocessing,...

Droidtown/ArticutAPI

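API of Articut Chinese word segmentation (with semantic part-of-speech tagging): "duanci" (word segmentation), also called "fenci", is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; using only the grammar rules of modern vernacular Chinese, it can achieve...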

rkcosmos/deepcut

A Thai word tokenization library using Deep Neural Network

fukuball/jieba-php

"Jieba" Chinese word segmentation: built to be the best PHP Chinese word segmentation component. / "Jieba" (Chinese for "to stutter") Chinese text segmentation:...

pytorch/text

Models, data loaders and abstractions for language processing, powered by PyTorch
