JackHCC/Chinese-Tokenization
Chinese word segmentation implemented with traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pretrained models (BERT, etc.).
This project helps natural language processing (NLP) practitioners accurately break down continuous Chinese text into individual words, a critical first step for many text analysis tasks. It takes raw Chinese sentences or documents as input and outputs segmented text, ready for further processing like sentiment analysis or information extraction. NLP developers and researchers working with Chinese text data would find this useful for building and evaluating segmentation models.
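To make the input/output contract concrete, here is a minimal forward-maximum-matching segmenter. This is an illustrative sketch only: the toy dictionary and the `segment` function are hypothetical, not code from this repository, which implements the N-gram, HMM, neural, and BERT approaches listed above.

```python
# Toy dictionary; a real segmenter loads a large lexicon or a trained model.
TOY_DICT = {"我们", "在", "学习", "中文", "分词", "中文分词"}
MAX_WORD_LEN = 4  # longest dictionary entry to try at each position

def segment(text: str) -> list[str]:
    """Greedy longest-match segmentation against TOY_DICT.
    Unknown characters fall back to single-character words."""
    result = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in TOY_DICT:
                result.append(candidate)
                i += length
                break
    return result

print(segment("我们在学习中文分词"))  # ['我们', '在', '学习', '中文分词']
```

Note how the greedy longest match keeps 中文分词 as one word rather than splitting it into 中文 / 分词; statistical and neural models exist precisely to resolve such ambiguities from context instead of by dictionary order.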
No commits in the last 6 months.
Use this if you need to implement or compare various Chinese word segmentation algorithms, from traditional to advanced deep learning methods, for your NLP applications or research.
Not ideal if you're looking for a simple, pre-built API or tool for immediate Chinese text segmentation without needing to understand or experiment with the underlying models.
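To illustrate the traditional HMM route the repo covers, here is a toy Viterbi decoder over BMES character tags (Begin/Middle/End/Single). All transition and start probabilities below are hand-picked for illustration and the emission model is passed in as a function; none of this is taken from the repo's trained parameters.

```python
import math

STATES = ["B", "M", "E", "S"]
NEG_INF = float("-inf")

# Hypothetical parameters; a real HMM segmenter estimates these from a
# tagged corpus. Impossible transitions (e.g. B -> S) get -inf.
start = {"B": math.log(0.6), "M": NEG_INF, "E": NEG_INF, "S": math.log(0.4)}
trans = {
    "B": {"B": NEG_INF, "M": math.log(0.3), "E": math.log(0.7), "S": NEG_INF},
    "M": {"B": NEG_INF, "M": math.log(0.4), "E": math.log(0.6), "S": NEG_INF},
    "E": {"B": math.log(0.5), "M": NEG_INF, "E": NEG_INF, "S": math.log(0.5)},
    "S": {"B": math.log(0.5), "M": NEG_INF, "E": NEG_INF, "S": math.log(0.5)},
}

def viterbi(chars, emit):
    """Most likely BMES tag string; emit(state, char) -> log-probability."""
    V = [{s: start[s] + emit(s, chars[0]) for s in STATES}]
    back = []
    for c in chars[1:]:
        row, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: V[-1][p] + trans[p][s])
            row[s] = V[-1][best_prev] + trans[best_prev][s] + emit(s, c)
            ptr[s] = best_prev
        V.append(row)
        back.append(ptr)
    # A valid tag sequence must end in E or S.
    last = max(("E", "S"), key=lambda s: V[-1][s])
    tags = [last]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return "".join(reversed(tags))

def tags_to_words(chars, tags):
    """Cut the character sequence after every E or S tag."""
    words, start_i = [], 0
    for i, t in enumerate(tags):
        if t in "ES":
            words.append("".join(chars[start_i:i + 1]))
            start_i = i + 1
    return words

# Uniform (untrained) emissions: the decode is driven by transitions alone.
tags = viterbi(list("中文分词"), lambda s, c: 0.0)
print(tags, tags_to_words(list("中文分词"), tags))  # BEBE ['中文', '分词']
```

The neural and BERT variants in the repo replace the hand-set emission/transition tables with learned per-character tag scores, but the same BMES decoding idea applies.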
Stars: 38
Forks: 4
Language: Python
License: —
Category:
Last pushed: Jun 15, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JackHCC/Chinese-Tokenization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PyThaiNLP/pythainlp
Thai natural language processing in Python
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...
jacksonllee/pycantonese
Cantonese Linguistics and NLP
dongrixinyu/JioNLP
A Chinese NLP preprocessing & parsing package: accurate, efficient, and easy to use. www.jionlp.com
hankcs/pyhanlp
Chinese word segmentation