JackHCC/Chinese-Tokenization

Chinese word segmentation implemented with traditional methods (n-gram, HMM, etc.), neural-network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.).

Score: 25 / 100 (Experimental)

This project helps natural language processing (NLP) practitioners accurately break down continuous Chinese text into individual words, a critical first step for many text analysis tasks. It takes raw Chinese sentences or documents as input and outputs segmented text, ready for further processing like sentiment analysis or information extraction. NLP developers and researchers working with Chinese text data would find this useful for building and evaluating segmentation models.
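To make the task concrete, here is a minimal toy sketch (not this repository's code) of the simplest segmentation baseline, greedy forward maximum matching against a tiny hand-picked vocabulary; the word list and function name are illustrative assumptions.

```python
# Toy illustration of Chinese word segmentation (not this repo's implementation):
# greedy forward maximum matching against a small, hand-picked vocabulary.
VOCAB = {"研究", "研究生", "生命", "命", "的", "起源"}
MAX_LEN = max(len(w) for w in VOCAB)  # longest word in the vocabulary

def fmm_segment(text: str) -> list[str]:
    """Greedily match the longest vocabulary word at each position."""
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first, fall back to a single character.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in VOCAB:
                out.append(piece)
                i += length
                break
    return out

print(fmm_segment("研究生命的起源"))  # → ['研究生', '命', '的', '起源']
```

Note that the greedy output ("graduate student / life / 's / origin") misreads the intended "研究 / 生命 / 的 / 起源" ("study / life / 's / origin"); this classic ambiguity is exactly what the statistical (HMM, n-gram) and neural (CNN, LSTM, BERT) models in this project are meant to resolve.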

No commits in the last 6 months.

Use this if you need to implement or compare various Chinese word segmentation algorithms, from traditional to advanced deep learning methods, for your NLP applications or research.

Not ideal if you're looking for a simple, pre-built API or tool for immediate Chinese text segmentation without needing to understand or experiment with the underlying models.

Chinese-NLP text-segmentation language-processing computational-linguistics text-mining
No License · Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 10 / 25


Stars: 38
Forks: 4
Language: Python
License: none
Last pushed: Jun 15, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JackHCC/Chinese-Tokenization"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
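The same endpoint can be called from Python. The sketch below only builds the request URL from its path segments (the `quality_url` helper is a hypothetical convenience, and the response schema is not documented here, so the actual fetch is left commented out):

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-report URL for a repository (hypothetical helper)."""
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("nlp", "JackHCC", "Chinese-Tokenization")
# Uncomment to fetch live (response fields are an assumption; inspect them first):
# report = json.load(urllib.request.urlopen(url))
```

With no API key this shares the 100 requests/day public quota, so cache responses rather than re-fetching per call.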