hankcs/ID-CNN-CWS
Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation".
This project helps researchers and developers working with Chinese text split continuous streams of characters into individual words. It takes raw Chinese text corpora as input and outputs a trained word segmentation model. It is aimed at natural language processing engineers and computational linguists building applications that depend on accurate Chinese word boundaries.
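To make the task concrete, here is a minimal sketch of one common way to frame word segmentation: as per-character tagging with B/M/E/S labels (begin, middle, end, single). This is an illustrative assumption; the listing does not say which tagging scheme the repository uses, and the function names `words_to_bmes` and `bmes_to_words` are hypothetical.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Convert a segmented sentence into one B/M/E/S tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first char B, last char E, everything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their predicted tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends here
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


words = ["我", "喜欢", "自然语言", "处理"]
tags = words_to_bmes(words)
restored = bmes_to_words("".join(words), tags)
```

Under this framing, a segmentation model only has to predict one tag per character, and the word boundaries follow from the tags.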
133 stars. No commits in the last 6 months.
Use this if you need to build or evaluate models for Chinese word segmentation, especially if you are interested in using or comparing iterated dilated convolutional neural networks or Bi-LSTM architectures.
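As background on why dilated convolutions are of interest here, the sketch below computes the receptive field of a stack of 1-D dilated convolution layers, that is, how many input characters can influence one output position. The kernel width and dilation rates are illustrative assumptions, not values taken from the repository or the paper, and `receptive_field` is a hypothetical helper.

```python
from typing import Sequence


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 dilated 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_width - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_width - 1) * d
    return field


# Three layers of width 3 with doubling dilation vs. no dilation.
dilated = receptive_field(3, [1, 2, 4])
plain = receptive_field(3, [1, 1, 1])
# Applying the same three-layer block twice extends the field further.
repeated = receptive_field(3, [1, 2, 4] * 2)
print(dilated, plain, repeated)
```

With the same number of layers, the dilated stack covers roughly twice as many characters as the undilated one, which is the usual motivation for using dilation to capture wider context cheaply.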
Not ideal if you are a general user looking for a pre-built, ready-to-use Chinese word segmentation tool without needing to train or benchmark models.
Stars: 133
Forks: 37
Language: Python
License: GPL-3.0
Category:
Last pushed: Apr 15, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/hankcs/ID-CNN-CWS"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
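The same request can be made from Python with only the standard library. This is a minimal sketch: reading the URL path as category/owner/repository is an assumption inferred from the single example URL above, the helper names are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout is assumed from the curl example)."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text (format unknown)."""
    with urlopen(build_quality_url(category, owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


url = build_quality_url("nlp", "hankcs", "ID-CNN-CWS")
# body = fetch_quality("nlp", "hankcs", "ID-CNN-CWS")  # performs a network request
```

The network call is left commented out so the snippet can be run offline; uncomment it to issue the request, keeping the daily limit in mind.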
Higher-rated alternatives
PyThaiNLP/pythainlp
Thai natural language processing in Python
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...
jacksonllee/pycantonese
Cantonese Linguistics and NLP
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
hankcs/pyhanlp
Chinese word segmentation