yihong-chen/chinese-word-segmentation
Simple Chinese word segmentation with experiments on the PKU dataset
This tool helps linguists, data scientists, and researchers accurately break down Chinese text into individual words. You input raw Chinese sentences or documents, and it outputs the text with clear word boundaries, which is crucial for further analysis like natural language processing or text mining. It's designed for anyone needing to pre-process Chinese text for computational tasks.
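To illustrate what "clear word boundaries" means in practice, here is a minimal sketch of forward maximum matching, a classic dictionary-based segmentation technique. This is not necessarily the algorithm this repository uses; the `vocab` dictionary and `max_len` value below are purely illustrative assumptions.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, greedily take the
    longest dictionary word; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        matched = text[i]  # default: emit one character
        # Try the longest candidate first, down to length 2.
        for j in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + j]
            if candidate in dictionary:
                matched = candidate
                break
        words.append(matched)
        i += len(matched)
    return words

# Tiny illustrative vocabulary (an assumption, not the PKU lexicon).
vocab = {"北京", "大学", "北京大学", "生活"}
print(fmm_segment("北京大学生活", vocab))  # → ['北京大学', '生活']
```

Real segmenters combine a much larger lexicon with statistical or neural disambiguation, since a greedy match cannot resolve genuinely ambiguous boundaries.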
No commits in the last 6 months.
Use this if you need to reliably segment Chinese text into words for linguistic analysis, search indexing, or other text processing applications.
Not ideal if you require extremely high-performance real-time segmentation or are working with highly specialized jargon that might not be covered by standard models.
Stars
8
Forks
1
Language
Jupyter Notebook
License
—
Category
Last pushed
Apr 18, 2018
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/yihong-chen/chinese-word-segmentation"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PyThaiNLP/pythainlp
Thai natural language processing in Python
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...
jacksonllee/pycantonese
Cantonese Linguistics and NLP
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use (www.jionlp.com)
hankcs/pyhanlp
Chinese word segmentation