Chinese NLP Toolkits
Comprehensive NLP toolkits and frameworks specifically designed for Chinese language processing, including segmentation, POS tagging, NER, sentiment analysis, and classical Chinese support. Does NOT include language-agnostic NLP tools, machine translation systems, or tools focused on non-Chinese languages.
The index tracks 78 Chinese NLP toolkit projects. One scores above 70 (the Verified tier); the highest-rated is PyThaiNLP/pythainlp at 90/100 with 1,117 stars. Two of the top 10 are actively maintained.
Get all 78 projects as JSON:

    curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=chinese-nlp-toolkits&limit=78"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
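The endpoint can be queried from any HTTP client. As a minimal sketch (the response schema is not documented in this listing, so the `name` and `score` fields below are assumptions to adjust against the real payload), a small Python helper might look like:

```python
import json
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain="nlp", subcategory="chinese-nlp-toolkits", limit=78):
    """Build the dataset query URL from the parameters in the curl example."""
    return f"{BASE}?{urlencode({'domain': domain, 'subcategory': subcategory, 'limit': limit})}"

def top_tools(entries, n=10):
    """Sort decoded entries by score, highest first, and keep the top n.

    Assumes each entry carries "name" and "score" keys; the real response
    schema may differ, so adapt the keys to match it.
    """
    return sorted(entries, key=lambda t: t["score"], reverse=True)[:n]

# Offline demonstration on a hand-written sample payload (no network call).
sample = json.loads('[{"name": "hankcs/HanLP", "score": 65},'
                    ' {"name": "PyThaiNLP/pythainlp", "score": 90}]')
print(build_url())
print(top_tools(sample, 1)[0]["name"])  # PyThaiNLP/pythainlp
```

Pair `build_url` with any HTTP library (e.g. `urllib.request.urlopen`) to fetch the live data; the sample payload only illustrates the sorting step.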
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | PyThaiNLP/pythainlp | Thai natural language processing in Python | 90 | Verified |
| 2 | hankcs/HanLP | Natural Language Processing for the next decade. Tokenization,... | n/a | Established |
| 3 | jacksonllee/pycantonese | Cantonese linguistics and NLP | n/a | Established |
| 4 | dongrixinyu/JioNLP | A Chinese NLP preprocessing and parsing package: accurate, efficient, easy to use... | n/a | Established |
| 5 | hankcs/pyhanlp | Chinese word segmentation | n/a | Established |
| 6 | ownthink/Jiagu | Jiagu deep-learning NLP toolkit: knowledge-graph relation extraction, Chinese word segmentation, POS tagging, NER, sentiment analysis, new-word discovery, keywords, summarization, text clustering | n/a | Established |
| 7 | go-ego/gse | Go efficient multilingual NLP and text segmentation; support English,... | n/a | Established |
| 8 | baidu/lac | Baidu NLP: word segmentation, POS tagging, NER, word importance | n/a | Established |
| 9 | messense/jieba-rs | Jieba Chinese word segmentation implemented in Rust | n/a | Established |
| 10 | yongzhuo/Macropodus | Macropodus NLP toolkit built on an ALBERT+BiLSTM+CRF architecture: Chinese word segmentation, POS tagging, NER, new-word discovery, keywords, summarization... | n/a | Established |
| 11 | SeanLee97/xmnlp | xmnlp: Chinese word segmentation, POS tagging, NER, sentiment analysis, text correction, text-to-pinyin, summarization, radicals, sentence embeddings, and text-similarity computation | n/a | Established |
| 12 | OpenPecha/pybo | 🦜 NLP for Tibetan, in Python | n/a | Established |
| 13 | jiaeyan/Jiayan | Jiayan, an NLP toolkit dedicated to Classical Chinese: lexicon construction, word segmentation, POS tagging, sentence splitting, and punctuation. Jiayan, the 1st... | n/a | Established |
| 14 | lionsoul2014/jcseg | Jcseg, a lightweight NLP framework developed in Java, providing CJK and... | n/a | Established |
| 15 | NLPchina/ansj_seg | Ansj segmenter, a true Java implementation of ICTCLAS whose accuracy and speed exceed the open-source ICTCLAS: Chinese word segmentation, person-name recognition, POS tagging, user-defined dictionaries | n/a | Established |
| 16 | yaoguangluo/Deta_Parser | Fast Chinese word segmentation and analysis | n/a | Established |
| 17 | qinwf/jiebaR | Chinese text segmentation with R (documentation updated 🎉...) | n/a | Established |
| 18 | hankcs/hanlp-lucene-plugin | HanLP Chinese segmentation plugin for Lucene; supports Lucene-based systems including Solr | n/a | Established |
| 19 | monpa-team/monpa | MONPA, a multi-task model for Traditional Chinese word segmentation, POS tagging, and named entity recognition | n/a | Established |
| 20 | XiaoMi/MiNLP | Xiaomi natural language processing toolkits | n/a | Established |
| 21 | hankcs/multi-criteria-cws | Simple solution for multi-criteria Chinese word segmentation | n/a | Emerging |
| 22 | smoothnlp/SmoothNLP | An NLP toolset with a focus on explainable inference | n/a | Emerging |
| 23 | jimichan/mynlp | A production-grade, high-performance, modular, extensible Chinese NLP toolkit (word segmentation, averaged perceptron, fastText, pinyin, new-word discovery, segmentation correction, BM25, person-name recognition, NER, custom dictionaries) | n/a | Emerging |
| 24 | KoichiYasuoka/UD-Kanbun | Tokenizer, POS tagger, and dependency parser for Classical Chinese | n/a | Emerging |
| 25 | houbb/opencc4j | 🇨🇳 Open Chinese Convert, an open-source project for conversion between... | n/a | Emerging |
| 26 | notAI-tech/deepsegment | A sentence segmenter that actually works! | n/a | Emerging |
| 27 | linonetwo/segmentit | Chinese word segmentation usable in any JS environment; fork of leizongmin/node-segment | n/a | Emerging |
| 28 | supercoderhawk/DeepLearning_NLP | Deep-learning-based natural language processing library | n/a | Emerging |
| 29 | hankcs/ID-CNN-CWS | Source code and corpora of the paper "Iterated Dilated Convolutions for Chinese... | n/a | Emerging |
| 30 | kirklin/go-swd | Sensitive-word detection: a high-performance detection and filtering library in Go... | n/a | Emerging |
| 31 | houbb/segment | jieba-analysis for Java: a flexible, high-performance segmentation implementation built on the jieba lexicon, with POS tagging | n/a | Emerging |
| 32 | houbb/nlp-hanzi-similar | Hanzi similarity tool: a look-alike-character algorithm for Chinese, usable for correcting handwritten-character recognition and for text obfuscation | n/a | Emerging |
| 33 | houbb/pinyin | High-performance Chinese-to-pinyin tool for Java, with homophone support | n/a | Emerging |
| 34 | suminb/hanja | Hangul and Hanja library | n/a | Emerging |
| 35 | KoichiYasuoka/SuPar-Kanbun | Tokenizer, POS tagger, and dependency parser for Classical Chinese | n/a | Emerging |
| 36 | google/budou | Budou is an automatic organizer tool for beautiful line breaking in CJK... | n/a | Emerging |
| 37 | StarCC0/starcc-py | Simplified-Traditional conversion: Python implementation of StarCC, the next generation of... | n/a | Emerging |
| 38 | junchaoIU/QCNLP | An efficient preprocessing and parsing tool for Chinese natural language | n/a | Emerging |
| 39 | mxcoras/jieba-next | Speeding up jieba with Rust: an efficient, modern Chinese segmentation library | n/a | Emerging |
| 40 | KoichiYasuoka/GuwenCOMBO | Tokenizer, POS tagger, and dependency parser for Classical Chinese | n/a | Emerging |
| 41 | shibing624/crf-seg | crf-seg: a production-grade Chinese segmentation tool written in Java, supporting custom corpora and models, with a clean architecture and good results | n/a | Emerging |
| 42 | cyd622/nlp-jieba | Jieba Chinese segmentation (PHP version): aiming to be the best PHP Chinese word segmentation component | n/a | Emerging |
| 43 | bububa/jiagu | Jiagu deep-learning NLP toolkit: knowledge-graph relation extraction, Chinese word segmentation, POS tagging, NER, sentiment analysis, new-word discovery, keywords, summarization, text clustering | n/a | Emerging |
| 44 | notoriouslab/trad-zh-search | A Traditional Chinese text-preprocessing tool that pairs with mainstream search engines: CKIP segmentation plus bigram index generation, with an optional domain dictionary system | n/a | Emerging |
| 45 | wittawatj/jtcc | Java library to tokenize Thai text into a list of TCCs | n/a | Emerging |
| 46 | KoichiYasuoka/SuPar-Kanbun-1.3.4 | Tokenizer, POS tagger, and dependency parser for Classical Chinese | n/a | Emerging |
| 47 | sdq/FenciMac | Chinese word segmentation for Mac | n/a | Emerging |
| 48 | supercoderhawk/DeepNLP | Deep-learning-based natural language processing library | n/a | Emerging |
| 49 | jason2506/esapp | An unsupervised Chinese word segmentation tool | n/a | Emerging |
| 50 | zxgineng/deepnlp | Early NLP practice projects | n/a | Emerging |
| 51 | dogterbox/thai-word-segmentation | Thai word segmentation using deep learning | n/a | Emerging |
| 52 | PyThaiNLP/Han-solo | 🪿 Han-solo: Thai syllable segmenter | n/a | Experimental |
| 53 | jamsinclair/budou-node | Node.js port of Budou, an automatic organizer tool for beautiful line... | n/a | Experimental |
| 54 | limchiahooi/nlp-chinese | A Chinese natural language processing (NLP) project | n/a | Experimental |
| 55 | hope-data-science/chinese_NLP | Chinese natural language processing | n/a | Experimental |
| 56 | bryanchw/Traditional-Chinese-Stopwords-and-Punctuations-Library | A Python library specifically for Traditional Chinese stopwords and... | n/a | Experimental |
| 57 | shibing624/pinyin-tokenizer | Pinyin tokenizer: splits continuous pinyin into a list of single-syllable pinyin strings | n/a | Experimental |
| 58 | cxumol/jieba-wasm-html | Fast Jieba Chinese text segmentation in the browser, with no backend or NPM... | n/a | Experimental |
| 59 | mathsyouth/awesome-word-segmentation | A curated list of resources dedicated to word segmentation | n/a | Experimental |
| 60 | jsrpy/Chinese-NLP-Jieba | An introduction to Chinese word segmentation using Jieba | n/a | Experimental |
| 61 | wchan757/Cantonese_Word_Segmentation | Dictionary for Cantonese word segmentation | n/a | Experimental |
| 62 | Lapis-Hong/fast-xinci | Chinese new-word finder (C++ library) | n/a | Experimental |
| 63 | gyatso736/-Tibetan-tokenizer- | A Tibetan tokenizer based on BiLSTM+CRF, created with the... | n/a | Experimental |
| 64 | JackHCC/Chinese-Tokenization | Chinese word segmentation using traditional methods (n-gram, HMM), neural networks (CNN, LSTM), and pretrained models (BERT)... | n/a | Experimental |
| 65 | NoHeartPen/Kanji2Hanzi | Converts Japanese kanji to Simplified Chinese characters | n/a | Experimental |
| 66 | wittawatj/ctwt | Classifier-based Thai word tokenizer | n/a | Experimental |
| 67 | Jyutt/jieba-hs | Haskell implementation of the Jieba Chinese segmentation algorithm | n/a | Experimental |
| 68 | yihong-chen/chinese-word-segmentation | Simple Chinese word segmentation with experiments on the PKU dataset | n/a | Experimental |
| 69 | sinostudy/pinyin | Convert between different representations of Hànyǔ Pīnyīn | n/a | Experimental |
| 70 | Ancastal/HSK-Character-Profiler | HSK Character Profiler is a Python tool that analyzes Chinese character... | n/a | Experimental |
| 71 | bmwj/Tibetan_information_processing | Tibetan information-processing toolkit: generates the complete Tibetan character set, intelligently recognizes Tibetan character components... | n/a | Experimental |
| 72 | StarCC0/starcc0.github.io | StarCC is the next generation of Simplified-Traditional Chinese... | n/a | Experimental |
| 73 | NicoACloutier/Hanzi.jl | A Julia library to romanize Hanzi | n/a | Experimental |
| 74 | AlanMC123/NLP-Test1 | Homework: (1) Chinese word segmentation with Jieba, SnowNLP, and THULAC; (2)... | n/a | Experimental |
| 75 | fastcws/tagged-wiki2019zh | 2019 Chinese Wikipedia corpus with 4-tag annotation, labeled using HanLP | n/a | Experimental |
| 76 | CeliaChien/song-poem | Automatic Song-ci poem generation and Chinese word segmentation systems | n/a | Experimental |
| 77 | kcxain/judou | "Judou" Chinese word segmenter | n/a | Experimental |
| 78 | AlienKevin/dips | Efficient multi-criteria Cantonese word segmentation | n/a | Experimental |