Chinese NLP Toolkits

Comprehensive NLP toolkits and frameworks specifically designed for Chinese language processing, including segmentation, POS tagging, NER, sentiment analysis, and classical Chinese support. Does NOT include language-agnostic NLP tools, machine translation systems, or tools focused on non-Chinese languages.

There are 78 Chinese NLP toolkits tracked. One scores above 70 (Verified tier). The highest-rated is PyThaiNLP/pythainlp at 90/100 with 1,117 stars. Two of the top 10 are actively maintained.
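The tier labels in the listing are consistent with fixed score cutoffs. The sketch below is an illustration only: the page states just that the Verified tier sits above 70, so the other cutoffs (50 and 30) are assumptions inferred from the listed rows, and the function name `tier_for_score` is hypothetical. How a score of exactly 70 is treated is not confirmed by the page.

```python
def tier_for_score(score: int) -> str:
    """Map a 0-100 score to a tier label.

    Only the Verified cutoff ("above 70") comes from the page text.
    The 50 and 30 cutoffs are ASSUMPTIONS inferred from the listing
    (50 is Established, 49 and 31 are Emerging, 29 is Experimental).
    """
    if score > 70:
        return "Verified"
    if score >= 50:  # assumed lower bound for Established
        return "Established"
    if score >= 30:  # assumed lower bound for Emerging
        return "Emerging"
    return "Experimental"
```

For example, `tier_for_score(67)` gives "Established", which matches the hankcs/HanLP row.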

Get all 78 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=chinese-nlp-toolkits&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
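The same request can be made from Python. This is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the parsed payload is returned as-is. The documented example passes `limit=20`; whether a larger `limit` returns all 78 projects in one call is an assumption the page does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="chinese-nlp-toolkits", limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON payload.

    The response schema is not documented on this page, so the parsed
    object is returned unchanged rather than assuming any field names.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects()` performs one network request, which counts against the 100 requests/day allowance for keyless use.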

# | Tool | Score | Tier | Description
1 | PyThaiNLP/pythainlp | 90 | Verified | Thai natural language processing in Python
2 | hankcs/HanLP | 67 | Established | Natural Language Processing for the next decade. Tokenization,...
3 | jacksonllee/pycantonese | 64 | Established | Cantonese Linguistics and NLP
4 | dongrixinyu/JioNLP | 62 | Established | Chinese NLP preprocessing and parsing toolkit: accurate, efficient, easy to use. A Chinese NLP Preprocessing & Parsing Package...
5 | hankcs/pyhanlp | 60 | Established | Chinese word segmentation
6 | ownthink/Jiagu | 59 | Established | Jiagu deep learning NLP tool: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering
7 | go-ego/gse | 58 | Established | Go efficient multilingual NLP and text segmentation; support English,...
8 | baidu/lac | 58 | Established | Baidu NLP: word segmentation, POS tagging, named entity recognition, word importance
9 | messense/jieba-rs | 57 | Established | The Jieba Chinese Word Segmentation Implemented in Rust
10 | yongzhuo/Macropodus | 57 | Established | Macropodus NLP tool, based on an Albert+BiLSTM+CRF deep learning architecture: Chinese word segmentation, POS tagging, named entity recognition, new word discovery, keywords, text summarization...
11 | SeanLee97/xmnlp | 57 | Established | xmnlp: provides Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radicals, sentence representation, text similarity computation, and more
12 | OpenPecha/pybo | 54 | Established | 🦜 NLP for Tibetan, in Python.
13 | jiaeyan/Jiayan | 53 | Established | Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese, supporting lexicon construction, word segmentation, POS tagging, sentence segmentation, and punctuation. Jiayan, the 1st...
14 | lionsoul2014/jcseg | 51 | Established | Jcseg is a light weight NLP framework developed with Java. Provide CJK and...
15 | NLPchina/ansj_seg | 51 | Established | ansj segmentation: a true Java implementation of ICT. Segmentation quality and speed both exceed the open-source version of ICT. Chinese word segmentation, person name recognition, POS tagging, user-defined dictionaries
16 | yaoguangluo/Deta_Parser | 50 | Established | Fast Chinese word segmentation and analysis
17 | qinwf/jiebaR | 50 | Established | Chinese text segmentation with R. (Documentation updated 🎉...
18 | hankcs/hanlp-lucene-plugin | 50 | Established | HanLP Chinese word segmentation plugin for Lucene; supports Lucene-based systems including Solr
19 | monpa-team/monpa | 50 | Established | MONPA (罔拍) is a multi-task model providing Traditional Chinese word segmentation, POS tagging, and named entity recognition
20 | XiaoMi/MiNLP | 50 | Established | XiaoMi Natural Language Processing Toolkits
21 | hankcs/multi-criteria-cws | 49 | Emerging | Simple Solution for Multi-Criteria Chinese Word Segmentation
22 | smoothnlp/SmoothNLP | 49 | Emerging | Focused on explainable NLP techniques. An NLP Toolset With A Focus on Explainable Inference
23 | jimichan/mynlp | 49 | Emerging | A production-grade, high-performance, modular, extensible Chinese NLP toolkit (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation correction, BM25, person name recognition, named entities, custom dictionaries)
24 | KoichiYasuoka/UD-Kanbun | 48 | Emerging | Tokenizer POS-tagger and Dependency-parser for Classical Chinese
25 | houbb/opencc4j | 48 | Emerging | 🇨🇳Open Chinese Convert is an opensource project for conversion between...
26 | notAI-tech/deepsegment | 47 | Emerging | A sentence segmenter that actually works!
27 | linonetwo/segmentit | 47 | Emerging | A Chinese word segmentation package usable in any JS environment, fork from leizongmin/node-segment
28 | supercoderhawk/DeepLearning_NLP | 47 | Emerging | A deep learning-based natural language processing library
29 | hankcs/ID-CNN-CWS | 47 | Emerging | Source codes and corpora of paper "Iterated Dilated Convolutions for Chinese...
30 | kirklin/go-swd | 46 | Emerging | Sensitive Words Detection: a high-performance sensitive word detection and filtering library, based on Go...
31 | houbb/segment | 45 | Emerging | The jieba-analysis tool for java. (A more flexible, elegant, easy-to-use, high-performance Java word segmentation implementation based on the jieba dictionary. Supports POS tagging.)
32 | houbb/nlp-hanzi-similar | 44 | Emerging | The hanzi similar tool. (A Chinese character similarity tool with an algorithm for visually similar characters. Can be used for handwritten character recognition correction, text obfuscation, etc.)
33 | houbb/pinyin | 44 | Emerging | The high performance pinyin tool for java. (A high-performance Chinese-to-pinyin tool for Java. Supports homophones.)
34 | suminb/hanja | 42 | Emerging | Hangul and Hanja library
35 | KoichiYasuoka/SuPar-Kanbun | 40 | Emerging | Tokenizer POS-tagger and Dependency-parser for Classical Chinese
36 | google/budou | 40 | Emerging | Budou is an automatic organizer tool for beautiful line breaking in CJK...
37 | StarCC0/starcc-py | 40 | Emerging | Simplified-Traditional Chinese conversion. Python implementation of StarCC, the next generation of...
38 | junchaoIU/QCNLP | 40 | Emerging | A Preprocessing & Parsing tool for Chinese Natural Language (an efficient Chinese preprocessing and NLP parsing tool)
39 | mxcoras/jieba-next | 39 | Emerging | Use Rust to Speed up jieba. An efficient, modern Chinese word segmentation library
40 | KoichiYasuoka/GuwenCOMBO | 38 | Emerging | Tokenizer POS-tagger and Dependency-parser for Classical Chinese
41 | shibing624/crf-seg | 38 | Emerging | crf-seg: a Chinese word segmentation tool for production environments, with customizable corpora and models, a clear architecture, and good segmentation quality. Written in Java.
42 | cyd622/nlp-jieba | 38 | Emerging | Jieba Chinese word segmentation (PHP version): aiming to be the best PHP Chinese word segmentation component
43 | bububa/jiagu | 37 | Emerging | Jiagu deep learning NLP tool: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering
44 | notoriouslab/trad-zh-search | 37 | Emerging | trad-zh-search: a text preprocessing tool built specifically for Traditional Chinese that can be paired on its own with mainstream search engines. CKIP word segmentation + bigram index generation, with an optional domain dictionary system
45 | wittawatj/jtcc | 37 | Emerging | Java library to tokenize Thai text into a list of TCCs
46 | KoichiYasuoka/SuPar-Kanbun-1.3.4 | 34 | Emerging | Tokenizer POS-tagger and Dependency-parser for Classical Chinese
47 | sdq/FenciMac | 34 | Emerging | Chinese word segmentation, Mac version
48 | supercoderhawk/DeepNLP | 34 | Emerging | A deep learning-based natural language processing library
49 | jason2506/esapp | 32 | Emerging | An unsupervised Chinese word segmentation tool.
50 | zxgineng/deepnlp | 32 | Emerging | An NLP practice project from the author's early days
51 | dogterbox/thai-word-segmentation | 31 | Emerging | Thai word segmentation using deep learning
52 | PyThaiNLP/Han-solo | 29 | Experimental | 🪿 Han-solo: Thai syllable segmenter
53 | jamsinclair/budou-node | 29 | Experimental | Node.js port of Budou, an automatic organizer tool for beautiful line...
54 | limchiahooi/nlp-chinese | 28 | Experimental | This repo contains my Natural Language Processing (NLP) in Chinese project.
55 | hope-data-science/chinese_NLP | 28 | Experimental | Chinese natural language processing
56 | bryanchw/Traditional-Chinese-Stopwords-and-Punctuations-Library | 28 | Experimental | Created a Python library specifically for Traditional Chinese stopwords and...
57 | shibing624/pinyin-tokenizer | 27 | Experimental | pinyintokenizer, a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables.
58 | cxumol/jieba-wasm-html | 27 | Experimental | Fast Jieba Chinese text segmentation on browser without backend/NPM |...
59 | mathsyouth/awesome-word-segmentation | 27 | Experimental | A curated list of resources dedicated to word segmentation
60 | jsrpy/Chinese-NLP-Jieba | 27 | Experimental | This is an introduction to Chinese word segmentation using Jieba.
61 | wchan757/Cantonese_Word_Segmentation | 27 | Experimental | Dictionary for Cantonese word segmentation
62 | Lapis-Hong/fast-xinci | 26 | Experimental | Chinese New Words Finder (C++ library).
63 | gyatso736/-Tibetan-tokenizer- | 26 | Experimental | This Tibetan tokenizer based on Bi-LSTM+CRF methods, it was created with the...
64 | JackHCC/Chinese-Tokenization | 25 | Experimental | Implementations of the Chinese word segmentation task using traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-training methods (BERT, etc.) [The word...
65 | NoHeartPen/Kanji2Hanzi | 23 | Experimental | This project converts Japanese Kanji to Simplified Chinese characters.
66 | wittawatj/ctwt | 20 | Experimental | Classifier-based Thai Word Tokenizer
67 | Jyutt/jieba-hs | 20 | Experimental | Haskell Implementation of Jieba Chinese Segmentation Algorithm
68 | yihong-chen/chinese-word-segmentation | 20 | Experimental | Simple Chinese word segmentation with experiments on the PKU dataset
69 | sinostudy/pinyin | 20 | Experimental | Convert between different representations of Hànyǔ Pīnyīn.
70 | Ancastal/HSK-Character-Profiler | 17 | Experimental | HSK Character Profiler is a Python tool that analyzes Chinese character...
71 | bmwj/Tibetan_information_processing | 17 | Experimental | Tibetan information processing toolkit (Tibetan_Information_Processing_Toolkit); features include generating the complete Tibetan character set, intelligently identifying Tibetan character components...
72 | StarCC0/starcc0.github.io | 17 | Experimental | Simplified-Traditional Chinese conversion. StarCC is the next generation of Simplified-Traditional Chinese...
73 | NicoACloutier/Hanzi.jl | 15 | Experimental | A Julia library to romanize Hanzi.
74 | AlanMC123/NLP-Test1 | 12 | Experimental | Homework: (1) Chinese word segmentation of Jieba, SnowNLP and THULAC; (2)...
75 | fastcws/tagged-wiki2019zh | 12 | Experimental | The 2019 Chinese Wikipedia corpus annotated with the 4-tag scheme, labeled using hanlp
76 | CeliaChien/song-poem | 11 | Experimental | Automatic Song ci poetry generation system and Chinese word segmentation system
77 | kcxain/judou | 11 | Experimental | "Judou" (句读) Chinese word segmenter
78 | AlienKevin/dips | 11 | Experimental | Efficient Multi-Criteria Cantonese Word Segmentation

Comparisons in this category