jacksonllee/pycantonese
Cantonese Linguistics and NLP
PyCantonese helps linguists, researchers, and anyone interested in Cantonese analyze Cantonese text. It accepts raw Cantonese text or corpus data and provides tools for word segmentation, part-of-speech tagging, and conversion between romanization systems such as Jyutping. It suits both academic and commercial work that studies or processes Cantonese.
400 stars. Available on PyPI.
Use this if you need to programmatically analyze Cantonese text for linguistic research, build language-learning tools, or process Cantonese data for natural language understanding applications.
Not ideal if you need a pre-built application for end-users rather than a programmatic library for text analysis.
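As a rough illustration of the segmentation, tagging, and romanization features described above, the sketch below wraps a few PyCantonese calls in a helper. It assumes the package is installed (`pip install pycantonese`) and that the 3.x function names `segment`, `pos_tag`, `characters_to_jyutping`, and `jyutping_to_yale` are available in your version; the `analyze` helper and the sample sentence are hypothetical and not part of the library.

```python
def analyze(text):
    """Segment, POS-tag, and romanize a Cantonese string.

    `analyze` is a hypothetical helper for illustration only. The
    pycantonese calls below are assumed from the 3.x API; check the
    documentation for the version you have installed.
    """
    # Imported lazily so this module still loads without the package.
    import pycantonese

    # Split the raw string into a list of words.
    words = pycantonese.segment(text)

    # Attach a part-of-speech tag to each segmented word.
    tagged = pycantonese.pos_tag(words)

    # Map the characters to Jyutping romanization, grouped by word.
    jyutping = pycantonese.characters_to_jyutping(text)

    return {"words": words, "tagged": tagged, "jyutping": jyutping}


if __name__ == "__main__":
    try:
        result = analyze("廣東話好難學")  # sample sentence, chosen for illustration
        for key, value in result.items():
            print(key, value)

        # Convert a Jyutping string to Yale romanization.
        import pycantonese
        print(pycantonese.jyutping_to_yale("gwong2dung1waa2"))
    except ImportError:
        print("pycantonese is not installed; run: pip install pycantonese")
```

The outputs are not shown here because they depend on the installed version and its bundled data.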
Stars: 400
Forks: 43
Language: Python
License: MIT
Category:
Last pushed: Mar 15, 2026
Commits (30d): 0
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/jacksonllee/pycantonese"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
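The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns JSON, which the page does not state. How an API key is supplied is not documented here, so the sketch covers only the keyless request.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, repo):
    """Build the endpoint URL for a category and an 'owner/name' repo."""
    # Keep the slash between owner and repository name unescaped.
    return "{}/{}/{}".format(
        BASE_URL,
        urllib.parse.quote(category, safe=""),
        urllib.parse.quote(repo, safe="/"),
    )


def fetch_quality(category, repo, timeout=10):
    """Fetch and decode the response (assumed, not confirmed, to be JSON)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("nlp", "jacksonllee/pycantonese")
```

Because the keyless tier allows 100 requests per day, it is worth caching the result rather than calling `fetch_quality` in a loop.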
Related tools
PyThaiNLP/pythainlp
Thai natural language processing in Python
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named...
dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
hankcs/pyhanlp
Chinese word segmentation
ownthink/Jiagu
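Jiagu, a deep learning NLP tool: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering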