Droidtown/ArticutAPI
API of Articut, a Chinese word-segmentation engine with semantic part-of-speech tagging. Segmentation (斷詞, also called 分詞) is the foundation of Chinese text processing. Articut uses no machine learning and no data models; relying only on modern written-Chinese grammar rules, it reaches an F1-measure above 94% and recall above 96% on the SIGHAN 2005 benchmark.
Articut breaks Chinese text into individual words and identifies their grammatical roles, such as nouns, verbs, and adjectives. You feed it raw Chinese sentences or longer documents, and it returns segmented text with detailed part-of-speech tags. This is useful for researchers, linguists, and content analysts who need to understand the structure and meaning within text.
414 stars. Available on PyPI.
Use this if you need to accurately segment Chinese text and assign semantic part-of-speech tags for natural language processing tasks, without relying on machine learning models.
Not ideal if your primary need is for advanced encyclopedic knowledge processing, as this tool focuses on linguistic structure rather than general factual knowledge.
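Articut returns its part-of-speech tags as inline, XML-like markup around each segmented word. The sketch below shows the documented client call (in comments, since it needs credentials) and how such tagged output can be unpacked into (tag, word) pairs. The sample string and its tag labels are illustrative only, modeled on the response shape described in the project's README, not actual API output.

```python
import re

# Calling the service requires an account (sketch only, per the README):
#   from ArticutAPI import Articut
#   articut = Articut(username="...", apikey="...")
#   result = articut.parse("斷詞是基礎")

# Illustrative Articut-style tagged output (tags and words assumed, not real API output):
tagged = ("<ENTITY_nouny>斷詞</ENTITY_nouny>"
          "<ACTION_verb>是</ACTION_verb>"
          "<ENTITY_nouny>基礎</ENTITY_nouny>")

# Extract (pos_tag, word) pairs from the XML-like markup;
# the backreference \1 ensures opening and closing tags match.
pairs = re.findall(r"<([^>]+)>([^<]+)</\1>", tagged)
for pos, word in pairs:
    print(pos, word)
```

A real response also includes a plain segmented string alongside the tagged form, so simple tokenization does not require parsing the markup at all.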
Stars
414
Forks
38
Language
Python
License
MIT
Category
nlp
Last pushed
Feb 10, 2026
Commits (30d)
0
Dependencies
2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Droidtown/ArticutAPI"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
facebookresearch/stopes
A library for preparing data for machine translation research (monolingual preprocessing,...
rkcosmos/deepcut
A Thai word tokenization library using Deep Neural Network
fukuball/jieba-php
"Jieba" (Chinese for "to stutter") Chinese text segmentation: aims to be the best PHP Chinese word-segmentation component....
pytorch/text
Models, data loaders and abstractions for language processing, powered by PyTorch
jiesutd/NCRFpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy use to any sequence labeling tasks (e.g. NER,...