secsilm/zi-dataset
A Chinese character dataset containing information about each character, such as stroke count, radical, pinyin, and English definitions/synonyms.
This dataset covers approximately 20,000 Chinese characters. For each character, it lists details such as stroke count, Mandarin and Cantonese pinyin, English definitions, radical, and various encoding schemes. It is suited to linguists, educators, and anyone involved in Chinese language research or teaching.
129 stars. No commits in the last 6 months.
Use this if you need detailed, structured information about Chinese characters for linguistic analysis, educational material creation, or language learning applications.
Not ideal if you need a tool for real-time Chinese character recognition from images or handwriting, or if you require stroke order data.
Stars: 129
Forks: 18
Language: none listed
License: CC-BY-SA-4.0
Category:
Last pushed: Jul 17, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/secsilm/zi-dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
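The same request can be made from a script. The following is a minimal Python sketch, not an official client: the URL pattern is taken from the curl example above, while the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (pattern inferred from the curl example)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint responds with JSON; the response schema is not
    documented on this page, so the result is returned without interpretation.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("secsilm", "zi-dataset")
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

With the keyless limit of 100 requests/day, a script that polls several repositories would probably want to cache responses locally rather than refetch on every run.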
Higher-rated alternatives
NateScarlet/holiday-cn
📅🇨🇳 Chinese statutory holiday data, scraped automatically every day from State Council announcements
sagorbrur/bnlp
BNLP is a natural language processing toolkit for Bengali Language.
brightmart/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
houbb/sensitive-word
👮‍♂️ The sensitive word tool for Java. (Sensitive words / banned words / illegal words / profanity. A DFA-based high-performance Java...
esbatmop/MNBVC
MNBVC(Massive Never-ending BT Vast Chinese...