thunlp/THUOCL

THUOCL (THU Open Chinese Lexicon): a Chinese lexicon

51 / 100 (Established)

This project provides a collection of curated Chinese vocabulary lists covering domains such as IT, finance, medicine, and law. Each list pairs common words with their document frequency (DF), the number of documents in a large text collection that contain the word. The lists are intended to improve the accuracy of Chinese word segmentation in natural language processing work.
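As a rough illustration of how such a list might be consumed, the sketch below parses word/DF pairs into a dictionary. It assumes each line holds a word and an integer DF separated by whitespace; that layout, the function name `parse_lexicon`, and the sample entries are assumptions made for illustration, not something this page confirms.

```python
from typing import Dict, Iterable


def parse_lexicon(lines: Iterable[str]) -> Dict[str, int]:
    """Parse lines of the assumed form '<word> <DF>' into {word: DF}."""
    lexicon: Dict[str, int] = {}
    for raw in lines:
        parts = raw.split()
        # Skip blank or malformed lines rather than guessing at their meaning.
        if len(parts) != 2 or not parts[1].isdecimal():
            continue
        word, df = parts
        lexicon[word] = int(df)
    return lexicon


# Made-up sample entries for illustration only; not taken from THUOCL.
sample = ["软件\t1200", "算法\t950", "", "bad line here"]
it_words = parse_lexicon(sample)
print(it_words)
```

The resulting mapping could then be handed to a segmenter's user dictionary, for example by calling jieba's `jieba.add_word(word, freq)` for each entry. Whether a raw DF value is a sensible `freq` for a given segmenter is a judgment call the page does not address.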

1,034 stars. No commits in the last 6 months.

Use this if you are working with Chinese text and need specialized vocabulary lists to achieve more precise word segmentation in your applications or research.

Not ideal if you need a dictionary for general lookup or translation, as these lists are specifically formatted and curated for computational linguistics tasks.

Chinese-language-processing text-analysis lexicon-development information-extraction computational-linguistics
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25


Stars: 1,034
Forks: 206
Language: (not listed)
License: MIT
Last pushed: Apr 03, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/thunlp/THUOCL"
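The same request can be sketched in Python with only the standard library. The URL pattern `quality/<category>/<owner>/<repo>` is inferred from the single example above, and the response is assumed to be JSON; neither is confirmed by this page. The helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a <category>/<owner>/<repo> layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the endpoint returns a JSON body."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(fetch_quality("nlp", "thunlp", "THUOCL"))
```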

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.