hantang/data-corpus
Corpus data and word-list collection: Chinese and English stop words, sentiment analysis lexicons, categorized dictionaries (thesauri), and sensitive-word lists (prohibited and censored words).
This resource helps you easily access and manage various word lists critical for text processing in Chinese and English. It provides ready-to-use lists like stop words, sentiment vocabularies, thematic thesauri, and sensitive/censorship terms. Anyone working with text data, such as a content analyst, social media manager, or researcher, would find this useful for cleaning, categorizing, or monitoring textual information.
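The sketch below illustrates the "cleaning" and "categorizing" uses. It filters tokens against a stop-word list, then scores what remains against positive and negative sentiment lists.

It rests on a few assumptions:
- Each list is a plain-text file with one entry per line. This page does not confirm that format.
- Tiny inline lists stand in for the repository's files.
- `load_word_list`, `remove_stop_words`, and `sentiment_score` are hypothetical helpers, not part of the repository.
- Chinese text would first need word segmentation, which is out of scope here.

```python
from collections.abc import Iterable


def load_word_list(lines: Iterable[str]) -> set[str]:
    """Parse a word list, ASSUMING one entry per line; blank lines are skipped.

    Entries are lowercased so English matching is case-insensitive
    (lowercasing leaves Chinese characters unchanged).
    """
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Keep only tokens that do not appear in the stop-word set."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


def sentiment_score(tokens: list[str], positive: set[str], negative: set[str]) -> int:
    """Simple lexicon-based score: +1 per positive hit, -1 per negative hit."""
    score = 0
    for tok in tokens:
        word = tok.lower()
        if word in positive:
            score += 1
        elif word in negative:
            score -= 1
    return score


if __name__ == "__main__":
    # Tiny inline lists standing in for the repository's (assumed) files.
    stop_words = load_word_list(["the", "is", "and", ""])
    positive = load_word_list(["good", "great"])
    negative = load_word_list(["bad", "slow"])

    tokens = "The food is great and the service is good".split()
    content = remove_stop_words(tokens, stop_words)
    print(content)  # ['food', 'great', 'service', 'good']
    print(sentiment_score(content, positive, negative))  # 2
```

Counting hits is the crudest lexicon method. It ignores negation and intensity, but it shows where a sentiment vocabulary plugs in.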
Use this if you need pre-compiled word lists to efficiently prepare, analyze, or filter text content across different languages and applications.
Not ideal if you require highly specialized, domain-specific vocabularies that are not commonly available, or if you need to generate word embeddings or complex language models.
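The "filter" use needs a different technique. A common way to apply a sensitive-word list is to load it into a trie (a DFA-like structure), then scan the text once and mask the longest match at each position.

This is a minimal sketch under a few assumptions:
- The list has already been read into Python strings.
- `build_trie` and `mask_sensitive` are hypothetical names, not part of this repository.
- Matching is exact and case-sensitive.

```python
# Sentinel key marking "a word ends here"; an object() never collides with a character.
_END = object()


def build_trie(words: list[str]) -> dict:
    """Build a nested-dict trie from a sensitive-word list (empty entries skipped)."""
    root: dict = {}
    for word in words:
        if not word:
            continue
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def mask_sensitive(text: str, trie: dict, mask: str = "*") -> str:
    """Replace each sensitive word in text with one mask character per character.

    At every position the longest matching word wins, and scanning resumes
    right after it, so matches never overlap.
    """
    out = list(text)
    i = 0
    while i < len(text):
        node = trie
        match_end = -1
        j = i
        # Walk the trie as far as the text allows, remembering the last word end.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                match_end = j
        if match_end != -1:
            for k in range(i, match_end):
                out[k] = mask
            i = match_end
        else:
            i += 1
    return "".join(out)


if __name__ == "__main__":
    trie = build_trie(["坏蛋", "坏", "badword"])
    print(mask_sensitive("你是坏蛋吗", trie))  # 你是**吗
    print(mask_sensitive("no badword here", trie))  # no ******* here
```

Because the longest match wins, a list containing both 坏 and 坏蛋 masks all of 坏蛋 rather than only its first character.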
Stars: 35
Forks: 6
Language: not listed
License: not listed
Category: not listed
Last pushed: Feb 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/hantang/data-corpus"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
NateScarlet/holiday-cn
📅🇨🇳 Chinese statutory public holiday data, automatically scraped daily from State Council announcements
sagorbrur/bnlp
BNLP is a natural language processing toolkit for the Bengali language.
brightmart/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
esbatmop/MNBVC
MNBVC(Massive Never-ending BT Vast Chinese...
houbb/sensitive-word
👮♂️ The sensitive word tool for Java (sensitive, prohibited, illegal, and profane words. A high-performance java... implemented with the DFA algorithm