hantang/data-corpus

A collection of corpus data and word lists: Chinese and English stop words, sentiment analysis lexicons, classification dictionaries (thesauri), and sensitive-word lists (banned and censored terms).

40 / 100 (Emerging)

This repository collects ready-to-use word lists for Chinese and English text processing: stop words, sentiment vocabularies, thematic thesauri, and sensitive or censored terms. It is useful to anyone who cleans, categorizes, or monitors text, such as content analysts, social media managers, and researchers.

Use this if you need pre-compiled word lists to prepare, analyze, or filter Chinese or English text.

Not ideal if you need highly specialized, domain-specific vocabularies, or if you need word embeddings or language models rather than plain word lists.
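As a rough illustration of the cleaning and monitoring uses described above, the sketch below filters stop words and flags sensitive terms in a token list. The sample entries, the variable names, and the whitespace tokenizer are all assumptions made for this example. The repository's actual file names and formats are not described here, and Chinese text would need a proper word segmenter instead of `split()`.

```python
# Hypothetical sample entries. In practice these sets would be loaded from
# the repository's word-list files, whose names and formats are not shown here.
STOP_WORDS = {"the", "a", "is", "of", "的", "了"}
SENSITIVE_TERMS = {"badword"}


def tokenize(text):
    """Lowercase and split on whitespace (an assumption; not suitable for Chinese)."""
    return text.lower().split()


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def find_sensitive(tokens, terms):
    """Return the tokens that appear in the sensitive-term set."""
    return [t for t in tokens if t in terms]


tokens = tokenize("The quality of a badword filter is important")
cleaned = remove_stop_words(tokens, STOP_WORDS)
flagged = find_sensitive(tokens, SENSITIVE_TERMS)

print(cleaned)  # ['quality', 'badword', 'filter', 'important']
print(flagged)  # ['badword']
```

Sets are used so that each membership check takes constant time on average, which matters when a word list holds thousands of entries.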

content-analysis text-moderation sentiment-analysis data-preprocessing linguistics
No License No Package No Dependents
Maintenance 10 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 15 / 25


Stars: 35
Forks: 6
Language: (not listed)
License: (not listed)
Last pushed: Feb 09, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/hantang/data-corpus"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
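The same request can be made from Python with the standard library. This is a minimal sketch: the URL is copied from the curl command, while the function name `fetch_quality_report` and the timeout value are choices made for this example. The response format is not documented here, so the function returns the raw body as text and does not assume any particular structure.

```python
import urllib.request

# URL copied verbatim from the curl command.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/hantang/data-corpus"


def fetch_quality_report(url=API_URL, timeout=10.0):
    """Send a GET request and return the response body as text.

    The body is returned undecoded beyond its character set, because the
    response format is not documented in this listing.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Example call (performs a network request, counted against the daily limit):
# print(fetch_quality_report())
```

With the keyless limit of 100 requests/day, it is worth caching the result locally when the data is read repeatedly.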