nonamestreet/weixin_public_corpus

WeChat Official Account corpus

Score: 43 / 100

Emerging

This corpus provides a collection of articles from various WeChat Official Accounts, delivered as clean, plain text. Each entry is a JSON object containing the account's name and ID, the article title, and its full content. It's designed for researchers needing large volumes of real-world Chinese text data from a popular social media platform.
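Records in this shape can be loaded with a few lines of Python. The sketch below is a minimal example under stated assumptions: it assumes the corpus files are JSON Lines (one JSON object per line) and uses the hypothetical key names `account_name`, `account_id`, `title` and `content`. The actual files may use different keys or a different layout, so inspect them and adjust `FIELD_MAP` before relying on this.

```python
import io
import json
from collections import Counter
from dataclasses import dataclass
from typing import IO, Iterable, Iterator

# Hypothetical mapping from our attribute names to the JSON keys in the files.
# These key names are assumptions; check them against the real corpus.
FIELD_MAP = {
    "account_name": "account_name",
    "account_id": "account_id",
    "title": "title",
    "content": "content",
}


@dataclass
class Article:
    account_name: str
    account_id: str
    title: str
    content: str


def read_articles(stream: IO[str]) -> Iterator[Article]:
    """Yield one Article per non-empty line, assuming a JSON Lines layout."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on them
        record = json.loads(line)
        yield Article(**{attr: record[key] for attr, key in FIELD_MAP.items()})


def articles_per_account(articles: Iterable[Article]) -> Counter:
    """Count how many articles each account contributed."""
    return Counter(a.account_name for a in articles)


if __name__ == "__main__":
    # Made-up sample lines for illustration only; not taken from the corpus.
    sample = io.StringIO(
        '{"account_name": "demo", "account_id": "d1", "title": "t1", "content": "正文"}\n'
        "\n"
        '{"account_name": "demo", "account_id": "d1", "title": "t2", "content": "更多正文"}\n'
    )
    print(articles_per_account(read_articles(sample)))
```

Reading line by line with a generator keeps memory use flat, which matters if the corpus is too large to load at once.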

591 stars. No commits in the last 6 months.

Use this if you are a researcher needing a substantial dataset of WeChat Official Account articles for linguistic analysis, natural language processing, or social science studies.

Not ideal if you require real-time data, wish to interact directly with the WeChat platform, or need data for commercial applications.

social-media-research chinese-nlp text-corpus linguistic-studies wechat-content-analysis

No License | Stale (6 months) | No Package | No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 25 / 25


Stars: 591

Forks: 163

Language: not specified

License: none

Higher-rated alternatives

NateScarlet/holiday-cn

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

sagorbrur/bnlp

BNLP is a natural language processing toolkit for Bengali Language.

brightmart/nlp_chinese_corpus

Large-scale Chinese corpus for NLP

esbatmop/MNBVC

MNBVC (Massive Never-ending BT Vast Chinese...

houbb/sensitive-word

👮‍♂️ The sensitive word tool for Java (sensitive / prohibited / illegal / profane words. Based on the DFA algorithm, a high-performance Java...
