nonamestreet/weixin_public_corpus
WeChat Official Account corpus
This corpus provides a collection of articles from various WeChat Official Accounts, delivered as clean, plain text. Each entry is a JSON object containing the account's name and ID, the article title, and its full content. It's designed for researchers needing large volumes of real-world Chinese text data from a popular social media platform.
591 stars. No commits in the last 6 months.
Use this if you are a researcher needing a substantial dataset of WeChat Official Account articles for linguistic analysis, natural language processing, or social science studies.
Not ideal if you require real-time data, wish to interact directly with the WeChat platform, or need data for commercial applications.
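As a rough illustration of the entry format described above, the following minimal sketch reads entries one JSON object per line. The one-object-per-line layout and the field names `name`, `account`, `title`, and `content` are assumptions made for this example, not confirmed by this page; check the actual data file before relying on them.

```python
import json
from typing import Iterable, Iterator

# Hypothetical field names for account name, account ID, title, and body text.
FIELDS = ("name", "account", "title", "content")


def iter_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid line, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the whole read
        if isinstance(record, dict):
            # Missing fields default to an empty string.
            yield {key: record.get(key, "") for key in FIELDS}


# Tiny in-memory sample standing in for the real corpus file.
sample = [
    '{"name": "Example Account", "account": "example_id", "title": "Hello", "content": "正文"}',
    "",
    "not json",
]
articles = list(iter_articles(sample))
for article in articles:
    print(article["account"], article["title"], len(article["content"]))
```

To read from disk, an open file handle can be passed in place of `sample`, for example `iter_articles(open(path, encoding="utf-8"))`, where `path` points at wherever the corpus file was downloaded.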
Stars: 591
Forks: 163
Language: not specified
License: not specified
Category: (none listed)
Last pushed: Jan 07, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/nonamestreet/weixin_public_corpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
NateScarlet/holiday-cn
📅🇨🇳 Chinese statutory holiday data, scraped automatically every day from State Council announcements
sagorbrur/bnlp
BNLP is a natural language processing toolkit for Bengali Language.
brightmart/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
esbatmop/MNBVC
MNBVC(Massive Never-ending BT Vast Chinese...
houbb/sensitive-word
👮‍♂️ The sensitive word tool for Java. (Sensitive words / banned words / illegal words / profanity. Implemented with the DFA algorithm, a high-performance Java...