nonamestreet/weixin_public_corpus

WeChat Official Account Corpus

43 / 100 (Emerging)

This corpus provides a collection of articles from various WeChat Official Accounts, delivered as clean, plain text. Each entry is a JSON object containing the account's name and ID, the article title, and its full content. It's designed for researchers needing large volumes of real-world Chinese text data from a popular social media platform.
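As a rough illustration of the entry format described above, the sketch below reads entries one per line and parses each as JSON. Both the one-object-per-line layout and the field names (`name`, `account`, `title`, `content`) are assumptions made for this example, not confirmed by this listing; check the corpus files before relying on them.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_articles(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one parsed article per non-empty line.

    Assumes a JSON Lines layout (one JSON object per line); this is an
    assumption for illustration, not a documented property of the corpus.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield json.loads(line)


# Hypothetical sample entry; the keys below are assumed field names.
sample_lines = [
    '{"name": "Example Account", "account": "example_id", '
    '"title": "Hello", "content": "Article body text"}',
    "",
]

articles = list(iter_articles(sample_lines))
for article in articles:
    print(article["account"], article["title"], len(article["content"]))
```

For a real file, passing an open file handle (opened with `encoding="utf-8"`) in place of `sample_lines` would stream the corpus without loading it all into memory.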

591 stars. No commits in the last 6 months.

Use this if you are a researcher needing a substantial dataset of WeChat Official Account articles for linguistic analysis, natural language processing, or social science studies.

Not ideal if you require real-time data, wish to interact directly with the WeChat platform, or need data for commercial applications.

social-media-research chinese-nlp text-corpus linguistic-studies wechat-content-analysis
No License Stale 6m No Package No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 25 / 25

Stars: 591
Forks: 163
Language: (not listed)
License: (not listed)
Last pushed: Jan 07, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/nonamestreet/weixin_public_corpus"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
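The same request can be sketched in Python using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the reading of the path as category/owner/repository is inferred from the curl URL above. The response format is not described here, so the sketch returns the raw body as text rather than assuming a schema.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout after it
# (category / owner / repo) is an inference, not documented behavior.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text; no response schema is assumed."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request and counts against the daily limit):
# print(fetch_quality("nlp", "nonamestreet", "weixin_public_corpus"))
```

With the keyless limit of 100 requests/day, caching responses locally is a reasonable precaution when checking many repositories.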