JiangYanting/Pre-modern_Chinese_corpus_dataset

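Pre-modern Chinese corpus dataset; natural language processing; corpus; ancient Chinese; Old Chinese; Classical Chinese (文言文); digital humanities; computational linguistics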

32 / 100 (Emerging)

This project offers a large collection of pre-modern Chinese texts, organized by period (Song, Yuan, Ming, Qing, Republic of China) and by content type, such as essays, novels, poetry, and historical records. It provides raw textual data for studying how the Chinese language, its literature, and its historical record changed across these eras. Scholars and educators in Sinology, linguistics, and digital humanities can use the corpus for diachronic and corpus-based analysis.

169 stars. No commits in the last 6 months.

Use this if you need a large, categorized dataset of historical Chinese texts for academic research, language analysis, or developing educational materials.

Not ideal if you are looking for modern Chinese texts or need highly structured, annotated data beyond basic categorization by dynasty and content type.

Chinese language studies, historical linguistics, Sinology, digital humanities, Chinese literature research
No License | Stale 6m | No Package | No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 14 / 25
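
The four category scores above add up to the overall 32 / 100 shown at the top of the page. The short check below reproduces that arithmetic. Treating the overall score as a plain sum of the four categories is an assumption inferred from these numbers, not something the page states, and the variable names are illustrative only.

```python
# Category scores as listed on this page, each out of 25.
category_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 14,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the simple sum of the four categories.
total = sum(category_scores.values())
max_total = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {max_total}")  # prints "32 / 100"
```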


Stars: 169
Forks: 18
Language: HTML
License: none
Last pushed: Mar 04, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JiangYanting/Pre-modern_Chinese_corpus_dataset"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
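
For scripted access, the same request can be made from Python with the standard library. This is a minimal sketch under stated assumptions: the path layout (category, owner, repository) is read off the curl example above, the response is assumed to be JSON (the page does not document the format or its fields), and the helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(url: str, timeout: float = 10.0):
    """Fetch the URL and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_quality_url(
        "nlp", "JiangYanting", "Pre-modern_Chinese_corpus_dataset"
    )
    # Each call counts against the daily request limit, so avoid polling.
    print(json.dumps(fetch_quality(url), indent=2, ensure_ascii=False))
```

Because the unauthenticated limit is 100 requests per day, it is worth saving the response locally instead of refetching it on every run.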