JiangYanting/Pre-modern_Chinese_corpus_dataset

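Pre-modern Chinese, corpus, dataset, natural language processing, corpus, ancient Chinese, Old Chinese, Classical Chinese (wenyanwen), digital humanities, computational language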

32 / 100

Emerging

This project offers a large collection of pre-modern Chinese texts, organized by period (Song, Yuan, Ming, Qing, Republic of China) and by content type, such as essays, novels, poetry, and historical records. It provides textual data for researchers studying the Chinese language, literature, and history across these eras. Scholars and educators in Sinology, linguistics, or digital humanities can use the corpus for their own analyses.

169 stars. No commits in the last 6 months.

Use this if you need a large, categorized dataset of historical Chinese texts for academic research, language analysis, or developing educational materials.

Not ideal if you are looking for modern Chinese texts or need highly structured, annotated data beyond basic categorization by dynasty and content type.
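
As a rough illustration of working with a corpus organized by period and content type, the sketch below counts files under each top-level folder of a local clone and extracts plain text from an HTML file. The folder layout, the use of HTML files for the texts (suggested only by the repository's listed language), and the function names `count_files_by_top_folder` and `html_to_text` are assumptions made for illustration, not something the project documents.

```python
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects every text node of an HTML document and drops the tags.

    Note: text inside <script> or <style> elements is collected as well.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup: str) -> str:
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def count_files_by_top_folder(root: Path) -> Counter:
    """Count files under each top-level folder of a local clone.

    Assumes (hypothetically) that top-level folders correspond to
    periods or content types; files directly in the root are skipped.
    """
    counts = Counter()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and ".git" not in rel.parts and len(rel.parts) > 1:
            counts[rel.parts[0]] += 1
    return counts
```

Calling `count_files_by_top_folder(Path("path/to/local/clone"))` gives a quick overview of how the material is distributed before any deeper analysis; the path here is a placeholder.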

Chinese language studies, historical linguistics, Sinology, digital humanities, Chinese literature research

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 14 / 25


Stars

169

Forks

Language

HTML

License

—

Higher-rated alternatives

nltk/nltk

NLTK Source

explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

undertheseanlp/underthesea

Underthesea - Vietnamese NLP Toolkit

stanfordnlp/stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many...

flairNLP/flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
