JiangYanting/Pre-modern_Chinese_corpus_dataset
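Pre-modern Chinese corpus dataset. Natural language processing, corpus, Ancient Chinese, Old Chinese, Classical Chinese, digital humanities, computational linguistics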
This project offers a large collection of pre-modern Chinese texts, organized by period (Song, Yuan, Ming, Qing, Republic of China) and by content type (essays, novels, poetry, and historical records). It provides textual data for researchers studying the Chinese language, literature, and history across these eras. Scholars and educators in Sinology, linguistics, or digital humanities can use the corpus for their own analyses.
169 stars. No commits in the last 6 months.
Use this if you need a large, categorized dataset of historical Chinese texts for academic research, language analysis, or developing educational materials.
Not ideal if you are looking for modern Chinese texts or need highly structured, annotated data beyond basic categorization by dynasty and content type.
Stars: 169
Forks: 18
Language: HTML
License: —
Category:
Last pushed: Mar 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JiangYanting/Pre-modern_Chinese_corpus_dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
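The same request can be made from Python. The snippet below is a minimal sketch using only the standard library; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and it assumes (this is not confirmed by the page) that the endpoint returns a JSON object.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL from a category (e.g. 'nlp') and an 'owner/name' repo."""
    # Keep the slash between owner and repository name; escape anything unusual.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the record for one repository.

    Assumption: the response body is a JSON object. The schema is not
    documented on this page, so inspect the result before relying on fields.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to actually hit the API.
    print(build_quality_url("nlp", "JiangYanting/Pre-modern_Chinese_corpus_dataset"))
```

Since unauthenticated access is limited to 100 requests/day, it may be worth caching the result of `fetch_quality` locally rather than calling it repeatedly.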
Higher-rated alternatives
nltk/nltk
NLTK Source
explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
undertheseanlp/underthesea
Underthesea - Vietnamese NLP Toolkit
stanfordnlp/stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many...
flairNLP/flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)