ydli-ai/CSL
[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset 中文科学文献数据集
This dataset provides metadata for nearly 400,000 Chinese scientific journal papers published between 2010 and 2020. Each record includes the paper's title, abstract, keywords, and category labels (13 major categories and 67 specific disciplines). Researchers working with Chinese scientific literature can use it to analyze trends, build information retrieval systems, or train machine learning models for tasks such as text summarization and classification.
662 stars. No commits in the last 6 months.
Use this if you need a comprehensive collection of structured information from Chinese scientific papers to support research, content organization, or AI model development related to scientific literature.
Not ideal if you primarily work with English scientific literature or require full-text content rather than just metadata.
Stars: 662
Forks: 60
Language: Python
License: not specified
Category:
Last pushed: Jun 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ydli-ai/CSL"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
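The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body, which the listing does not state, and the helper names `build_url` and `fetch_quality` are hypothetical, introduced here for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def build_url(repo: str) -> str:
    """Return the endpoint URL for a repository given as 'owner/name'."""
    owner, name = repo.split("/", 1)
    # Percent-encode each path segment so unusual characters stay valid.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_quality(repo: str, timeout: float = 10.0):
    """Fetch the record for `repo`.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    with urllib.request.urlopen(build_url(repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Keyless access is limited to 100 requests/day, so avoid calling in a loop.
    print(fetch_quality("ydli-ai/CSL"))
```

Keeping URL construction separate from the network call makes the former easy to check without spending any of the daily request quota.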
Higher-rated alternatives
nltk/nltk
NLTK Source
explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
undertheseanlp/underthesea
Underthesea - Vietnamese NLP Toolkit
stanfordnlp/stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many...
flairNLP/flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)