ydli-ai/CSL

[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset 中文科学文献数据集

35 / 100 (Emerging)

This dataset offers metadata for nearly 400,000 Chinese scientific journal papers published between 2010 and 2020. Each paper record includes a title, an abstract, keywords, and category labels (13 major categories and 67 specific disciplines). Researchers and academics working with Chinese scientific literature can use this resource to analyze trends, build information retrieval systems, or train machine learning models for tasks such as text summarization or classification.

662 stars. No commits in the last 6 months.

Use this if you need a comprehensive collection of structured information from Chinese scientific papers to support research, content organization, or AI model development related to scientific literature.

Not ideal if you primarily work with English scientific literature or require full-text content rather than just metadata.

scientific-research academic-publishing natural-language-processing information-retrieval knowledge-management
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 17 / 25
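
As a quick arithmetic check, the four sub-scores listed above appear to add up to the overall 35 / 100 score. The sketch below only reproduces that sum from the numbers shown on this page; it assumes the overall score is a plain sum of the sub-scores, which the page does not state, and it does not describe how each sub-score is derived.

```python
# Sub-scores as listed on this page, each out of 25.
sub_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 17,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the sub-scores.
total = sum(sub_scores.values())
max_total = MAX_PER_CATEGORY * len(sub_scores)

print(f"{total} / {max_total}")  # prints "35 / 100"
```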


Stars: 662
Forks: 60
Language: Python
License: None
Last pushed: Jun 19, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ydli-ai/CSL"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
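
For use from a script rather than the command line, the same request can be sketched in Python with only the standard library. The URL pattern is copied from the curl example above; the helper names `build_quality_url` and `fetch_quality`, the reading of the `nlp` path segment as a domain or category, and the assumption that the endpoint returns JSON are all illustrative guesses, not documented behavior.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain: str, owner: str, repo: str) -> str:
    """Build the endpoint URL following the pattern in the curl example.

    `domain` is a hypothetical name for the "nlp" path segment.
    """
    return f"{BASE_URL}/{domain}/{owner}/{repo}"


def fetch_quality(domain: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError subclass.
    """
    url = build_quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body)
```

Calling `fetch_quality("nlp", "ydli-ai", "CSL")` would issue the same GET request as the curl command. Each call counts against the daily request limit, so caching the result locally is a sensible choice for repeated runs.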