Ahren09/SciEvo
A longitudinal dataset for academic literature, including papers, metadata, and citation graphs. Also available on 🤗 HuggingFace and Kaggle.
This dataset helps researchers and academics analyze the evolution of scientific knowledge over 30 years. It provides a vast collection of over two million academic papers, including their titles, abstracts, publication dates, authors, and detailed citation networks. Researchers in fields like scientometrics and library science can use this to study long-term trends, citation practices, and knowledge exchange across disciplines.
No commits in the last 6 months.
Use this if you need a comprehensive, pre-processed dataset of academic literature from arXiv, complete with rich metadata and citation graphs, to study trends in research fields.
Not ideal if you only need data from a very specific, niche academic journal not covered by arXiv, or if you require real-time updates on newly published papers.
Stars: 17
Forks: 1
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Sep 06, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/Ahren09/SciEvo"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
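The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_url` and `fetch_quality` are hypothetical. It assumes, without confirmation from this page, that the endpoint returns a JSON body. How an API key is supplied is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay inside it.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the response body is JSON; this page does not show its shape.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("Ahren09", "SciEvo")` issues one request against the daily quota, so a script that polls many repositories should cache results rather than refetch them.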
Higher-rated alternatives
ewok-core/ewok-paper
Elements of World Knowledge! This repository houses data and code needed to replicate our first...
itrummer/thalamusdb
ThalamusDB: semantic query processing on multimodal data
texttron/hyde
HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels
ArslanKAS/Large-Language-Models-with-Semantic-Search
Explore from keyword search to dense retrieval and reranking, which injects the intelligence of...
jzhoubu/vsearch
An Extensible Framework for Retrieval-Augmented LLM Applications: Learning Relevance Beyond...