J0nasW/science-datalake

Unified data lake of 293M scientific papers from 8 scholarly sources + 13 ontologies (960 GB Parquet, queryable via DuckDB)

25 / 100 (Experimental)

This project provides a comprehensive database of scientific papers, including full text, citations, and specialized metadata like retraction notices or funding links. It combines information from eight major scholarly sources and thirteen scientific ontologies, making it easier to analyze scientific trends or conduct literature reviews. Researchers, data scientists in academia, or those building AI applications for science can use this to quickly query across millions of publications.

Use this if you need a unified and queryable collection of scientific literature, complete with rich metadata and ontologies, for large-scale analysis or AI model training.

Not ideal if you only need to search for a few papers or if you prefer using web-based search engines for literature discovery.
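
Since the description says the Parquet files are queryable via DuckDB, a query against them might look roughly like the sketch below. It is illustrative only: the file glob `papers/*.parquet` and the columns `year` and `n_papers` are assumptions made for this example, not names confirmed by the project, and the `duckdb` Python package has to be installed separately.

```python
def build_count_query(parquet_glob: str) -> str:
    """Return SQL that counts papers per publication year.

    The `year` column is a hypothetical name used for illustration.
    """
    # Escape single quotes so the path is a valid SQL string literal.
    safe_glob = parquet_glob.replace("'", "''")
    return (
        "SELECT year, COUNT(*) AS n_papers "
        f"FROM read_parquet('{safe_glob}') "
        "WHERE year >= ? "
        "GROUP BY year ORDER BY year"
    )


def papers_per_year(parquet_glob: str, since_year: int):
    """Run the query with DuckDB and return a list of (year, n_papers) rows."""
    import duckdb  # third-party package: pip install duckdb

    con = duckdb.connect()  # in-memory database; Parquet is read in place
    try:
        return con.execute(build_count_query(parquet_glob), [since_year]).fetchall()
    finally:
        con.close()
```

Calling `papers_per_year("papers/*.parquet", 2020)` would scan the matching files directly, without loading them into a separate database first; the actual directory layout and schema should be taken from the repository itself.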

scientific-literature research-analysis bibliometrics knowledge-graph biomedical-informatics
No Package · No Dependents
Maintenance 10 / 25
Adoption 4 / 25
Maturity 11 / 25
Community 0 / 25


Stars: 8
Forks: (not listed)
Language: Jupyter Notebook
License: (not listed)
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/J0nasW/science-datalake"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
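
The same request can be made from Python using only the standard library. This is a minimal sketch: the URL pattern is copied from the curl command above, while the helper names `build_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for one repository."""
    # Percent-encode each path segment in case it contains unusual characters.
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    with urllib.request.urlopen(build_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("ml-frameworks", "J0nasW", "science-datalake")` requests the URL shown in the curl command. With the keyless limit of 100 requests/day, it is worth caching results rather than re-fetching them.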