edamontology/pubfetcher
A Java command-line tool and library to download and store publications with metadata by combining content from various online resources
This tool helps researchers, curators, or information scientists gather comprehensive details about academic publications, especially in biomedical and life sciences. It takes publication identifiers (like PMIDs or DOIs) and fetches titles, abstracts, full texts, keywords, and other metadata from various online sources. The result is a well-organized collection of publication data, stored locally or exported as JSON, ready for further analysis.
Use this if you need to gather detailed and complete information for thousands of biomedical or life sciences publications, pulling content from multiple sources like Europe PMC, PubMed, and even publisher websites.
Not ideal if you need to process millions of publications, or if author lists are a critical piece of metadata for your analysis, since author lists are not currently supported.
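The description above says the tool accepts publication identifiers such as PMIDs or DOIs. As a rough illustration of telling those identifier types apart, here is a minimal sketch in Java. It is not PubFetcher's API: the class `PublicationIdSketch`, its `classify` method, and the regular expressions are all hypothetical and based only on common textual conventions (digits-only PMIDs, `PMC`-prefixed PMCIDs, DOIs starting with `10.`). The sample identifiers are placeholders, not real publications.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of PubFetcher: guesses the kind of a
// publication identifier from its textual shape alone.
class PublicationIdSketch {

    enum Kind { PMID, PMCID, DOI, UNKNOWN }

    // Assumption: a PMID is a positive integer without leading zeros.
    private static final Pattern PMID_PATTERN = Pattern.compile("[1-9]\\d*");

    // Assumption: a PMCID is "PMC" followed by digits (case-insensitive).
    private static final Pattern PMCID_PATTERN =
            Pattern.compile("PMC\\d+", Pattern.CASE_INSENSITIVE);

    // Assumption: a DOI looks like "10.<4-9 digits>/<non-space suffix>".
    private static final Pattern DOI_PATTERN =
            Pattern.compile("10\\.\\d{4,9}/\\S+");

    static Kind classify(String raw) {
        if (raw == null) {
            return Kind.UNKNOWN;
        }
        String id = raw.trim();
        if (PMID_PATTERN.matcher(id).matches()) {
            return Kind.PMID;
        }
        if (PMCID_PATTERN.matcher(id).matches()) {
            return Kind.PMCID;
        }
        if (DOI_PATTERN.matcher(id).matches()) {
            return Kind.DOI;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Placeholder identifiers chosen only to exercise each branch.
        List<String> samples =
                List.of("12345678", "PMC1234567", "10.1000/example.doi", "not-an-id");
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```

A shape check like this only sorts input before any fetching happens; it does not confirm that an identifier actually resolves to a publication.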
Stars: 8
Forks: 1
Language: Java
License: GPL-3.0
Category:
Last pushed: Jan 22, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/edamontology/pubfetcher"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
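The same request as the curl command above can be made from Java with the standard `java.net.http` client. This is a minimal sketch under stated assumptions: only the single URL shown above comes from this page, so building the path from an owner and repository name (the hypothetical `endpointFor` helper) is a guess at the pattern, and the response format is not documented here, so the body is printed as plain text without parsing.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Sketch of calling the endpoint shown in the curl example above.
class QualityApiSketch {

    // Base path taken from the curl example; treating the trailing
    // "<owner>/<repo>" part as a general pattern is an assumption.
    static final String BASE =
            "https://pt-edge.onrender.com/api/v1/quality/perception/";

    // Hypothetical helper: builds the endpoint URI for one repository.
    static URI endpointFor(String owner, String repo) {
        return URI.create(BASE + owner + "/" + repo);
    }

    // Builds an unauthenticated GET request with a 30-second timeout.
    static HttpRequest buildRequest(String owner, String repo) {
        return HttpRequest.newBuilder(endpointFor(owner, repo))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest("edamontology", "pubfetcher"),
                HttpResponse.BodyHandlers.ofString());
        // The response schema is not described on this page, so the raw
        // body is printed as-is.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

Because the keyless tier is limited to 100 requests/day, a caller that polls many repositories would likely want to cache responses rather than refetch them.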
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited twitter scraper : scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. A http client that can impersonate browser...
plabayo/rama
modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.