titipata/pubmed_parser
A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
This tool helps researchers, scientists, and medical professionals extract specific fields from PubMed Open-Access XML and MEDLINE XML files. You give it scientific article data in XML format, and it returns structured information such as titles, abstracts, authors, references, image captions, and full paragraphs. It suits anyone who works with large collections of biomedical literature and needs to pull out specific details for analysis.
727 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to systematically extract structured data from vast collections of PubMed Open-Access or MEDLINE XML articles for research, text mining, or natural language processing.
Not ideal if you only need to look up a few articles manually or prefer to work directly with web interfaces instead of programmatic data extraction.
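To make the XML-in, structured-record-out idea concrete, here is a minimal sketch that uses only Python's standard library. It is not pubmed_parser's implementation or API: the sample document is a hypothetical, heavily trimmed JATS-style article, and the names `SAMPLE_XML`, `text_of`, and `extract_article` are invented for illustration. Real Open-Access files are far larger and more irregular, which is the reason to use a dedicated parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS-style article used only for illustration.
SAMPLE_XML = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An example title</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
      <abstract><p>First sentence. <italic>Second</italic> sentence.</p></abstract>
    </article-meta>
  </front>
</article>"""


def text_of(element):
    """Return all text under an element, with whitespace collapsed."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def extract_article(xml_string):
    """Pull title, abstract, and author names into a plain dict."""
    root = ET.fromstring(xml_string)
    authors = []
    for contrib in root.iter("contrib"):
        # Skip editors and other non-author contributors.
        if contrib.get("contrib-type") != "author":
            continue
        given = text_of(contrib.find("name/given-names"))
        surname = text_of(contrib.find("name/surname"))
        authors.append(f"{given} {surname}".strip())
    return {
        "title": text_of(root.find(".//article-title")),
        "abstract": text_of(root.find(".//abstract")),
        "authors": authors,
    }


record = extract_article(SAMPLE_XML)
print(record)
```

Returning one flat dict per article keeps the output easy to load into a table or a JSON lines file for text mining.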
Stars: 727
Forks: 178
Language: Python
License: MIT
Category:
Last pushed: Jul 31, 2025
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/titipata/pubmed_parser"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
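The same request can be made from Python. The sketch below assumes the path segments after `/quality/` are a category, a repository owner, and a repository name, which is inferred from the one URL shown above rather than from any documentation. The helper names `build_url` and `fetch_quality` are hypothetical, and the response body is returned as raw text because its format is not described here.

```python
import urllib.request

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Assemble the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the endpoint and return the response body as text.

    The body is not parsed because the response format is not documented here.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_url("nlp", "titipata", "pubmed_parser")
print(url)
# Calling fetch_quality("nlp", "titipata", "pubmed_parser") performs the request;
# each call counts against the daily request limit.
```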
Related tools
nfflow/pubmedflow
Data Collection API for pubmed
greenelab/snorkeling
Extracting biomedical relationships from literature with Snorkel 🏊
purplepotion/sadrat
Smart Adverse Drug Reaction Assessment Tools.
KarelDO/BioDEX
BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance.
databricks-industry-solutions/adverse-drug-events
To ensure ongoing drug safety, pharma companies need to monitor and report adverse drug events...