titipata/pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset

Established

This tool extracts structured information from PubMed Open-Access XML and MEDLINE XML files for researchers, scientists, and medical professionals. Given scientific article data in XML format, it returns fields such as titles, abstracts, authors, references, image captions, and full paragraphs in a structured form. It suits anyone who works with large collections of biomedical literature and needs to pull out specific details for analysis.
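To make the XML-in, structured-fields-out idea concrete, here is a minimal sketch using only the Python standard library. It does not use pubmed_parser and is not how the package is implemented. The sample record is a hypothetical, heavily trimmed MEDLINE-style document, and the `extract_records` helper and its output keys are names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed MEDLINE-style record used only for illustration.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <Article>
        <ArticleTitle>An example article title</ArticleTitle>
        <Abstract>
          <AbstractText>First sentence of the abstract.</AbstractText>
        </Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_records(xml_text):
    """Return one dict per article with pmid, title, abstract, and authors.

    Illustrative helper (an assumption of this sketch), not a pubmed_parser API.
    """
    root = ET.fromstring(xml_text)
    records = []
    for citation in root.iter("MedlineCitation"):
        authors = []
        for author in citation.iter("Author"):
            fore = author.findtext("ForeName", default="")
            last = author.findtext("LastName", default="")
            authors.append((fore + " " + last).strip())
        # An abstract may be split across several AbstractText elements.
        abstract = " ".join(
            (node.text or "").strip() for node in citation.iter("AbstractText")
        )
        records.append({
            "pmid": citation.findtext("PMID", default=""),
            "title": citation.findtext("Article/ArticleTitle", default=""),
            "abstract": abstract,
            "authors": authors,
        })
    return records


records = extract_records(SAMPLE_XML)
print(records[0]["title"], records[0]["authors"])
```

Real MEDLINE and PubMed Open-Access files carry many more fields and edge cases (structured abstracts, inline markup in titles, collective authors), which is the work a dedicated parser takes off your hands.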

727 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to systematically extract structured data from vast collections of PubMed Open-Access or MEDLINE XML articles for research, text mining, or natural language processing.

Not ideal if you only need to look up a few articles manually or prefer to work directly with web interfaces instead of programmatic data extraction.

biomedical-research scientific-literature-analysis medical-data-extraction academic-text-mining bibliographic-analysis

Maintenance 2 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 25 / 25

Stars: 727

Forks: 178

Language: Python

License: MIT

Related tools

nfflow/pubmedflow: Data Collection API for pubmed

greenelab/snorkeling: Extracting biomedical relationships from literature with Snorkel 🏊

purplepotion/sadrat: Smart Adverse Drug Reaction Assessment Tools.

KarelDO/BioDEX: BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance.

databricks-industry-solutions/adverse-drug-events: To ensure ongoing drug safety, pharma companies need to monitor and report adverse drug events...
