allenai/papermage

library supporting NLP and CV research on scientific papers

52 / 100 (Established)

Papermage helps scientific researchers analyze the structure and content of research papers. You input a PDF of a scientific paper, and it outputs a structured digital document that breaks down the paper into components like pages, paragraphs, sentences, tables, and figures. This tool is for scientists, academics, or anyone needing to programmatically extract and work with specific elements from scientific PDFs.
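The PDF-in, structured-document-out workflow described above might look roughly like the sketch below. It assumes the `CoreRecipe` entry point from the project's README and that the resulting document exposes layers such as `pages` and `sentences` as attributes; `parse_pdf` and `count_layers` are hypothetical helper names used only for illustration, and which layers are actually populated may depend on the installed version and extras.

```python
def parse_pdf(pdf_path):
    """Parse a scientific PDF into a papermage document.

    Assumes papermage (with its predictor dependencies) is installed.
    The import is done lazily because the package is heavy.
    """
    from papermage.recipes import CoreRecipe  # entry point per the project README

    recipe = CoreRecipe()
    return recipe.run(pdf_path)


def count_layers(doc, layer_names=("pages", "paragraphs", "sentences", "tables", "figures")):
    """Count the items in each named layer of a parsed document.

    Hypothetical helper: a layer that is missing on the document is
    reported as 0 instead of raising an error.
    """
    counts = {}
    for name in layer_names:
        layer = getattr(doc, name, None)
        counts[name] = len(layer) if layer is not None else 0
    return counts
```

A call such as `count_layers(parse_pdf("paper.pdf"))` would then give a quick overview of what was extracted, where `paper.pdf` is a placeholder path.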

791 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to programmatically understand the layout and extract specific textual or visual elements from scientific PDFs for tasks like building a QA system or a knowledge graph.

Not ideal if you're looking for an actively maintained, production-ready solution, as this project is a research prototype.

scientific-research pdf-analysis academic-publishing information-extraction research-data
Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 17 / 25
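The four category scores listed here add up to the overall score of 52 out of 100. The short check below only reproduces that arithmetic from the numbers on this page; it assumes a plain sum and is not a documented scoring formula.

```python
# Category scores as listed on this page, each out of 25.
category_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 25,
    "Community": 17,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the plain sum of the categories.
overall = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)

print(f"{overall} / {maximum}")
```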


Stars: 791
Forks: 64
Language: Python
License: Apache-2.0
Last pushed: Nov 08, 2024
Commits (30d): 0
Dependencies: 10

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/allenai/papermage"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
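The same request can be made from Python using only the standard library. This is a minimal sketch: the URL layout is taken from the curl command above, while `build_quality_url` and `fetch_quality` are hypothetical helper names, and the response is assumed to be JSON with no particular schema.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record for one repository.

    Assumes the endpoint returns JSON; the parsed payload is returned
    as-is because its fields are not described on this page.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("ml-frameworks", "allenai", "papermage")` requests the same URL as the curl command. Unauthenticated calls count against the 100 requests/day limit.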