Huang-lab/figure-extractor

Flask-based service that uses PDFFigures 2.0 to extract figures and tables from scholarly PDFs. It offers a REST API, a CLI, Docker support, and JSON metadata output, with processing at roughly 1.5 s per page. Designed for document processing and RAG pipelines.
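To put the quoted speed of about 1.5 s per page in context, here is a rough back-of-the-envelope sketch. It assumes pages are processed one after another at a constant rate; the corpus size (200 papers of 12 pages each) and the helper name `estimate_seconds` are made up for illustration and do not come from the project.

```python
def estimate_seconds(total_pages: int, seconds_per_page: float = 1.5) -> float:
    """Rough wall-clock estimate, assuming sequential, constant-rate processing."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    return total_pages * seconds_per_page


# Hypothetical corpus: 200 papers averaging 12 pages each (2,400 pages).
pages = 200 * 12
minutes = estimate_seconds(pages) / 60
print(f"{pages} pages -> about {minutes:.0f} minutes")
```

Real throughput will depend on page complexity, hardware, and whether requests run in parallel, so treat the result as an order-of-magnitude guide only.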

30 / 100 (Emerging)

This tool helps researchers, data scientists, or content managers automatically pull out figures, tables, and their captions from scholarly PDF documents. You feed it research papers in PDF format, and it outputs the extracted images and structured metadata (like captions and coordinates) for each figure and table in JSON format. It's designed for anyone working with large collections of academic papers who needs to analyze or reuse their visual content.
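As a sketch of what working with the structured metadata might look like, the snippet below groups records into figures and tables and computes the area of each bounding box. The field names (`figType`, `caption`, `page`, `regionBoundary` with `x1`/`y1`/`x2`/`y2`) are modeled on the JSON that PDFFigures 2.0 emits; whether this service returns exactly that shape is an assumption, and the sample records are invented.

```python
import json
from collections import defaultdict

# Invented sample records; field names are assumed to follow PDFFigures 2.0 output.
SAMPLE = """
[
  {"name": "1", "figType": "Figure", "page": 2,
   "caption": "Figure 1: System overview.",
   "regionBoundary": {"x1": 72.0, "y1": 90.0, "x2": 540.0, "y2": 300.0}},
  {"name": "1", "figType": "Table", "page": 4,
   "caption": "Table 1: Main results.",
   "regionBoundary": {"x1": 100.0, "y1": 120.0, "x2": 500.0, "y2": 220.0}}
]
"""


def group_by_type(records):
    """Group metadata records by their 'figType' value (e.g. Figure, Table)."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.get("figType", "Unknown")].append(rec)
    return dict(grouped)


def region_area(rec):
    """Area of the record's bounding box, in the units used by the coordinates."""
    b = rec["regionBoundary"]
    return (b["x2"] - b["x1"]) * (b["y2"] - b["y1"])


records = json.loads(SAMPLE)
for fig_type, items in group_by_type(records).items():
    for rec in items:
        print(fig_type, rec["page"], rec["caption"], region_area(rec))
```

If the actual output uses different keys, only the dictionary lookups in `group_by_type` and `region_area` would need to change.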

Use this if you need to programmatically extract visual content like graphs, charts, and data tables from scientific or academic PDFs for further analysis or integration into other systems.

Not ideal if you only need to view PDFs or manually extract a few figures, as this tool is designed for automated, high-volume processing.

academic-research scientific-publishing document-analysis information-extraction research-data-management
No License No Package No Dependents
Maintenance 6 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 10 / 25

Stars: 15
Forks: 2
Language: Python
License: None
Last pushed: Dec 29, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/Huang-lab/figure-extractor"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
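The same request can be made from Python using only the standard library. This is a minimal sketch: the URL pattern is taken from the curl command above, while the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for an owner/repo pair."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response (assumes the body is JSON)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("Huang-lab", "figure-extractor")` would issue the same GET request as the curl command; with no key, it counts against the 100 requests/day limit.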