Huang-lab/figure-extractor

A Flask-based service that uses PDFFigures 2.0 to extract figures and tables from scholarly PDFs. It provides a REST API, a CLI, Docker support, and JSON metadata output, with processing at roughly 1.5 s per page. Designed for document processing and RAG pipelines.

Emerging

This tool helps researchers, data scientists, or content managers automatically pull out figures, tables, and their captions from scholarly PDF documents. You feed it research papers in PDF format, and it outputs the extracted images and structured metadata (like captions and coordinates) for each figure and table in JSON format. It's designed for anyone working with large collections of academic papers who needs to analyze or reuse their visual content.
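As a rough illustration of working with the JSON metadata described above, the sketch below parses a list of extracted items and separates tables from figures. The field names (`figType`, `name`, `page`, `caption`, `regionBoundary`) follow what PDFFigures 2.0 output is generally understood to use, but this service's exact schema is not shown here, so treat them, the sample data, and the helper names `parse_metadata` and `tables_only` as assumptions.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sample payload; the field names are assumed, not taken
# from this service's documentation.
SAMPLE_METADATA = """
[
  {"figType": "Figure", "name": "1", "page": 2,
   "caption": "Figure 1: Model architecture.",
   "regionBoundary": {"x1": 72.0, "y1": 90.5, "x2": 540.0, "y2": 310.0}},
  {"figType": "Table", "name": "2", "page": 5,
   "caption": "Table 2: Results on the test set.",
   "regionBoundary": {"x1": 80.0, "y1": 120.0, "x2": 530.0, "y2": 260.0}}
]
"""


@dataclass
class ExtractedItem:
    kind: str                                 # "Figure" or "Table"
    name: str                                 # label as printed in the paper
    page: int                                 # page index reported by the extractor
    caption: str                              # caption text
    bbox: Tuple[float, float, float, float]   # (x1, y1, x2, y2) region


def parse_metadata(raw: str) -> List[ExtractedItem]:
    """Convert a JSON array of extracted items into typed records."""
    items = []
    for entry in json.loads(raw):
        box = entry.get("regionBoundary", {})
        bbox = tuple(float(box.get(k, 0.0)) for k in ("x1", "y1", "x2", "y2"))
        items.append(ExtractedItem(
            kind=entry.get("figType", "Figure"),
            name=str(entry.get("name", "")),
            page=int(entry.get("page", 0)),
            caption=entry.get("caption", "").strip(),
            bbox=bbox,
        ))
    return items


def tables_only(items: List[ExtractedItem]) -> List[ExtractedItem]:
    """Keep only the entries marked as tables."""
    return [item for item in items if item.kind == "Table"]


if __name__ == "__main__":
    for item in parse_metadata(SAMPLE_METADATA):
        print(f"{item.kind} {item.name} (page {item.page}): {item.caption}")
```

Typed records like these make it straightforward to index captions for retrieval or to crop regions from page images, though the coordinate units and page numbering convention would need to be checked against the service's real output.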

Use this if you need to programmatically extract visual content like graphs, charts, and data tables from scientific or academic PDFs for further analysis or integration into other systems.

Not ideal if you only need to view PDFs or manually extract a few figures, as this tool is designed for automated, high-volume processing.

academic-research scientific-publishing document-analysis information-extraction research-data-management

No License, No Package, No Dependents

Maintenance 6 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 10 / 25

Language

Python

License

—

Higher-rated alternatives

kreuzberg-dev/kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and...

PaddlePaddle/PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR...

yfedoseev/pdf_oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown...

opendataloader-project/opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

AKSarav/pdfstract

PDFStract - The Extraction and Chunking Layer in Your RAG Pipeline - Available as CLI - WEBUI - API
