NoEdgeAI/pdfdeal

A Python wrapper for the Doc2X API that also includes local text processing (to improve PDF recall in RAG).

48 / 100 (Emerging)

This tool helps knowledge base builders and RAG system developers process PDF documents for better information retrieval. It takes PDF or image files as input and converts them into structured formats such as Markdown, LaTeX, or plain text, preserving formulas and formatting. The output can then be used to improve the accuracy of AI-powered knowledge bases.

284 stars.

Use this if you need to extract accurate text, formulas, and formatting from PDFs for use in AI-driven knowledge management or question-answering systems.

Not ideal if you only need simple text extraction without advanced formatting preservation or specific integration with RAG systems.

knowledge-management document-processing information-extraction AI-systems content-preparation
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 12 / 25
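
As a quick arithmetic check, the four category scores listed above add up to the headline 48 / 100. The sketch below assumes the overall score is a plain sum of the four categories, each capped at 25; the page does not state the formula, so treat that as an inference from the numbers shown.

```python
# Category scores as listed on this page (each out of 25).
subscores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 12,
}

# Assumption: the overall score is an unweighted sum of the categories.
MAX_PER_CATEGORY = 25

total = sum(subscores.values())
maximum = MAX_PER_CATEGORY * len(subscores)

print(f"{total} / {maximum}")
```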


Stars: 284
Forks: 19
Language: Python
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/NoEdgeAI/pdfdeal"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
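
The same request can be made from Python with only the standard library. This is a minimal sketch, not documented client code: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper; path layout is inferred)."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("rag", "NoEdgeAI", "pdfdeal")` requests the same URL as the curl command. Without a key, keep calls within the 100 requests/day limit.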