NoEdgeAI/pdfdeal
A python wrapper for the Doc2X API and comes with native texts processing (to improve PDF recall in RAG). | Doc2X API的python封装,同时附带本地的文本处理(提升PDF在RAG中的召回率)。
This tool helps knowledge base builders and RAG system developers to process PDF documents for better information retrieval. It takes PDF or image files as input and converts them into structured formats like Markdown, LaTeX, or text, while preserving formulas and formatting. The output can then be used to enhance the accuracy of AI-powered knowledge bases.
284 stars.
Use this if you need to extract accurate text, formulas, and formatting from PDFs for use in AI-driven knowledge management or question-answering systems.
Not ideal if you only need simple text extraction without advanced formatting preservation or specific integration with RAG systems.
Stars
284
Forks
19
Language
Python
License
MIT
Category
Last pushed
Mar 12, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/NoEdgeAI/pdfdeal"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
thiswillbeyourgithub/wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype,...
Arterning/DeepParseX
DeepParseX 是一个强大的多模态文档解析与知识管理平台,支持 PDF、Word、Excel、PPT、图片、视频、音频 等多种文件格式的智能解析,自动提取关键信息,并构建...
laxmimerit/RAGWire
Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction,...
David-Lolly/ViewRAG
图文并茂的 PDF RAG 系统:支持版式感知分块、图表深度理解与精准视觉溯源。 Multimodal PDF RAG: Features layout-aware chunking,...
atpuxiner/docsloader
This is a documents loader. (文档解析加载器,rag文档解析,rag知识库构建)