thiswillbeyourgithub/wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, etc.
This tool helps researchers, students, and professionals efficiently understand and get answers from many diverse documents. You provide a collection of files like PDFs, audio recordings, or web pages, and it produces concise summaries or direct, sourced answers to your questions. It's designed for anyone who needs to quickly extract precise information from a large, varied library of content.
510 stars. Available on PyPI.
Use this if you need to summarize or ask specific questions across thousands of documents in various formats and want reliable, sourced answers without manually sifting through each file.
Not ideal if you only work with a few simple text files and don't require advanced summarization or detailed, sourced query responses from a large, heterogeneous collection.
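This page does not describe wdoc's internal pipeline. The sketch below is a toy illustration of the general idea behind "sourced answers": split each document into chunks, keep the source name attached to every chunk, and return the best-matching chunks together with where they came from. It deliberately uses plain keyword overlap instead of embeddings and an LLM. It also assumes non-text inputs (PDFs, audio) have already been converted to text. All names (`Chunk`, `retrieve`, `answer_with_sources`) are hypothetical and are not wdoc's API.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # file name or URL the text came from (hypothetical field)
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a stand-in for real embeddings.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(source: str, text: str, size: int = 50) -> list[Chunk]:
    # Fixed-size word windows; every chunk remembers its source.
    words = text.split()
    return [Chunk(source, " ".join(words[i:i + size]))
            for i in range(0, len(words), size)]


def score(query_tokens: list[str], chunk: Chunk) -> int:
    # Count how often each distinct query token appears in the chunk.
    counts = Counter(tokenize(chunk.text))
    return sum(counts[t] for t in set(query_tokens))


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    # Rank chunks by keyword overlap and drop chunks with no overlap.
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: score(q, c), reverse=True)
    return [c for c in ranked[:k] if score(q, c) > 0]


def answer_with_sources(query: str, docs: dict[str, str], k: int = 3) -> list[str]:
    # docs maps a source name to its already-extracted text.
    chunks = [c for src, txt in docs.items() for c in split_into_chunks(src, txt)]
    return [f"[{c.source}] {c.text}" for c in retrieve(query, chunks, k)]
```

For example, `answer_with_sources("where does photosynthesis happen", {"notes.pdf": "...", "talk.mp3": "..."})` returns only the matching passages, each prefixed with its source name. A real RAG system would swap the keyword score for vector similarity and pass the retrieved chunks to an LLM to compose the answer.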
Stars: 510
Forks: 37
Language: Python
License: AGPL-3.0
Last pushed: Mar 08, 2026
Commits (30d): 0
Dependencies: 49
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/thiswillbeyourgithub/wdoc"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
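The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the base path is copied from the curl command above, the response is assumed to be a JSON object (its fields are not documented on this page), and the function names are hypothetical. How a free API key is supplied is not stated here, so the sketch sends unauthenticated requests only.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug."""
    owner, name = repo.split("/", 1)
    return f"{BASE_URL}/{quote(owner)}/{quote(name)}"


def fetch_quality(repo: str, timeout: float = 10.0) -> dict:
    """GET the quality data for a repository (hypothetical helper).

    Assumption: the endpoint returns a JSON object. Unauthenticated
    requests are limited to 100 per day.
    """
    req = Request(quality_url(repo), headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `data = fetch_quality("thiswillbeyourgithub/wdoc")`. Building the URL in a separate function keeps the network call isolated, so the URL logic can be checked offline.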
Related tools
Arterning/DeepParseX
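DeepParseX is a powerful multimodal document parsing and knowledge management platform. It supports intelligent parsing of many file formats, including PDF, Word, Excel, PPT, images, video, and audio, automatically extracts key information, and builds...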
NoEdgeAI/pdfdeal
A Python wrapper for the Doc2X API that comes with native text processing (to improve PDF recall...
laxmimerit/RAGWire
Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM metadata extraction,...
David-Lolly/ViewRAG
Multimodal PDF RAG system combining text and images: supports layout-aware chunking, in-depth understanding of charts and figures, and precise visual source tracing.
atpuxiner/docsloader
A document loader (document parsing and loading, RAG document parsing, RAG knowledge-base construction).