Xyntopia/pydoxtools
Extract information from unstructured data with this library, which uses AI techniques. Compose AI components into customizable pipelines that draw on diverse sources for your projects.
This tool helps you automatically extract specific information from a wide variety of documents, like PDFs, HTML, or images. You feed it your unstructured documents, and it gives you structured data such as tables, lists of keywords, identified entities (like addresses), or even answers to specific questions using AI. It's designed for anyone who needs to quickly pull out key details from many documents without manual effort, like data analysts, researchers, or business intelligence professionals.
No commits in the last 6 months. Available on PyPI.
Use this if you need to automate the extraction of specific data points, tables, or answers from a large collection of diverse document types.
Not ideal if you only need to process a few simple text files and prefer a solution without AI integration or complex pipeline capabilities.
Stars: 87
Forks: 14
Language: Python
License: MIT
Category:
Last pushed: Sep 05, 2024
Commits (30d): 0
Dependencies: 13
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Xyntopia/pydoxtools"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
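The curl request above can also be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body, which the page does not state, and the helper names `quality_url` and `fetch_quality` are hypothetical, introduced only for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data for a repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError, which the caller should handle.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("Xyntopia", "pydoxtools")` issues the same request as the curl command. Since keyless access is limited to 100 requests/day, it may be worth caching the result locally rather than calling the endpoint repeatedly.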
Related tools
NanoNets/docstrange
Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple...
th1nhhdk/local_ai_ocr
A local, offline (after initial setup), portable OCR software that can process images and PDF...
Dicklesworthstone/llm_aided_ocr
Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...
emcf/thepipe
Get clean data from tricky documents, powered by vision-language models ⚡
langstruct-ai/langstruct
Extract structured data from any content using LLMs.