Capevace/data-wizard
Extract structured data from PDFs, Word docs and images. Embeddable directly into your application, regardless of the stack.
This tool helps you automatically pull specific pieces of information from documents like PDFs, Word files, and images. You define exactly what data you need, and it processes your documents to deliver that information in a clean, structured format like JSON. It's ideal for anyone who regularly needs to extract key details from various document types for data entry, analysis, or integration into other systems.
Use this if you need to reliably convert messy, unstructured documents into organized, validated data that can be easily used in databases, spreadsheets, or other applications.
Not ideal if you only occasionally need to extract data from a handful of simple documents, as setting it up requires some technical comfort.
Stars: 68
Forks: 18
Language: JavaScript
License: AGPL-3.0
Last pushed: Oct 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Capevace/data-wizard"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
NanoNets/docstrange
Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple...
th1nhhdk/local_ai_ocr
A local, offline (after initial setup), portable OCR software that can process images and PDF...
Dicklesworthstone/llm_aided_ocr
Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...
emcf/thepipe
Get clean data from tricky documents, powered by vision-language models ⚡
langstruct-ai/langstruct
Extract structured data from any content using LLMs.