parsee-ai/parsee-core
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.
This project helps financial analysts, data entry specialists, and operations managers automatically extract specific information from unstructured documents like PDFs, HTML files, and images. You input these documents, define what data points you need (like an invoice total and its currency), and it outputs that data in a structured, usable format. It's especially useful for handling financial documents.
Use this if you regularly need to pull specific pieces of information, especially from tables within financial PDFs or HTML files, and want to automate this process to get structured data.
Not ideal if your primary goal is general text summarization or if your data sources are exclusively plain text without any complex structuring or tables.
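The "define what data points you need" workflow described above can be pictured with a small sketch. This is not parsee-core's actual API: `DataPoint`, `MetaField`, and `structure_answer` are hypothetical names invented for illustration, and the raw answer is hard-coded where a real pipeline would get it from an LLM or custom model. It only shows how a data point (an invoice total) and an attached attribute (its currency) might be turned into a typed, structured record.

```python
from dataclasses import dataclass, field
import json


@dataclass
class MetaField:
    """Hypothetical attribute attached to a data point, e.g. a currency."""
    name: str
    allowed: list


@dataclass
class DataPoint:
    """Hypothetical description of one value to extract from a document."""
    name: str
    question: str
    dtype: type
    meta: list = field(default_factory=list)


def structure_answer(point, raw_value, raw_meta):
    """Coerce a raw text answer into a typed record for the given data point."""
    # Strip thousands separators before converting to the target type.
    value = point.dtype(raw_value.replace(",", ""))
    record = {"name": point.name, "value": value}
    for m in point.meta:
        candidate = raw_meta.get(m.name)
        # Keep only values from the allowed set; anything else becomes None.
        record[m.name] = candidate if candidate in m.allowed else None
    return record


invoice_total = DataPoint(
    name="invoice_total",
    question="What is the invoice total?",
    dtype=float,
    meta=[MetaField("currency", ["USD", "EUR", "GBP"])],
)

# Assumed raw model output; a real extraction step would produce this.
record = structure_answer(invoice_total, "1,250.00", {"currency": "EUR"})
print(json.dumps(record))
```

Restricting the currency to an allowed set is one possible design choice: it keeps the output predictable for downstream systems at the cost of rejecting unexpected values.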
Stars: 83
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Jan 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/parsee-ai/parsee-core"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
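The same request can be made from Python using only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the URL follows a `quality/<category>/<owner>/<repo>` pattern, which is inferred from the single URL above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path pattern below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the repository data and decode it, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "parsee-ai", "parsee-core")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is worth considering if the data is polled regularly.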
Higher-rated alternatives
google/langextract
A Python library for extracting structured information from unstructured text using LLMs with...
Extralit/extralit
Fast and accurate systemic data extraction with LLM assistance
Keyvanhardani/german-ocr
German-OCR is specifically trained to extract text from German documents including invoices,...
oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
xingbow/SciDaEx
Structured data extraction from research literature