easonlai/chat_with_pdf_table
This repository shows how to extract table data from a PDF file and preprocess it for embedding. The preprocessing makes the table data easier for language models to read and lets more of each table's context be recovered.
It helps data scientists and AI engineers analyze complex PDF documents by extracting their tabular data efficiently. It takes PDF files as input and outputs structured, context-rich table data that is ready to be embedded for use with language models, which lets you build more accurate Q&A systems and information extraction tools over your documents.
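As a rough illustration of the preprocessing idea described above, the sketch below turns each table row into a sentence that pairs every cell with its column header, so a row keeps its context when it is embedded on its own. It is not taken from the repository: the function names `rows_to_sentences` and `extract_tables_from_pdf` are hypothetical, the use of pdfplumber for extraction is an assumption, and the sample table is made up for demonstration.

```python
from typing import List, Optional

# A table as a list of rows; cells may be None when the PDF cell was empty.
Table = List[List[Optional[str]]]


def rows_to_sentences(table: Table) -> List[str]:
    """Turn each data row into one 'header: value' sentence.

    The first row is assumed to hold the column headers.
    """
    if len(table) < 2:
        return []
    headers = [(h or "").strip() for h in table[0]]
    sentences = []
    for row in table[1:]:
        parts = []
        for header, cell in zip(headers, row):
            value = (cell or "").strip()
            if header and value:  # skip empty cells and unnamed columns
                parts.append(f"{header}: {value}")
        if parts:
            sentences.append("; ".join(parts) + ".")
    return sentences


def extract_tables_from_pdf(path: str) -> List[Table]:
    """Collect raw tables from every page (assumes pdfplumber is installed)."""
    import pdfplumber  # imported lazily so the rest runs without it

    tables = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


if __name__ == "__main__":
    # Made-up sample data, standing in for a table extracted from a PDF.
    sample = [
        ["Product", "Q1 Revenue", "Q2 Revenue"],
        ["Widget", "100", "120"],
        ["Gadget", None, "80"],
    ]
    for sentence in rows_to_sentences(sample):
        print(sentence)
```

Each resulting sentence can then be passed to whichever embedding model you use. Repeating the header in every row costs some tokens, but it means a retrieved row is still interpretable without the rest of the table.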
No commits in the last 6 months.
Use this if you need to extract structured data from tables within PDF documents and use it effectively with large language models for tasks like question answering or summarization.
Not ideal if you only need to view PDF tables or if your primary goal is simple, non-AI-driven data extraction.
Stars: 9
Forks: 4
Language: Jupyter Notebook
License: —
Category:
Last pushed: Oct 23, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/easonlai/chat_with_pdf_table"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
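The same request can be made from Python. This is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, which this page does not state, and the helper names `quality_url` and `fetch_quality` are hypothetical. The `embeddings` path segment is copied from the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record; assumes (unverified) that the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make the actual request.
    print(quality_url("embeddings", "easonlai", "chat_with_pdf_table"))
```

Without a key, calls count against the 100 requests/day limit mentioned above, so it is worth caching responses if you poll more than occasionally.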
Higher-rated alternatives
gustavz/DataChad
Ask questions about any data source by leveraging langchains
leanderme/sytora
A sophisticated smart symptom search engine
e-m3din4/booby-trap-pdf
Embed malware, apks, executables or any other binary file into a PDF, or generate a PDF with...
knowledge-ukraine/medlocalgpt
⚕️ Applying LLM-powered AI Agents to Support for Physical Rehabilitation & Telerehabilitation...
easonlai/chat_with_pdf_streamlit_llama2
In this repository, you will discover how Streamlit, a Python framework for developing...