easonlai/chat_with_pdf_table

This repository shows how to extract table data from a PDF file and preprocess it for embedding. The preprocessing makes table data easier for language models to read and lets more of the tables' context be captured.

Score: 28 / 100 (Experimental)

This helps data scientists and AI engineers analyze complex PDF documents by extracting their tabular data. It takes your PDF files as input and outputs structured, context-rich table data that is ready to be embedded and used with language models. This makes it possible to build more accurate Q&A systems and information-extraction tools on top of your documents.
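The preprocessing idea can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: it assumes the table has already been pulled out of the PDF as a list of rows whose first row is the header (for example, the shape returned by pdfplumber's `Page.extract_tables()`, where empty cells come back as `None`). The function name `table_to_passages` and the sample revenue table are hypothetical. Each data row is rewritten as one sentence that pairs every cell with its column name, so a row still carries its meaning once it is embedded on its own.

```python
from typing import List, Optional

Row = List[Optional[str]]


def table_to_passages(table: List[Row], title: str = "") -> List[str]:
    """Hypothetical helper: serialize a table (first row = header) into one
    self-describing text passage per data row, ready to be embedded."""
    header, *rows = table
    # PDF table extractors may return None for empty cells, so normalize to "".
    columns = [(h or "").strip() for h in header]
    prefix = f"{title}. " if title else ""
    passages = []
    for row in rows:
        # Pair every non-empty cell with its column name.
        fields = [
            f"{col}: {(cell or '').strip()}"
            for col, cell in zip(columns, row)
            if (cell or "").strip()
        ]
        if fields:  # skip rows that are entirely empty
            passages.append(prefix + "; ".join(fields) + ".")
    return passages


# Hypothetical sample table, as it might come out of a PDF page.
sample = [
    ["Region", "Q1 Revenue", "Q2 Revenue"],
    ["North", "120", "135"],
    ["South", "98", None],
]
for passage in table_to_passages(sample, title="Quarterly revenue by region"):
    print(passage)
# Quarterly revenue by region. Region: North; Q1 Revenue: 120; Q2 Revenue: 135.
# Quarterly revenue by region. Region: South; Q1 Revenue: 98.
```

Repeating the column names in every passage costs some extra tokens, but it means a single retrieved row can be interpreted without the rest of the table; very wide tables may need to be split into smaller column groups.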

No commits in the last 6 months.

Use this if you need to extract structured data from tables within PDF documents and use it effectively with large language models for tasks like question answering or summarization.

Not ideal if you only need to view PDF tables or if your primary goal is simple, non-AI-driven data extraction.

document-analysis data-extraction NLP-development AI-engineering information-retrieval
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 15 / 25
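The four sub-scores above, each out of 25, add up to the overall score at the top of the page. A quick check, assuming the overall score is the plain sum of the sub-scores (the page itself does not spell out its formula):

```python
# Sub-scores copied from the listing; each is out of 25.
# Assumption: the overall score is the plain sum of these four values.
subscores = {"Maintenance": 0, "Adoption": 5, "Maturity": 8, "Community": 15}

total = sum(subscores.values())
maximum = 25 * len(subscores)
print(f"{total} / {maximum}")  # prints "28 / 100"
```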


Stars: 9
Forks: 4
Language: Jupyter Notebook
License: none
Last pushed: Oct 23, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/easonlai/chat_with_pdf_table"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.