easonlai/chat_with_pdf_table

This repository shows how to extract table data from a PDF file and preprocess it for word embedding. The preprocessing step makes table data more readable to language models and lets them draw more contextual information from the tables.

Experimental

This helps data scientists and AI engineers analyze complex PDF documents by extracting their tabular data. It takes PDF files as input and outputs structured, context-rich table data that is ready to be embedded and used with language models. This allows you to build more accurate Q&A systems and information extraction tools from your documents.
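As a rough illustration of what "context-rich table data" can mean, the sketch below turns each table row into a sentence that repeats the column headers, so that every embedded chunk carries its own context. This is not taken from the repository: the function name `table_to_sentences`, the `caption` parameter, and the sample rows are all hypothetical, and the input is assumed to be a list of rows (header first) as many PDF table extractors return.

```python
from typing import List, Optional, Sequence


def table_to_sentences(
    rows: Sequence[Sequence[Optional[str]]],
    caption: Optional[str] = None,
) -> List[str]:
    """Serialize a table (header row + data rows) into one sentence per data row.

    Hypothetical helper for illustration; not the repository's own code.
    """
    if len(rows) < 2:
        return []  # need a header row and at least one data row

    header = [str(h).strip() for h in rows[0]]
    prefix = f"{caption}. " if caption else ""
    sentences = []

    for row in rows[1:]:
        pairs = []
        for name, cell in zip(header, row):
            value = "" if cell is None else str(cell).strip()
            if value:  # skip empty cells so they add no noise to the embedding
                pairs.append(f"{name}: {value}")
        if pairs:
            sentences.append(prefix + "; ".join(pairs) + ".")

    return sentences


# Made-up sample data standing in for an extracted PDF table.
rows = [
    ["Region", "Q1 revenue", "Q2 revenue"],
    ["EMEA", "1.2M", "1.4M"],
    ["APAC", "0.9M", None],
]
sentences = table_to_sentences(rows, caption="Revenue by region")
for s in sentences:
    print(s)
```

Each resulting sentence can then be passed to whichever embedding model you use; repeating the headers in every row trades some redundancy for chunks that remain meaningful when retrieved in isolation.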

No commits in the last 6 months.

Use this if you need to extract structured data from tables within PDF documents and use it effectively with large language models for tasks like question answering or summarization.

Not ideal if you only need to view PDF tables or if your primary goal is simple, non-AI-driven data extraction.

document-analysis data-extraction NLP-development AI-engineering information-retrieval

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 15 / 25

Language: Jupyter Notebook

License: none

Higher-rated alternatives

gustavz/DataChad

Ask questions about any data source by leveraging langchains

leanderme/sytora

A sophisticated smart symptom search engine

e-m3din4/booby-trap-pdf

Embed malware, apks, executables or any other binary file into a PDF, or generate a PDF with...

knowledge-ukraine/medlocalgpt

⚕️ Applying LLM-powered AI Agents to Support for Physical Rehabilitation & Telerehabilitation...

easonlai/chat_with_pdf_streamlit_llama2

In this repository, you will discover how Streamlit, a Python framework for developing...
