tsawler/tabula

Pure Go text extraction library with fluent API, layout analysis, and RAG-ready chunking with support for pdf, html, odt, epub, and MS Office documents

35 / 100 (Emerging)

This library lets Go developers extract text from PDF, Word, Excel, PowerPoint, HTML, ODT, and EPUB files. It takes a document as input and returns clean, structured text or Markdown, including text recovered from scanned documents via OCR. It is aimed at systems that need to process content from varied document sources, particularly for information retrieval and RAG pipelines.

Use this if you are a developer building a Go application and need to programmatically extract well-structured text, including layout details like headings, paragraphs, and tables, from a wide range of document formats.

Not ideal if you are an end-user looking for a desktop application or a simple online tool to extract text without writing code.

document processing, data extraction, content analysis, information retrieval, AI development
No Package, No Dependents
Maintenance 10 / 25
Adoption 5 / 25
Maturity 13 / 25
Community 7 / 25

Stars: 11
Forks: 1
Language: Go
License: MIT
Last pushed: Feb 04, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/tsawler/tabula"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
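
The curl request above can also be made from Go with the standard library alone. The following is a minimal sketch, not an official client: the helper names qualityURL and fetchQuality are hypothetical, and the meaning of the path segments (category, owner, repo) is an assumption inferred from the URL. The response format is not described here, so the sketch prints the raw body instead of decoding it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// baseURL is copied from the curl command in the listing.
const baseURL = "https://pt-edge.onrender.com/api/v1/quality"

// qualityURL builds the endpoint for one repository.
// Assumption: the three path segments are category, owner, and repo.
func qualityURL(category, owner, repo string) string {
	return fmt.Sprintf("%s/%s/%s/%s", baseURL,
		url.PathEscape(category), url.PathEscape(owner), url.PathEscape(repo))
}

// fetchQuality performs the GET request and returns the raw response body.
// Any status other than 200 is reported as an error.
func fetchQuality(client *http.Client, endpoint string) ([]byte, error) {
	resp, err := client.Get(endpoint)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// A timeout keeps the program from hanging if the service is slow.
	client := &http.Client{Timeout: 10 * time.Second}
	body, err := fetchQuality(client, qualityURL("rag", "tsawler", "tabula"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	fmt.Println(string(body))
}
```

No API key is sent, so this stays within the keyless limit of 100 requests/day stated above.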