tsawler/tabula
Pure Go text extraction library with a fluent API, layout analysis, and RAG-ready chunking, with support for PDF, HTML, ODT, EPUB, and MS Office documents
This library helps Go developers extract text from a range of document types, including PDF, Word, Excel, PowerPoint, HTML, and EPUB files. It takes these documents as input and outputs clean, structured text or Markdown, including from scanned documents via OCR. It is designed for developers building systems that need to process content from diverse document sources, especially for tasks like information retrieval.
Use this if you are a developer building a Go application and need to programmatically extract well-structured text, including layout details like headings, paragraphs, and tables, from a wide range of document formats.
Not ideal if you are an end-user looking for a desktop application or a simple online tool to extract text without writing code.
Stars: 11
Forks: 1
Language: Go
License: MIT
Category:
Last pushed: Feb 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/tsawler/tabula"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
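Since tabula targets Go developers, the same request can be sketched in Go with only the standard library. This is a minimal sketch, not an official client: the helper names `qualityURL` and `fetchQuality` are hypothetical, the `/quality/rag/{owner}/{repo}` path pattern is inferred from the single curl example above, and because the response schema is not documented here the body is printed as-is rather than decoded into a struct.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// baseURL is taken from the curl example; the path pattern built on top of it
// is an assumption based on that one URL.
const baseURL = "https://pt-edge.onrender.com/api/v1/quality/rag"

// qualityURL (hypothetical helper) builds the endpoint for an owner/repo pair.
func qualityURL(owner, repo string) string {
	return baseURL + "/" + url.PathEscape(owner) + "/" + url.PathEscape(repo)
}

// fetchQuality (hypothetical helper) performs the GET and returns the raw body.
// The read is capped at 1 MiB so an unexpected response cannot exhaust memory.
func fetchQuality(client *http.Client, endpoint string) ([]byte, error) {
	resp, err := client.Get(endpoint)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return body, nil
}

func main() {
	// A timeout keeps the call from hanging if the service is slow to respond.
	client := &http.Client{Timeout: 10 * time.Second}

	body, err := fetchQuality(client, qualityURL("tsawler", "tabula"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	fmt.Println(string(body))
}
```

With the stated limit of 100 keyless requests per day, a caller polling this endpoint would likely want to cache the response rather than refetch on every run.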
Higher-rated alternatives
copilot-extensions/rag-extension
An example extension in Go using retrieval-augmented generation
wangle201210/go-rag
A knowledge-base RAG implementation built on eino + gf + vue
LlamaEdge/rag-api-server
A RAG API server written in Rust following OpenAI specs
timescale/pgai
A suite of tools to develop RAG, semantic search, and other AI applications more easily with PostgreSQL
ca-srg/ragent
RAGent - A CLI tool for building RAG systems with hybrid search (BM25 + vector) using Amazon S3...