tsawler/tabula

Pure Go text extraction library with a fluent API, layout analysis, and RAG-ready chunking. Supports PDF, HTML, ODT, EPUB, and MS Office documents.

/ 100

Emerging

This library helps Go developers extract text from PDF, Word, Excel, PowerPoint, HTML, and EPUB files. It takes these documents as input and outputs clean, structured text or Markdown, including from scanned documents via OCR. It is designed for systems that need to process content from diverse document sources, especially for tasks like information retrieval.

Use this if you are a developer building a Go application and need to programmatically extract well-structured text, including layout details like headings, paragraphs, and tables, from a wide range of document formats.

Not ideal if you are an end-user looking for a desktop application or a simple online tool to extract text without writing code.
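
To make the idea of RAG-ready chunking concrete, here is a minimal sketch in Go that uses only the standard library. It does not call tabula's API: `Chunk` and `ChunkMarkdown` are hypothetical names, and the heading-aware packing strategy is an assumption chosen for illustration, not a description of how the library chunks text internally.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical unit of text sized for embedding or retrieval.
type Chunk struct {
	Heading string // nearest Markdown heading above the text, if any
	Text    string // one or more paragraphs joined by blank lines
}

// ChunkMarkdown packs blank-line-separated paragraphs into chunks of at most
// maxChars bytes and starts a new chunk at every heading. A single paragraph
// longer than maxChars is kept whole rather than split mid-sentence.
func ChunkMarkdown(md string, maxChars int) []Chunk {
	var chunks []Chunk
	var heading string
	var buf []string
	size := 0

	// flush emits the buffered paragraphs under the current heading.
	flush := func() {
		if len(buf) > 0 {
			chunks = append(chunks, Chunk{Heading: heading, Text: strings.Join(buf, "\n\n")})
			buf, size = nil, 0
		}
	}

	for _, block := range strings.Split(md, "\n\n") {
		block = strings.TrimSpace(block)
		if block == "" {
			continue
		}
		if strings.HasPrefix(block, "#") {
			// Close the previous section before switching headings.
			flush()
			first, rest, _ := strings.Cut(block, "\n")
			heading = strings.TrimSpace(strings.TrimLeft(first, "#"))
			block = strings.TrimSpace(rest)
			if block == "" {
				continue
			}
		}
		// The +2 accounts for the blank line that joins paragraphs.
		if size > 0 && size+2+len(block) > maxChars {
			flush()
		}
		if size > 0 {
			size += 2
		}
		buf = append(buf, block)
		size += len(block)
	}
	flush()
	return chunks
}

func main() {
	md := "# Intro\n\nFirst paragraph.\n\nSecond paragraph.\n\n## Details\n\nThird paragraph."
	for i, c := range ChunkMarkdown(md, 40) {
		fmt.Printf("%d [%s] %q\n", i, c.Heading, c.Text)
	}
}
```

Keeping the heading alongside each chunk is one way to preserve layout context for retrieval; a real pipeline would likely count tokens rather than bytes and add overlap between chunks.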

document processing, data extraction, content analysis, information retrieval, AI development

No Package, No Dependents

Maintenance 10 / 25

Adoption 5 / 25

Maturity 13 / 25

Community 7 / 25

License

MIT

Higher-rated alternatives

copilot-extensions/rag-extension

An example extension in Go using retrieval-augmented generation

wangle201210/go-rag

A knowledge-base RAG implementation built with eino + gf + vue

LlamaEdge/rag-api-server

A RAG API server written in Rust following OpenAI specs

timescale/pgai

A suite of tools to develop RAG, semantic search, and other AI applications more easily with PostgreSQL

ca-srg/ragent

RAGent - A CLI tool for building RAG systems with hybrid search (BM25 + vector) using Amazon S3...
