rahulpunia29/extractous-go

Fast, multi-format document extraction library for Go. Includes a streaming API for large files and OCR for scanned documents via Tesseract.

Quality score: 34 / 100 (Emerging)

A Go library that lets applications quickly extract text and metadata from a wide range of document types, including PDFs, Word and Excel files, and scanned images. It takes document files as input and returns their textual content and associated metadata, even for very large or scanned documents. It is aimed at software engineers building applications that need to process and understand document content.

Use this if you are a Go developer building an application that needs fast, reliable, and memory-efficient extraction of text and metadata from a diverse set of document formats, including those requiring OCR.

Not ideal if you need a standalone application for document extraction rather than a library to integrate into your Go codebase, or if you are not a Go developer.

document-processing data-extraction text-mining information-retrieval content-management
No package · No dependents
Maintenance 6 / 25
Adoption 8 / 25
Maturity 15 / 25
Community 5 / 25
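The overall score appears to be the sum of the four category scores, each capped at 25. This is an inference from the numbers on this page, not a documented formula:

```latex
\text{score} = \underbrace{6}_{\text{Maintenance}} + \underbrace{8}_{\text{Adoption}} + \underbrace{15}_{\text{Maturity}} + \underbrace{5}_{\text{Community}} = 34,
\qquad \text{maximum} = 4 \times 25 = 100
```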


Stars: 55
Forks: 2
Language: Go
License: Apache-2.0
Category: go-nlp-libraries
Last pushed: Oct 25, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/rahulpunia29/extractous-go"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
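The same request can be made from Go with only the standard library. The sketch below is an illustration under stated assumptions. The base URL comes from the curl example above. Treating the `nlp` path segment as a domain slug is inferred from that single URL. The helper names `qualityURL`, `decodeQuality`, and `fetchQuality` are hypothetical. Because the response schema is not documented here, the body is decoded into a generic map rather than a typed struct. No API key is sent, so the request falls under the keyless 100 requests/day tier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// baseURL is taken from the curl example on this page.
const baseURL = "https://pt-edge.onrender.com/api/v1/quality"

// qualityURL is a hypothetical helper that builds the endpoint path.
// Assumption: the "nlp" segment is a domain slug, inferred from one example URL.
func qualityURL(domain, owner, repo string) string {
	return baseURL + "/" + url.PathEscape(domain) + "/" +
		url.PathEscape(owner) + "/" + url.PathEscape(repo)
}

// decodeQuality parses the body as a JSON object without assuming a schema,
// since the response fields are not documented here (assumption: it is an object).
func decodeQuality(body []byte) (map[string]any, error) {
	var out map[string]any
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// fetchQuality performs an unauthenticated GET request (keyless tier).
func fetchQuality(client *http.Client, domain, owner, repo string) (map[string]any, error) {
	resp, err := client.Get(qualityURL(domain, owner, repo))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeQuality(body)
}

func main() {
	// A timeout keeps the program from hanging if the service is slow or unreachable.
	client := &http.Client{Timeout: 10 * time.Second}
	data, err := fetchQuality(client, "nlp", "rahulpunia29", "extractous-go")
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	// Print whatever top-level fields the service returns.
	for k, v := range data {
		fmt.Printf("%s: %v\n", k, v)
	}
}
```

Decoding into `map[string]any` is deliberately loose. Once the real response shape is known, a typed struct would give compile-time field checking.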