rahulpunia29/extractous-go

Fast, multi-format document extraction library for Go. Includes a streaming API for large files and Tesseract-based OCR for scanned documents.

34 / 100

Emerging

This Go library lets applications extract text and metadata from a wide range of document types, including PDFs, Word and Excel files, and scanned images. Given a document file, it returns the document's textual content and associated metadata. It is designed to handle very large files (through its streaming API) and scanned pages (through OCR). Its audience is software engineers building applications that need to process and understand document content.

Use this if you are a Go developer building an application that needs fast, reliable, and memory-efficient extraction of text and metadata from a diverse set of document formats, including those requiring OCR.

Not ideal if you need a standalone application for document extraction rather than a library to integrate into your Go codebase, or if you are not a Go developer.

document-processing data-extraction text-mining information-retrieval content-management

No Package No Dependents

Maintenance 6 / 25

Adoption 8 / 25

Maturity 15 / 25

Community 5 / 25


License: Apache-2.0

Higher-rated alternatives

ikawaha/kagome-dict

Dictionary Library for Kagome v2

aaaton/golem

A lemmatizer implemented in Go

habeanf/yap

Yet Another (natural language) Parser

clipperhouse/uax29

A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.

abadojack/whatlanggo

Natural language detection library for Go
