kreuzberg-dev/kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 76+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.

Verified

This tool helps you quickly and accurately extract information from a wide range of documents and code files. It accepts file types such as PDFs, Office documents, images, and source code files, and outputs structured text, metadata, and detailed code elements such as functions and classes. It is aimed at developers who need to process large volumes of diverse documents or code for tasks like building search engines, RAG pipelines, or document analysis systems.

6,689 stars. Used by 6 other packages. Actively maintained with 731 commits in the last 30 days. Available on PyPI.

Use this if you need to reliably extract content and structure from nearly any document or programming file for automated processing or analysis.

Not ideal if you only need basic text extraction from a single, consistent document type and don't require advanced metadata or code intelligence.

document-processing information-extraction code-analysis data-pipelines content-management

Maintenance 22 / 25

Adoption 15 / 25

Maturity 25 / 25

Community 17 / 25

Stars: 6,689

Forks: 316

Language: Rust

License: MIT

Related tools

PaddlePaddle/PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR...

yfedoseev/pdf_oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown...

opendataloader-project/opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

AKSarav/pdfstract

PDFStract - The Extraction and Chunking Layer in Your RAG Pipeline - Available as CLI - WEBUI - API

NanoNets/docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking...
