kreuzberg-dev/kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 76+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.

79 / 100 (Verified)

Kreuzberg extracts information from a wide range of documents and source-code files. It accepts PDFs, Office documents, images, and programming files, and outputs structured text, metadata, and code elements such as functions and classes. It suits developers who process large volumes of diverse documents or code, for example when building search engines, RAG pipelines, or document analysis systems.

6,689 stars. Used by 6 other packages. Actively maintained with 731 commits in the last 30 days. Available on PyPI.

Use this if you need to reliably extract content and structure from nearly any document or programming file for automated processing or analysis.

Not ideal if you only need basic text extraction from a single, consistent document type and don't require advanced metadata or code intelligence.

document-processing information-extraction code-analysis data-pipelines content-management
Maintenance 22 / 25
Adoption 15 / 25
Maturity 25 / 25
Community 17 / 25
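
The page does not state how the overall score is derived, but the four category scores listed above do add up to the headline figure. The short check below is only arithmetic on the numbers shown; the variable names are illustrative, and treating the total as a plain sum is an assumption rather than a documented formula.

```python
# Category scores as listed on the page; each is out of 25.
subscores = {
    "Maintenance": 22,
    "Adoption": 15,
    "Maturity": 25,
    "Community": 17,
}

# Assumption: the overall score is the plain sum of the four categories.
total = sum(subscores.values())
maximum = 25 * len(subscores)

print(f"{total} / {maximum}")  # prints "79 / 100"
```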

Stars: 6,689
Forks: 316
Language: Rust
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 731
Reverse dependents: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/kreuzberg-dev/kreuzberg"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
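
The same request can be made from a script. The sketch below uses only the Python standard library and the URL pattern from the curl command above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page, so treat the decoding step as a guess to verify against a real response.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Send a keyless GET request and decode the body.

    Assumption: the response body is JSON; its fields are not documented here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("kreuzberg-dev", "kreuzberg")` would issue the same keyless request as the curl command, so it counts against the 100 requests/day allowance.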