kreuzberg and pdf_oxide

kreuzberg and pdf_oxide are competitors with overlapping document extraction capabilities: both extract text and metadata from PDFs and other formats. They differ in focus. pdf_oxide specializes in performance-critical scenarios, while kreuzberg emphasizes broad format coverage (76+ formats, versus primarily PDFs for pdf_oxide).

kreuzberg
Score: 79 (Verified)
Maintenance 22/25
Adoption 15/25
Maturity 25/25
Community 17/25
Stars: 6,689
Forks: 316
Downloads: (no value listed)
Commits (30d): 731
Language: Rust
License: MIT

pdf_oxide
Score: 67 (Established)
Maintenance 10/25
Adoption 19/25
Maturity 22/25
Community 16/25
Stars: 421
Forks: 40
Downloads: 6,692
Commits (30d): 0
Language: Rust
License: Apache-2.0
Risk flags (kreuzberg): No risk flags
Risk flags (pdf_oxide): No Dependents
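
The page does not spell out how the headline scores are derived, but the numbers are consistent with a simple sum of the four 25-point categories (22 + 15 + 25 + 17 = 79 and 10 + 19 + 22 + 16 = 67). The short Python sketch below checks that arithmetic. The summation rule is an assumption inferred from the figures, and the names SCORES and total_score are illustrative only, not part of either project or of the scoring site.

```python
# Category scores copied from the scorecard (each is out of 25).
SCORES = {
    "kreuzberg": {"Maintenance": 22, "Adoption": 15, "Maturity": 25, "Community": 17},
    "pdf_oxide": {"Maintenance": 10, "Adoption": 19, "Maturity": 22, "Community": 16},
}


def total_score(categories: dict[str, int]) -> int:
    """Sum the four 25-point categories into a 0-100 total.

    Assumption: the headline score is a plain sum; the page does not confirm this.
    """
    return sum(categories.values())


# Compute the total for each project and print it next to its name.
totals = {name: total_score(cats) for name, cats in SCORES.items()}
for name, total in totals.items():
    print(f"{name}: {total}/100")
```

Under that assumption, the 12-point gap between the two projects comes almost entirely from Maintenance (22 vs. 10), which matches the difference in 30-day commit counts (731 vs. 0).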

About kreuzberg

kreuzberg-dev/kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 76+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

This tool helps you quickly and accurately extract information from a wide range of documents and code files. It takes in various file types like PDFs, Office documents, images, and programming files, and outputs structured text, metadata, and even detailed code elements like functions and classes. This is ideal for developers who need to process large volumes of diverse documents or code for tasks like building search engines, RAG pipelines, or document analysis systems.

document-processing information-extraction code-analysis data-pipelines content-management

About pdf_oxide

yfedoseev/pdf_oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

This tool helps you quickly get information out of PDF documents, convert them to other formats, or even fill out forms. You can feed it individual PDF files or entire batches, and it will give you back the raw text, images, structured data like tables, or converted Markdown/HTML files. It's designed for anyone who needs to process many PDFs efficiently, such as data analysts, researchers, or operations managers.

document-processing data-extraction workflow-automation data-conversion information-retrieval

Scores updated daily from GitHub, PyPI, and npm data.