ICIJ/datashare

A self‑hosted search engine for documents

67 / 100
Established

Datashare helps investigative journalists, researchers, and legal professionals sift through vast amounts of internal documents to uncover hidden connections and narratives. You feed it a diverse collection of files like PDFs, emails, spreadsheets, and images, and it outputs a fully searchable database with extracted text, recognized entities (people, organizations), and categorized documents. It's designed for anyone who needs to securely analyze large private datasets and find crucial information.

713 stars. Actively maintained with 48 commits in the last 30 days.

Use this if you need a secure, private, self-hosted system to ingest, search, and analyze a large collection of sensitive documents and find key information or patterns.

Not ideal if you're looking for a simple desktop search tool for your personal computer or a cloud-based solution that integrates with public web content.

investigative journalism legal discovery document analysis research intelligence corporate forensics
No package · No dependents
Maintenance 23 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 18 / 25
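The four category scores above add up to the overall rating: 23 + 10 + 16 + 18 = 67. The page does not state this rule explicitly. Assuming the overall score is simply the sum of the four categories, each out of 25, it can be recomputed with the sketch below. The function name `total_score` is hypothetical and is not part of any published API.

```shell
#!/bin/sh
# Hypothetical helper. It assumes the overall rating is the plain sum
# of the four category scores (Maintenance, Adoption, Maturity, Community).
total_score() {
  sum=0
  for s in "$@"; do
    sum=$((sum + s))   # POSIX arithmetic expansion
  done
  echo "$sum"
}

# Maintenance, Adoption, Maturity, Community
total_score 23 10 16 18   # prints 67
```

Because each category is capped at 25, the maximum possible total under this assumption is 100.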


Stars: 713
Forks: 66
Language: Java
License: AGPL-3.0
Last pushed: Mar 16, 2026
Commits (30d): 48

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/ICIJ/datashare"

Open to everyone: 100 requests/day with no key required. A free API key raises the limit to 1,000 requests/day.