DataFog/datafog-python

Python SDK for PII detection and redaction in text and images, combining regex + NLP pipelines for production privacy workflows.

Established

This tool helps you automatically find and remove sensitive personal data, like email addresses, phone numbers, or credit card numbers, from text and images. It takes raw content, identifies personally identifiable information (PII), and replaces it with placeholders or redactions. Anyone who handles user-generated content, customer data, or internal documents that might contain private information would use this to ensure privacy and compliance.
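The detect-and-replace step described above can be illustrated with a minimal sketch that uses only Python's standard `re` module. This is not DataFog's API: the `PATTERNS` table, the `redact` function, and the placeholder format are hypothetical names chosen for illustration, and the patterns are deliberately simplified, so they will miss many real-world formats.

```python
import re

# Hypothetical, simplified patterns for illustration only (not DataFog's own rules).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Matches only 3-3-4 digit phone numbers separated by '-', '.', or whitespace.
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every pattern match with a placeholder naming its PII type."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Contact jane@example.com or 555-123-4567."))
# prints: Contact [EMAIL] or [PHONE].
```

A regex-only pass like this is fast but limited to rigidly formatted identifiers. Names, addresses, and other free-form PII generally need the NLP side of the pipeline that the description mentions.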

Available on PyPI.

Use this if you need to automatically protect private information in text or images before it's stored, shared, or processed, especially when working with AI models.

Not ideal if you need to handle data types other than text and images, or if you require human-in-the-loop review for every redaction decision.

data-privacy compliance content-moderation information-security large-language-models

Maintenance 13 / 25

Adoption 8 / 25

Maturity 25 / 25

Community 17 / 25

Language: Python

License: MIT

Related tools

vmenger/deduce

Deduce: de-identification method for Dutch medical text

aphp/eds-pseudo

EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports

seanpedrick-case/doc_redaction

Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface....

martincjespersen/DaAnonymization

Simple customizable pipeline tool for anonymizing Danish text.

thoughtbot/top_secret

Filter sensitive information from free text before sending it to external services or APIs, such...
