DataFog/datafog-python
Python SDK for PII detection and redaction in text and images, combining regex and NLP pipelines for production privacy workflows.
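One way to picture a pipeline that pairs regex and NLP stages is as two detectors whose character spans are merged before redaction. The sketch below is a minimal illustration in plain Python, not DataFog's API. The email regex is deliberately simple, and `fake_ner_detector` (with its hard-coded `KNOWN_NAMES`) is a hypothetical stand-in for a real NLP model.

```python
import re
from typing import Callable, List, Tuple

# A span is (start, end, label) over the original string.
Span = Tuple[int, int, str]
Detector = Callable[[str], List[Span]]

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def regex_detector(text: str) -> List[Span]:
    """Pattern-based stage: finds email addresses."""
    return [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]


# Hypothetical stand-in for an NLP named-entity model: a fixed list of names.
KNOWN_NAMES = {"Alice Smith", "Bob Jones"}


def fake_ner_detector(text: str) -> List[Span]:
    """Stand-in for a statistical NER stage; flags known person names."""
    spans: List[Span] = []
    for name in KNOWN_NAMES:
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), "PERSON"))
    return spans


def merge_spans(spans: List[Span]) -> List[Span]:
    """Sort spans and merge overlaps, keeping the label of the first span in sorted order."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def redact(text: str, detectors: List[Detector]) -> str:
    """Run every detector, merge their spans, and replace each span with [LABEL]."""
    spans = merge_spans([s for detect in detectors for s in detect(text)])
    out: List[str] = []
    cursor = 0
    for start, end, label in spans:
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    sample = "Contact Alice Smith at alice@example.com."
    print(redact(sample, [regex_detector, fake_ner_detector]))
    # → Contact [PERSON] at [EMAIL].
```

Merging spans before substitution keeps overlapping hits from two stages from producing doubled or half-replaced tokens.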
This tool automatically finds and removes sensitive personal data, such as email addresses, phone numbers, or credit card numbers, from text and images. It takes raw content, identifies personally identifiable information (PII), and replaces it with placeholders or redactions. It is aimed at anyone who handles user-generated content, customer data, or internal documents that might contain private information and needs to meet privacy and compliance requirements.
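To make the difference between placeholders and redactions concrete, here is a minimal regex-only sketch using Python's standard `re` module. It is not DataFog's implementation. The patterns, the `scrub` function, and its `mode` values are hypothetical, and the Luhn checksum is one common way (assumed here, not taken from the library) to cut false positives on card numbers.

```python
import re

# Illustrative patterns only; real detectors need much broader coverage.
# CREDIT_CARD runs before PHONE so long digit runs are handled first.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scrub(text: str, mode: str = "placeholder") -> str:
    """Replace detected PII with [LABEL] placeholders or same-length asterisk masks."""
    if mode not in ("placeholder", "mask"):
        raise ValueError(f"unknown mode: {mode!r}")

    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            found = match.group()
            if label == "CREDIT_CARD" and not luhn_valid(found):
                return found  # not a plausible card number; leave it alone
            return f"[{label}]" if mode == "placeholder" else "*" * len(found)

        text = pattern.sub(substitute, text)
    return text


if __name__ == "__main__":
    sample = "Email jane@example.org or call 555-123-4567; card 4111 1111 1111 1111."
    print(scrub(sample))
    # → Email [EMAIL] or call [PHONE]; card [CREDIT_CARD].
    print(scrub(sample, mode="mask"))
```

Placeholder mode keeps the type of each removed item visible to downstream consumers, while mask mode preserves the original length; which fits better depends on whether later processing needs to know what was removed.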
Available on PyPI.
Use this if you need to automatically protect private information in text or images before it's stored, shared, or processed, especially when working with AI models.
Not ideal if you need to handle data types beyond text and images, or if every redaction decision requires human-in-the-loop review.
Stars: 48
Forks: 11
Language: Python
License: MIT
Last pushed: Mar 16, 2026
Commits (last 30 days): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/DataFog/datafog-python"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
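The curl request above can also be issued from Python with only the standard library. This sketch assumes the path follows a `quality/<category>/<owner>/<repo>` pattern (inferred from the single example URL) and that the response body is JSON. The response schema and the way a free key is supplied are not documented here, so the key is omitted; `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL in the same shape as the curl example (assumed pattern)."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint without a key and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    data = fetch_quality("nlp", "DataFog", "datafog-python")
    print(data)
```

Unauthenticated calls count against the 100 requests/day limit, so caching the result is worthwhile if it is polled.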
Related tools
vmenger/deduce
Deduce: de-identification method for Dutch medical text
aphp/eds-pseudo
EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports
seanpedrick-case/doc_redaction
Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface....
martincjespersen/DaAnonymization
Simple customizable pipeline tool for anonymizing Danish text.
thoughtbot/top_secret
Filter sensitive information from free text before sending it to external services or APIs, such...