PII Detection and Redaction NLP Tools
Tools for detecting, masking, and redacting personally identifiable information (PII) in text, images, and documents. This category does NOT include privacy policy analysis, general data anonymization frameworks, or data leak detection platforms.
This category tracks 36 PII detection and redaction tools. Two score above 50 (Established tier). The highest-rated is DataFog/datafog-python at 63/100 with 48 stars.
Get all 36 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=pii-detection-redaction&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
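The same request can be made from Python. The sketch below is a minimal illustration using only the standard library: it rebuilds the URL shown above from its query parameters and returns the parsed JSON. The helper names `build_url` and `fetch_projects` are hypothetical, and the response schema is not documented on this page, so the code makes no assumptions about the fields inside the payload. Whether a `limit` larger than 20 returns all 36 projects is also an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="pii-detection-redaction", limit=20):
    """Build the dataset URL (hypothetical helper; defaults mirror the curl example)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the parsed JSON payload as-is.

    The payload structure is not specified here, so nothing is unpacked.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to change the `limit` or `subcategory` and to check the resulting URL without spending any of the daily request quota.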
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | DataFog/datafog-python<br>Python SDK for PII detection and redaction in text and images, combining... | 63 | Established |
| 2 | vmenger/deduce<br>Deduce: de-identification method for Dutch medical text | | Established |
| 3 | aphp/eds-pseudo<br>EDS-Pseudo is a hybrid model for detecting personally identifying entities... | | Emerging |
| 4 | seanpedrick-case/doc_redaction<br>Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical... | | Emerging |
| 5 | martincjespersen/DaAnonymization<br>Simple customizable pipeline tool for anonymizing Danish text. | | Emerging |
| 6 | thoughtbot/top_secret<br>Filter sensitive information from free text before sending it to external... | | Emerging |
| 7 | SMI/IsIdentifiable<br>A tool for detecting identifiable information in data sources (CSV, DICOM,... | | Emerging |
| 8 | icescentral/MASK_public<br>Masking identifiable information from health-related documents. | | Emerging |
| 9 | DilawarShafiq/phi-redactor<br>HIPAA-native PHI redaction proxy for AI/LLM interactions. Detects and masks... | | Emerging |
| 10 | edwardcooper/data-sentry<br>A project to build a machine learning pipeline to detect personal... | | Emerging |
| 11 | ahmedbesbes/anonymization-api<br>How to build and deploy an anonymization API with FastAPI and spaCy | | Emerging |
| 12 | databricks-industry-solutions/ocr-phi-masking<br>Our joint Solution Accelerator with John Snow Labs automates the detection... | | Emerging |
| 13 | jftuga/deidentification<br>Deidentify people's names and gender-specific pronouns | | Emerging |
| 14 | Welding-Torch/Excel-Anonymizer<br>A Python script that anonymizes an Excel file and synthesizes new data in its place. | | Emerging |
| 15 | vmenger/docdeid<br>Create your own document de-identifier using docdeid, a simple framework... | | Emerging |
| 16 | HC200ok/manual-data-masking<br>A lightweight JavaScript library for manual data masking | | Emerging |
| 17 | dimanjet/piicloak<br>Enterprise-grade PII detection and anonymization REST API built on Presidio | | Emerging |
| 18 | awsaf49/pii-data-detection<br>The Learning Agency Lab - PII Data Detection \|\| Develop automated techniques... | | Emerging |
| 19 | zacharykzhao/CA4P-483<br>NLP dataset: Chinese Android Privacy Policy Dataset | | Emerging |
| 20 | ahmedbesbes/anonymizer<br>Text anonymization app with Streamlit and spaCy | | Emerging |
| 21 | worka-ai/pii<br>A library to identify and help redact Personally Identifiable Information... | | Emerging |
| 22 | hsleonis/pii-detection-group<br>Research group working on PII detection | | Experimental |
| 23 | AbhilashaRavichander/PrivacyQA_EMNLP<br>PrivacyQA, a resource to support question-answering over privacy policies. | | Experimental |
| 24 | OmkarPathak/piiscrub<br>A blazing-fast, zero-dependency PII scrubbing engine for LLMs. Multi-core... | | Experimental |
| 25 | iYassr/maskr<br>Detect & mask PII in documents - 100% offline. Names, emails, phones, SSNs,... | | Experimental |
| 26 | PriyeshDave/Document-Redaction<br>This project revolves around the ability to recognise sensitive words within... | | Experimental |
| 27 | nedap/mdpi2021-textgen<br>Source code for the paper "Generating Synthetic Training Data for Supervised... | | Experimental |
| 28 | SwissFederalArchives/tcc-metadata-anonymization<br>A named-entity-recognition (NER) based anonymizer for archival document metadata. | | Experimental |
| 29 | biagiocornacchia/microsoft-presidio-using-grpc<br>Implementation of a Distributed Personal Information Recognition System that... | | Experimental |
| 30 | sonu-gupta/Doxing-on-Twitter<br>This repository contains my work on the prevention and anonymization of dox... | | Experimental |
| 31 | crisp-du/ppevo<br>Evolution of Privacy Policies | | Experimental |
| 32 | Biswas-N/Redactor<br>Redactor is a Python-based utility tool used to redact sensitive... | | Experimental |
| 33 | Th3Tr00p3r/PrivacyPolicy<br>PPA breaks down privacy policies, aiming to simplify their understanding. By... | | Experimental |
| 34 | sdsc-ordes/deid-module<br>Text deidentification module. | | Experimental |
| 35 | ntsation/personal-data-pseudonymizer<br>The Personal Data Pseudonymizer is a Python script designed to anonymize... | | Experimental |
| 36 | crs-org/multilingual-pii-detection<br>An API for the Personally Identifiable Information detection task | | Experimental |