PII Detection and Redaction NLP Tools

Tools for detecting, masking, and redacting personally identifiable information (PII) in text, images, and documents. This category does NOT include privacy policy analysis, general data anonymization frameworks, or data leak detection platforms.

36 PII detection and redaction tools are tracked. Two score above 50 (Established tier). The top-ranked is DataFog/datafog-python at 63/100 with 48 stars; vmenger/deduce has the same score.

Get all 36 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=pii-detection-redaction&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
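The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The endpoint and the `domain`, `subcategory`, and `limit` query parameters are copied from the curl command on this page. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the parsed payload without assuming any field names. Note that the command on this page uses `limit=20`; whether a higher value returns all 36 projects is an assumption to check against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="pii-detection-redaction", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON payload (hypothetical helper; needs network access).

    The response structure is not described on this page, so the parsed
    object is returned as-is.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL only, so running the script does not use up the daily quota.
    print(build_url())
```

Calling `fetch_projects()` counts against the 100 requests/day allowance for keyless access, so it may be worth caching the result locally.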

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | DataFog/datafog-python | Python SDK for PII detection and redaction in text and images, combining... | 63 | Established |
| 2 | vmenger/deduce | Deduce: de-identification method for Dutch medical text | 63 | Established |
| 3 | aphp/eds-pseudo | EDS-Pseudo is a hybrid model for detecting personally identifying entities... | 48 | Emerging |
| 4 | seanpedrick-case/doc_redaction | Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical... | 46 | Emerging |
| 5 | martincjespersen/DaAnonymization | Simple customizable pipeline tool for anonymizing Danish text. | 45 | Emerging |
| 6 | thoughtbot/top_secret | Filter sensitive information from free text before sending it to external... | 43 | Emerging |
| 7 | SMI/IsIdentifiable | A tool for detecting identifiable information in data sources (CSV, DICOM,... | 41 | Emerging |
| 8 | icescentral/MASK_public | Masking identifiable information from health-related documents. | 39 | Emerging |
| 9 | DilawarShafiq/phi-redactor | HIPAA-native PHI redaction proxy for AI/LLM interactions. Detects and masks... | 39 | Emerging |
| 10 | edwardcooper/data-sentry | A project to build a machine learning pipeline to detect personal... | 39 | Emerging |
| 11 | ahmedbesbes/anonymization-api | How to build and deploy an anonymization API with FastAPI and spaCy | 37 | Emerging |
| 12 | databricks-industry-solutions/ocr-phi-masking | Our joint Solution Accelerator with John Snow Labs automates the detection... | 36 | Emerging |
| 13 | jftuga/deidentification | Deidentify people's names and gender-specific pronouns | 36 | Emerging |
| 14 | Welding-Torch/Excel-Anonymizer | A Python script that anonymizes an Excel file and synthesizes new data in its place. | 35 | Emerging |
| 15 | vmenger/docdeid | Create your own document de-identifier using docdeid, a simple framework... | 35 | Emerging |
| 16 | HC200ok/manual-data-masking | A lightweight JavaScript library for manual data masking | 35 | Emerging |
| 17 | dimanjet/piicloak | Enterprise-grade PII detection and anonymization REST API built on Presidio | 33 | Emerging |
| 18 | awsaf49/pii-data-detection | The Learning Agency Lab - PII Data Detection \|\| Develop automated techniques... | 33 | Emerging |
| 19 | zacharykzhao/CA4P-483 | NLP dataset: Chinese Android Privacy Policy Dataset | 33 | Emerging |
| 20 | ahmedbesbes/anonymizer | Text anonymization app with Streamlit and spaCy | 32 | Emerging |
| 21 | worka-ai/pii | A library to identify and help redact Personally Identifiable Information... | 30 | Emerging |
| 22 | hsleonis/pii-detection-group | Research Group works on PII Detection | 24 | Experimental |
| 23 | AbhilashaRavichander/PrivacyQA_EMNLP | PrivacyQA, a resource to support question-answering over privacy policies. | 24 | Experimental |
| 24 | OmkarPathak/piiscrub | A blazing-fast, zero-dependency PII scrubbing engine for LLMs. Multi-core... | 24 | Experimental |
| 25 | iYassr/maskr | Detect & mask PII in documents - 100% offline. Names, emails, phones, SSNs,... | 23 | Experimental |
| 26 | PriyeshDave/Document-Redaction | This project revolves around the ability to recognise sensitive words within... | 23 | Experimental |
| 27 | nedap/mdpi2021-textgen | Source code for the paper "Generating Synthetic Training Data for Supervised... | 20 | Experimental |
| 28 | SwissFederalArchives/tcc-metadata-anonymization | A named-entity-recognition (NER) based anonymizer for archival document metadata. | 20 | Experimental |
| 29 | biagiocornacchia/microsoft-presidio-using-grpc | Implementation of a Distributed Personal Information Recognition System that... | 20 | Experimental |
| 30 | sonu-gupta/Doxing-on-Twitter | This repository contains my work on the prevention and anonymization of dox... | 19 | Experimental |
| 31 | crisp-du/ppevo | Evolution of Privacy Policies | 18 | Experimental |
| 32 | Biswas-N/Redactor | Redactor is a Python-based utility tool used to redact sensitive... | 17 | Experimental |
| 33 | Th3Tr00p3r/PrivacyPolicy | PPA breaks down privacy policies, aiming to simplify their understanding. By... | 17 | Experimental |
| 34 | sdsc-ordes/deid-module | Text deidentification module. | 15 | Experimental |
| 35 | ntsation/personal-data-pseudonymizer | The Personal Data Pseudonymizer is a Python script designed to anonymize... | 11 | Experimental |
| 36 | crs-org/multilingual-pii-detection | An API for the Personally Identifiable Information detection task | 10 | Experimental |
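
The relationship between score and tier in the listing can be approximated in a few lines of Python. This is only a sketch inferred from the numbers on this page, not the site's actual scoring logic. The "above 50" rule for Established is stated on the page. The cutoff of 30 between Emerging and Experimental is an assumption: the lowest Emerging score listed is 30 and the highest Experimental score is 24, so the real boundary could lie anywhere in between. The function name `tier_for_score` is hypothetical.

```python
def tier_for_score(score: int) -> str:
    """Map a 0-100 quality score to a tier name.

    Established: stated on the page as scores above 50.
    Emerging vs. Experimental: ASSUMED cutoff of 30, inferred from the
    listed scores (lowest Emerging is 30, highest Experimental is 24).
    """
    if score > 50:
        return "Established"
    if score >= 30:  # assumed boundary, not confirmed by the source
        return "Emerging"
    return "Experimental"


# A few (tool, score) pairs copied from the listing.
sample = {
    "DataFog/datafog-python": 63,
    "aphp/eds-pseudo": 48,
    "worka-ai/pii": 30,
    "hsleonis/pii-detection-group": 24,
    "crs-org/multilingual-pii-detection": 10,
}

tiers = {tool: tier_for_score(score) for tool, score in sample.items()}
```

For the sampled rows, the result agrees with the tiers shown in the listing.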