microsoft/presidio
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
This tool helps organizations and data professionals automatically find and protect sensitive information, such as names, credit card numbers, or Social Security numbers, in text, images, and structured data. You provide content containing private details, and it returns the same content with the sensitive parts identified, removed, or hidden. It is aimed at anyone responsible for data privacy, compliance, or secure data handling who needs to keep personal information from being exposed.
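The detect-then-redact flow described above can be sketched with nothing but the Python standard library. This is a minimal illustration of the pattern-matching idea only, not Presidio's API or implementation: the regular expressions, the `Finding` class, and the `detect` and `redact` helpers are all hypothetical names chosen for this example, and the patterns are deliberately simplified.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified patterns for illustration only. Real recognizers
# typically add context words, more formats, and NLP-based entity detection.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


@dataclass(frozen=True)
class Finding:
    """A detected span: entity label plus character offsets into the text."""
    entity_type: str
    start: int
    end: int


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str) -> list[Finding]:
    """Run every pattern over the text and return findings sorted by offset."""
    findings = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Drop digit runs that look like card numbers but fail the checksum.
            if entity_type == "CREDIT_CARD" and not luhn_valid(match.group()):
                continue
            findings.append(Finding(entity_type, match.start(), match.end()))
    return sorted(findings, key=lambda f: f.start)


def redact(text: str, findings: list[Finding]) -> str:
    """Replace each finding with a <LABEL> placeholder.

    Assumes findings do not overlap. Works from the end of the string so
    that earlier offsets remain valid after each replacement.
    """
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text
```

Calling `redact(text, detect(text))` on a sentence containing an email address, an SSN-shaped number, and a checksum-valid card number replaces each with its label. A sketch like this will miss names and anything else that has no fixed pattern, which is why pattern matching is usually combined with NLP-based detection.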
7,198 stars. Actively maintained with 29 commits in the last 30 days. Available on PyPI.
Use this if you need to reliably identify and de-identify personal data in large volumes of text, images, or databases, whether to meet privacy regulations or to share data securely.
Not ideal if you need guaranteed detection of every piece of sensitive data without human oversight; automated detection can miss instances.
Stars: 7,198
Forks: 960
Language: Python
License: MIT
Last pushed: Mar 13, 2026
Commits (30d): 29
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/presidio"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
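For scripted use, the same request can be made from Python's standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are made up for this example, the response is assumed to be UTF-8 text (the page does not describe the response format), and no API key is sent because the page does not say how a key should be passed.

```python
import urllib.parse
import urllib.request

# Endpoint prefix taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository such as microsoft/presidio."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text.

    Assumption: the body is UTF-8 encoded. Parsing is left to the caller
    because the response schema is not documented on this page.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (performs a network request and counts against the daily limit):
# body = fetch_quality("microsoft", "presidio")
```

With a limit of 100 keyless requests per day, it makes sense to cache the returned body rather than refetch it on every run.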
Related models
KasraAhmadi/PII-360
An open-source Chrome Extension that identifies Personally Identifiable Information (PII) in...
AnonShield/AnonLFI2.0
Extensible PII pseudonymization framework for CSIRTs. Features OCR, technical entity...
romelancheta/AutoRedact
🛡️ Redact sensitive information from images securely in your browser with AutoRedact, featuring...
JuanDiego-10/Privacy_Protection_Redaction_LLM
Privacy_Protection_Redaction_LLM is a machine learning model designed to identify and redact...
sotthang/kpii
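A Python package for identifying and masking personal information in Korean text. It can automatically detect and mask many kinds of personal information, including names, phone numbers, email addresses, and resident registration numbers.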