microsoft/presidio

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

77 / 100 (Verified)

This tool helps organizations and data professionals automatically find and protect sensitive information, like names, credit card numbers, or social security numbers, within various types of data. You provide text, images, or structured data containing private details, and it returns the same content with those sensitive parts identified, removed, or hidden. It's used by anyone responsible for data privacy, compliance, or secure data handling to ensure personal information isn't exposed.
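The detect-then-hide flow described above can be illustrated with a minimal pattern-matching sketch. It uses only the Python standard library and does not use Presidio's API; the pattern set, the labels, and the `find_pii` and `redact` helpers are hypothetical names chosen for illustration. Real detectors add validation (for example checksums) and NLP-based recognition on top of patterns like these.

```python
import re

# Hypothetical patterns for illustration only; they are deliberately simple
# and will both miss real PII and flag look-alike strings.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text):
    """Return (label, start, end) tuples for every pattern match, sorted by start."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


def redact(text):
    """Replace each detected span with a <LABEL> placeholder."""
    pieces = []
    cursor = 0
    for label, start, end in find_pii(text):
        if start < cursor:
            # Skip a match that overlaps one already redacted.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Calling `redact("SSN 123-45-6789, mail jane@example.com")` returns `"SSN <US_SSN>, mail <EMAIL>"`. Keeping detection (`find_pii`) separate from replacement (`redact`) mirrors the identify-then-remove split in the description, so the same detected spans could be masked or merely reported instead.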

7,198 stars. Actively maintained with 29 commits in the last 30 days. Available on PyPI.

Use this if you need to reliably identify and de-identify personal data in large volumes of text, images, or databases to meet privacy regulations or secure data sharing needs.

Not ideal if you need guaranteed detection of every piece of sensitive data with no human review: automated detection can miss some instances.

data-privacy compliance data-anonymization information-security GDPR-CCPA
Maintenance 20 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 22 / 25

Stars: 7,198
Forks: 960
Language: Python
License: MIT
Last pushed: Mar 13, 2026
Commits (30d): 29
Dependencies: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/presidio"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
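The same request can be made from Python. The sketch below uses only the standard library and copies the URL from the curl example; the `quality_url` and `fetch_quality` helpers are hypothetical, and parsing the body as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request

# Base path copied from the curl example. The "transformers" segment is kept
# verbatim because its meaning is not documented on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(repo):
    """Build the endpoint URL for an 'owner/name' repository slug."""
    return f"{BASE_URL}/{repo.strip('/')}"


def fetch_quality(repo, timeout=10.0):
    """GET the endpoint and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("microsoft/presidio")
```

With a limit of 100 unauthenticated requests per day, caching the result locally is a reasonable choice when polling more than a handful of repositories.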