microsoft/presidio-research

This package provides data-science tooling for developing new recognizers for Presidio. It is used to evaluate the entire system, as well as specific PII recognizers or PII detection models.
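To make "evaluating PII recognizers" concrete, here is a minimal sketch of entity-level scoring. It assumes gold and predicted entities are given as exact `(start, end, entity_type)` character spans. The function name `evaluate_spans` and this span format are hypothetical and are not presidio-research's API. The package's own evaluator may use a different matching scheme, for example token-level rather than exact-span matching.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span format: (start, end, entity_type), with `end` exclusive.
Span = Tuple[int, int, str]


def evaluate_spans(gold: List[Span], predicted: List[Span]) -> Dict[str, float]:
    """Exact-match, entity-level precision/recall/F1 (a deliberate simplification)."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    tp = len(gold_set & pred_set)   # predicted spans that exactly match a gold span
    fp = len(pred_set - gold_set)   # predicted spans with no exact gold match
    fn = len(gold_set - pred_set)   # gold spans the detector missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


gold = [(0, 8, "PERSON"), (20, 36, "PHONE_NUMBER")]
pred = [(0, 8, "PERSON"), (40, 45, "LOCATION")]
print(evaluate_spans(gold, pred))  # precision, recall and F1 are all 0.5 here
```

Exact-span matching is strict: a prediction that is off by one character counts as both a false positive and a false negative, which is one reason evaluators often offer looser matching modes.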

Score: 59 / 100 (Established)

This tool helps data scientists and machine learning engineers create realistic synthetic datasets containing personally identifiable information (PII). You provide sentence templates, and it generates new sentences filled with fake PII that can then be used to train and evaluate PII detection models. It also measures the accuracy of PII detection systems such as Presidio.
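The template-filling idea can be sketched as follows. This is a minimal illustration, not the package's generator. The `{{entity}}` placeholder syntax, the `FAKE_VALUES` pools, and the `fill_template` function are all assumptions made for this example. A real generator would typically draw values from a fake-data library rather than fixed lists. The key point is that each inserted value's character span is recorded, so every generated sentence comes with labels for training or evaluation.

```python
import random
import re
from typing import Dict, List, Tuple

# Hypothetical fake-value pools keyed by placeholder name.
FAKE_VALUES: Dict[str, List[str]] = {
    "name": ["Alice Moreno", "Dev Patel", "Kim Larsen"],
    "city": ["Lisbon", "Osaka", "Denver"],
    "phone_number": ["555-0142", "555-0199"],
}

# Assumed placeholder syntax: {{entity_name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, rng: random.Random) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Replace each {{entity}} with a fake value and record its (start, end, entity) span."""
    parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    cursor = 0  # current position in the template
    length = 0  # length of the output sentence built so far
    for match in PLACEHOLDER.finditer(template):
        literal = template[cursor:match.start()]  # text before the placeholder
        parts.append(literal)
        length += len(literal)
        entity = match.group(1)
        value = rng.choice(FAKE_VALUES[entity])
        spans.append((length, length + len(value), entity))
        parts.append(value)
        length += len(value)
        cursor = match.end()
    parts.append(template[cursor:])  # trailing text after the last placeholder
    return "".join(parts), spans


rng = random.Random(42)  # seeded so the output is reproducible
sentence, spans = fill_template("My name is {{name}} and I live in {{city}}.", rng)
print(sentence)
for start, end, entity in spans:
    print(entity, repr(sentence[start:end]))
```

Tracking the output length while building the sentence, instead of searching for values afterwards, keeps the spans correct even when the same fake value appears twice.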

267 stars.

Use this if you need to develop, test, or fine-tune PII detection models and require high-quality, diverse synthetic data.

Not ideal if you are looking for an out-of-the-box PII detection solution rather than a tool for model development and evaluation.

PII-detection synthetic-data-generation named-entity-recognition model-evaluation data-anonymization
No package · No dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 23 / 25
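The four component scores add up to the overall score. A quick check, assuming the total is an unweighted sum of four components each scored out of 25:

$$
10 + 10 + 16 + 23 = 59 \qquad \text{(maximum } 4 \times 25 = 100\text{)}
$$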

Stars: 267
Forks: 71
Language: Jupyter Notebook
License: MIT
Last pushed: Mar 02, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/presidio-research"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
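The same request can be made from Python with only the standard library. This is a sketch under assumptions: it assumes the endpoint returns a JSON body, and it treats the `nlp` path segment as a category name. The helper names `quality_url` and `fetch_quality` are hypothetical. It sends no API key, so it is subject to the anonymous 100 requests/day limit. How a key is passed is not covered here.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL, e.g. category 'nlp' and repo 'microsoft/presidio-research'."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> object:
    """GET the endpoint without an API key and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL; call fetch_quality(...) to perform the network request.
    print(quality_url("nlp", "microsoft/presidio-research"))
```

Calling `fetch_quality("nlp", "microsoft/presidio-research")` performs the same GET request as the `curl` command above.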