microsoft/presidio-research
This package provides data science tooling for developing new recognizers for Presidio. It is used to evaluate the system end to end, as well as to evaluate individual PII recognizers and PII detection models.
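To make the evaluation idea above concrete, here is a minimal sketch of token-level scoring for a PII detector. It is not the package's own evaluator: the function name `pii_scores`, the `"O"` label for non-PII tokens, and the default `beta=2.0` (which weights recall above precision) are all assumptions chosen for illustration.

```python
def pii_scores(gold, predicted, beta=2.0, outside="O"):
    """Token-level precision, recall and F-beta for PII tags.

    `gold` and `predicted` are equal-length lists of per-token labels.
    Hypothetical helper for illustration, not part of presidio-research.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Correctly tagged PII tokens.
    tp = sum(1 for g, p in pairs if g != outside and g == p)
    # Tokens tagged as PII with the wrong label (or tagged when not PII).
    fp = sum(1 for g, p in pairs if p != outside and g != p)
    # PII tokens that were missed or given the wrong label.
    fn = sum(1 for g, p in pairs if g != outside and g != p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


gold = ["O", "O", "PERSON", "PERSON", "O", "EMAIL"]
pred = ["O", "O", "PERSON", "O", "PHONE", "EMAIL"]
precision, recall, f_beta = pii_scores(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f_beta={f_beta:.3f}")
```

A recall-weighted score is a common choice for PII work because a missed entity usually costs more than a false alarm, but the right weighting depends on the use case.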
This tool helps data scientists and machine learning engineers create realistic, synthetic datasets containing personally identifiable information (PII). You provide sentence templates and it generates new sentences with fake PII, which can then be used to train and evaluate PII detection models. The tool also assesses the accuracy of PII detection systems like Presidio.
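The template-filling step described above can be sketched with the standard library alone. This is an illustration of the idea, not the package's API: the `{{ENTITY}}` placeholder syntax, the `FAKE_VALUES` pools, and the `fill_template` helper are all hypothetical. The sketch also records the character span of each inserted value, since labeled spans are what make the output usable for training and evaluation.

```python
import random
import re

# Hypothetical value pools; a real generator would draw from a fake-data provider.
FAKE_VALUES = {
    "PERSON": ["Ana Lopez", "John Smith", "Mei Chen"],
    "EMAIL": ["ana@example.com", "jsmith@example.org"],
    "PHONE": ["555-0100", "555-0199"],
}

# Assumed placeholder syntax: {{ENTITY_TYPE}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template, rng):
    """Return (sentence, spans); each span is (start, end, entity_type)."""
    parts = []
    spans = []
    cursor = 0   # read position in the template
    length = 0   # length of the sentence built so far
    for match in PLACEHOLDER.finditer(template):
        literal = template[cursor:match.start()]
        parts.append(literal)
        length += len(literal)

        entity = match.group(1)
        value = rng.choice(FAKE_VALUES[entity])
        spans.append((length, length + len(value), entity))
        parts.append(value)
        length += len(value)
        cursor = match.end()
    parts.append(template[cursor:])
    return "".join(parts), spans


rng = random.Random(0)  # seeded so runs are repeatable
sentence, spans = fill_template(
    "My name is {{PERSON}} and my email is {{EMAIL}}.", rng
)
print(sentence)
for start, end, entity in spans:
    print(entity, repr(sentence[start:end]))
```

Running the same template many times with different seeds yields varied sentences that share one structure, each with known PII locations.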
267 stars.
Use this if you need to develop, test, or fine-tune PII detection models and require high-quality, diverse synthetic data.
Not ideal if you are looking for an out-of-the-box PII detection solution rather than a tool for model development and evaluation.
Stars: 267
Forks: 71
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Mar 02, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/presidio-research"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
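The same request as the curl command above can be made from Python with the standard library. This is a sketch under assumptions: the response is assumed to be JSON, the path is assumed to follow a category/owner/repo pattern (inferred only from the one URL shown), and the helper names `build_quality_url` and `fetch_quality` are hypothetical. How an API key would be supplied is not documented here, so the sketch covers only the keyless case.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository (assumes a JSON response body)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("nlp", "microsoft", "presidio-research")
print(url)
```

Calling `fetch_quality("nlp", "microsoft", "presidio-research")` performs the actual network request and counts against the daily limit.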
Related tools
rhnfzl/SqueakyCleanText
Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language detection,...
rushilpatel21/Redactify
Redactify is an efficient data redaction tool that secures sensitive text using advanced NLP and...
zulqarnainalipk/PII-Data-Detection
🔐 NLP-powered pipeline for detecting and removing Personally Identifiable Information (PII) from...
4n33sh/REDACT
REDACT is an info-sec tool that automates redaction with minimal user interaction. It utilizes...