edwardcooper/data-sentry

A project to build a machine learning pipeline to detect personally identifiable information (PII)

/ 100

Emerging

This tool helps you automatically identify and flag sensitive personal information within large volumes of text. You input documents, emails, or other text, and it highlights or redacts details like names, addresses, or social security numbers. It's designed for anyone managing or processing text data that might contain private user information.
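As a rough illustration of the detect-then-redact flow described above, here is a minimal sketch in Python. It is not the project's method: data-sentry is described as a machine learning pipeline, whereas this sketch swaps in plain regular expressions to keep the example self-contained. The pattern set and the names `PII_PATTERNS`, `find_pii`, and `redact` are hypothetical, and the patterns cover only US-style social security numbers and simple email addresses.

```python
import re

# Hypothetical patterns for illustration only; not taken from data-sentry.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text):
    """Return (label, start, end, matched_text) tuples sorted by position."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


def redact(text):
    """Replace every pattern match with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(find_pii(sample))  # flagged spans, for highlighting
print(redact(sample))    # redacted copy of the text
```

Regular expressions only catch PII with a predictable shape. Names and street addresses generally need a trained model, such as a named-entity recognizer, which is presumably the gap a machine learning pipeline is meant to fill.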

No commits in the last 6 months.

Use this if you need to ensure compliance with data privacy regulations by automatically detecting Personally Identifiable Information (PII) in text.

Not ideal if you need to detect sensitive data types other than PII, such as financial figures, medical codes, or intellectual property.

data-privacy compliance data-management information-security PII-detection

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 17 / 25

Language: Jupyter Notebook

License: GPL-3.0

Higher-rated alternatives

DataFog/datafog-python

Python SDK for PII detection and redaction in text and images, combining regex + NLP pipelines...

vmenger/deduce

Deduce: de-identification method for Dutch medical text

aphp/eds-pseudo

EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports

seanpedrick-case/doc_redaction

Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface....

martincjespersen/DaAnonymization

Simple customizable pipeline tool for anonymizing Danish text.
