SMI/IsIdentifiable
A tool for detecting identifiable information in data sources (CSV, DICOM, Relational Database and MongoDB)
This tool helps you find sensitive or personal information in your data files and databases. It reads data sources such as CSV files, DICOM images, relational databases, and MongoDB, then flags the specific text or entries that might reveal someone's identity. This is useful for data privacy officers, compliance managers, and researchers who need to verify that data has been anonymized and that personal information is protected.
Use this if you need to quickly scan large datasets across different formats to identify and manage personally identifiable information (PII) or protected health information (PHI).
Not ideal if you are looking for a comprehensive, automated data anonymization solution rather than just detection and flagging of identifiable data.
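To make "detection and flagging" concrete, here is a minimal Python sketch of the general idea: scan each cell of a CSV source and report where a suspicious value was found, without altering the data. It is not IsIdentifiable's implementation (the tool itself is written in C#), and the patterns, the `scan_csv` function, and the sample data are all hypothetical, chosen only for illustration.

```python
import csv
import io
import re

# Illustrative patterns only. These are assumptions made for this sketch,
# not the rules that IsIdentifiable actually applies.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def scan_csv(text):
    """Return a list of (row_number, column, kind, matched_text) findings."""
    findings = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            for kind, pattern in PATTERNS.items():
                # Missing cells come back as None, so fall back to "".
                for match in pattern.finditer(value or ""):
                    findings.append((row_number, column, kind, match.group()))
    return findings


# Hypothetical sample input with two flaggable cells.
SAMPLE = (
    "id,note\n"
    "1,Contact jane.doe@example.org\n"
    "2,Seen on 2021-03-04\n"
    "3,No issues\n"
)

if __name__ == "__main__":
    for finding in scan_csv(SAMPLE):
        print(finding)
```

The sketch only reports findings and leaves the input untouched, which mirrors the distinction drawn above between flagging identifiable data and anonymizing it.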
Stars: 14
Forks: 3
Language: C#
License: GPL-3.0
Category:
Last pushed: Nov 24, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/SMI/IsIdentifiable"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
DataFog/datafog-python
Python SDK for PII detection and redaction in text and images, combining regex + NLP pipelines...
vmenger/deduce
Deduce: de-identification method for Dutch medical text
aphp/eds-pseudo
EDS-Pseudo is a hybrid model for detecting personally identifying entities in clinical reports
seanpedrick-case/doc_redaction
Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface....
martincjespersen/DaAnonymization
Simple customizable pipeline tool for anonymizing Danish text.