Digital-Dermatology/SelfClean

[NeurIPS 2024] 🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.

Score: 40 / 100 (Emerging)

This tool helps machine learning engineers and researchers clean up image datasets by identifying common data quality issues. It takes your image dataset as input and outputs a list of detected off-topic samples, near duplicates, and potential label errors. This helps improve the reliability and performance of your machine learning models.

No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly and holistically identify and address data quality problems like irrelevant images, highly similar images, or mislabeled images within your large image datasets.

Not ideal if you are looking for a manual image annotation tool or a solution for text-based data cleaning.
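
To make the idea of "near duplicates" concrete, here is a minimal toy sketch in pure Python. It is not SelfClean's implementation or API. It assumes each image has already been mapped to an embedding vector (the vectors below are made up for illustration) and ranks every pair of samples by cosine distance, so the most similar pairs surface first for review. The helper names `cosine_distance` and `rank_near_duplicates` are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_near_duplicates(embeddings):
    """Return (distance, i, j) for every pair, sorted from most to least similar."""
    pairs = [
        (cosine_distance(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(len(embeddings)), 2)
    ]
    return sorted(pairs)


# Hypothetical embeddings: samples 0 and 1 are deliberately almost identical.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

ranking = rank_near_duplicates(embeddings)
distance, i, j = ranking[0]
print(f"most similar pair: ({i}, {j}), cosine distance {distance:.4f}")
```

In a real dataset the embeddings would come from a trained encoder, and comparing all pairs grows quadratically with dataset size, so this sketch only suits small examples.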

machine-learning-engineering computer-vision data-quality image-dataset-curation model-training
Stale 6m
Maintenance 2 / 25
Adoption 7 / 25
Maturity 25 / 25
Community 6 / 25

Stars: 36
Forks: 2
Language: Python
License:
Last pushed: Oct 14, 2025
Commits (30d): 0
Dependencies: 24

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Digital-Dermatology/SelfClean"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
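
The same request can be sketched in Python with only the standard library. The URL pattern is taken from the curl command above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns JSON, which this page does not state.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality endpoint URL for one repository."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch and decode the response. Assumes a JSON body (not confirmed by the page)."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return json.load(response)


url = quality_url("ml-frameworks", "Digital-Dermatology", "SelfClean")
print(url)

# Performs a network request and counts against the daily limit:
# data = fetch_quality("ml-frameworks", "Digital-Dermatology", "SelfClean")
```

Keeping URL construction separate from the network call makes the helper easy to check offline before spending any of the daily request quota.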