Digital-Dermatology/SelfClean
[NeurIPS 2024] A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.
This tool helps machine learning engineers and researchers clean up image datasets by identifying common data quality issues. It takes your image dataset as input and outputs a list of detected off-topic samples, near duplicates, and potential label errors. This helps improve the reliability and performance of your machine learning models.
No commits in the last 6 months. Available on PyPI.
Use this if you need to identify and address data quality problems, such as irrelevant, near-duplicate, or mislabeled images, in large image datasets.
Not ideal if you are looking for a manual image annotation tool or a solution for text-based data cleaning.
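To make the idea of a "near duplicate" concrete, here is a minimal, generic sketch that ranks image pairs by the cosine similarity of their embedding vectors. It is not SelfClean's implementation or API: the listing does not describe how the tool scores samples, and the function names and the hand-written toy embeddings below are assumptions made purely for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_near_duplicates(embeddings):
    """Return (similarity, name_a, name_b) tuples, most similar pair first.

    `embeddings` maps a sample name to its feature vector. In practice the
    vectors would come from an image encoder; here they are hypothetical.
    """
    pairs = [
        (cosine_similarity(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)
    return pairs


# Toy, hand-written embeddings (assumed values, not real model output).
toy_embeddings = {
    "img_a": [1.0, 0.0],
    "img_b": [0.99, 0.1],
    "img_c": [0.0, 1.0],
}
ranked = rank_near_duplicates(toy_embeddings)
```

The pairs at the top of `ranked` are the candidates a reviewer would inspect first. Comparing every pair grows quadratically with dataset size, so a sketch like this only suits small samples.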
Stars
36
Forks
2
Language
Python
License
—
Category
Last pushed
Oct 14, 2025
Commits (30d)
0
Dependencies
24
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Digital-Dermatology/SelfClean"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
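The same request can be made from Python. The sketch below uses only the standard library and mirrors the URL from the curl command above. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and because the listing does not document the response format, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the listing data and return the response body as text.

    The response format is not documented in the listing, so no parsing
    is attempted here.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("ml-frameworks", "Digital-Dermatology", "SelfClean"))
```

With the keyless limit of 100 requests/day, it is worth caching the returned text locally instead of re-fetching it on every run.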
Higher-rated alternatives
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.