seedatnabeel/Data-SUITE
Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
Data-SUITE helps data scientists and machine learning engineers understand the limitations of their training data and identify unreliable predictions. Given your existing dataset and a trained model, it flags new data points that the model might not handle reliably and maps out the regions where the training data is sparse or unusual. This tells you when to trust the model's outputs and where to focus on collecting more relevant data.
No commits in the last 6 months.
Use this if you need to determine which new data instances your trained model can predict reliably, and to understand where your existing training data might have gaps or inconsistencies.
Not ideal if you are looking for a tool to automatically improve your model's performance without needing to understand the underlying data limitations.
Stars: 9
Forks: 3
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Mar 08, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/seedatnabeel/Data-SUITE"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
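The same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `build_url` and `fetch_raw` are hypothetical, and the idea that other repositories follow the same `category/owner/repo` path pattern is an assumption drawn from the single URL shown above. The response format is not documented here, so the body is returned as raw text and not parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is
# assumed to follow a category/owner/repo pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_raw(url: str, timeout: float = 10.0) -> str:
    """Return the response body as text; no response schema is assumed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_raw(build_url("ml-frameworks", "seedatnabeel", "Data-SUITE"))` issues the same GET request as the curl command. Without a key, each call counts against the 100 requests/day limit.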
Higher-rated alternatives
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.