easeml/datascope
Measuring data importance over ML pipelines using the Shapley value.
This tool helps machine learning practitioners identify which training data points matter most for their model's accuracy and fairness. You supply your existing training data and an ML pipeline (such as a scikit-learn pipeline), and it returns an importance score for each data point. Data scientists and ML engineers can use these scores to pinpoint data quality issues, prioritize data cleaning, and make informed decisions about data acquisition.
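To make the per-point score concrete, here is a minimal sketch of Shapley-based data importance in plain Python. It is not datascope's API: the function names (`predict_1nn`, `accuracy`, `shapley_importance`), the 1-nearest-neighbor model, and the toy data are all assumptions chosen for illustration, and the brute-force enumeration of every ordering only works for a handful of points.

```python
from itertools import permutations
from math import factorial


def predict_1nn(train, x):
    """Return the label of the training point nearest to x (1-D feature)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, valid):
    """Utility: validation accuracy of 1-NN fitted on `train` (0.0 if empty)."""
    if not train:
        return 0.0
    hits = sum(predict_1nn(train, x) == y for x, y in valid)
    return hits / len(valid)


def shapley_importance(train, valid):
    """Exact Shapley value per training point.

    Averages each point's marginal accuracy gain over all orderings in
    which the training set could be built up one point at a time.
    """
    n = len(train)
    scores = [0.0] * n
    for order in permutations(range(n)):
        subset = []
        previous = accuracy(subset, valid)
        for i in order:
            subset.append(train[i])
            current = accuracy(subset, valid)
            scores[i] += current - previous
            previous = current
    return [s / factorial(n) for s in scores]


# Hypothetical toy data: (feature, label). The last point is mislabeled.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (4.5, 0)]
valid = [(0.4, 0), (4.2, 1), (4.8, 1), (0.2, 0)]

scores = shapley_importance(train, valid)
for point, score in zip(train, scores):
    print(point, round(score, 3))
```

In this toy set the mislabeled point receives the lowest score, which is the kind of signal that would guide data cleaning. The scores sum to the accuracy of the model trained on all points, a standard property of the Shapley value. Real tools rely on approximations because the number of orderings grows factorially with the dataset size.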
No commits in the last 6 months.
Use this if you need to quickly understand which specific training examples most affect your ML model's performance, so you can guide data cleaning or acquisition efforts.
Not ideal if you are working with extremely large datasets (millions of examples) and require real-time importance calculations, or if you are not using scikit-learn compatible pipelines.
Stars: 45
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Aug 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/easeml/datascope"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
shap/shap
A game theoretic approach to explain the output of any machine learning model.
mmschlk/shapiq
Shapley Interactions and Shapley Values for Machine Learning
iancovert/sage
For calculating global feature importance using Shapley values.
predict-idlab/powershap
A power-full Shapley feature selection method.
aerdem4/lofo-importance
Leave One Feature Out Importance