easeml/datascope

Measuring data importance over ML pipelines using the Shapley value.


Emerging

This tool helps machine learning practitioners identify which training data points are most crucial for their model's accuracy and fairness. You input your existing training data and ML pipeline (like a scikit-learn pipeline), and it outputs a score for each data point indicating its importance. This helps data scientists and ML engineers efficiently pinpoint data quality issues, prioritize data cleaning, and make informed decisions about data acquisition.
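To make the idea of a per-example importance score concrete, here is a minimal sketch of Monte Carlo Shapley estimation over training points. It does not use datascope's API. The 1-nearest-neighbour utility, the toy data, and the names `knn_accuracy` and `monte_carlo_shapley` are all assumptions made for illustration. Each point's score is its average marginal contribution to validation accuracy across random orderings of the training set.

```python
import random


def knn_accuracy(train, valid):
    """Validation accuracy of a 1-nearest-neighbour classifier on 1-D points.

    Returns 0.0 for an empty training set (illustrative convention).
    """
    if not train:
        return 0.0
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        if nearest[1] == y:
            correct += 1
    return correct / len(valid)


def monte_carlo_shapley(train, valid, utility, n_permutations=200, seed=0):
    """Estimate one Shapley importance score per training point."""
    rng = random.Random(seed)
    n = len(train)
    scores = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        prev = utility(subset, valid)
        for i in order:
            # Marginal contribution of point i given the points added before it.
            subset.append(train[i])
            curr = utility(subset, valid)
            scores[i] += curr - prev
            prev = curr
    return [s / n_permutations for s in scores]


# Hypothetical toy data: (feature, label). The last training point is
# deliberately mislabeled to mimic a data quality issue.
train = [(0.0, 0), (0.2, 0), (1.0, 1), (1.2, 1), (0.12, 1)]
valid = [(0.05, 0), (0.15, 0), (0.9, 1), (1.1, 1)]

scores = monte_carlo_shapley(train, valid, knn_accuracy)
for point, score in zip(train, scores):
    print(point, round(score, 3))
```

By construction the scores sum to the accuracy of the model trained on all points minus the accuracy on the empty set, which gives a quick sanity check. Retraining on every prefix of every permutation gets expensive quickly as the dataset grows, which is in line with the caveat about very large datasets.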

No commits in the last 6 months.

Use this if you need to quickly find which specific training examples most affect your ML model's performance, so you can guide data cleaning or acquisition efforts.

Not ideal if you are working with extremely large datasets (millions of examples) and need real-time importance calculations, or if your pipeline is not scikit-learn compatible.

Machine Learning, Data Quality, MLOps, Data Debugging, Model Performance

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 13 / 25


Language: Python

License: MIT

Higher-rated alternatives

shap/shap

A game theoretic approach to explain the output of any machine learning model.

mmschlk/shapiq

Shapley Interactions and Shapley Values for Machine Learning

iancovert/sage

For calculating global feature importance using Shapley values.

predict-idlab/powershap

A power-full Shapley feature selection method.

aerdem4/lofo-importance

Leave One Feature Out Importance
