easeml/datascope

Measuring data importance over ML pipelines using the Shapley value.

Score: 39 / 100 (Emerging)

This tool helps machine learning practitioners identify which training data points matter most to their model's accuracy and fairness. You supply your existing training data and ML pipeline (such as a scikit-learn pipeline), and it outputs a score for each data point indicating its importance, measured with the Shapley value. Data scientists and ML engineers can use these scores to pinpoint data quality issues, prioritize data cleaning, and make informed decisions about data acquisition.
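To make the idea of a per-point importance score concrete, here is a minimal sketch of Monte Carlo Shapley estimation over training examples. It is not datascope's API or its algorithm: the function names (`knn_accuracy`, `monte_carlo_shapley`), the toy one-dimensional dataset, and the 1-nearest-neighbour utility are all assumptions chosen so the example runs with the Python standard library only.

```python
import random


def knn_accuracy(train, test):
    """Utility: accuracy of a 1-nearest-neighbour classifier on `test`.

    Points are (feature, label) pairs. An empty training set scores 0.0.
    """
    if not train:
        return 0.0
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += nearest[1] == y
    return correct / len(test)


def monte_carlo_shapley(train, test, utility, n_permutations=2000, seed=0):
    """Estimate each training point's Shapley value by permutation sampling.

    For every random ordering of the training set, a point's marginal
    contribution is the change in utility when it joins the points that
    precede it. Averaging over orderings approximates the Shapley value.
    """
    rng = random.Random(seed)
    n = len(train)
    scores = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        previous = utility(subset, test)
        for i in order:
            subset.append(train[i])
            current = utility(subset, test)
            scores[i] += current - previous
            previous = current
    return [s / n_permutations for s in scores]


# Hypothetical toy data: the third training point (0.2, 1) is mislabeled.
train = [(0.0, 0), (0.1, 0), (0.2, 1), (1.0, 1), (1.1, 1)]
test = [(0.02, 0), (0.12, 0), (0.22, 0), (0.95, 1), (1.15, 1)]

scores = monte_carlo_shapley(train, test, knn_accuracy)
# Print points from least to most important; low scores flag suspect data.
for index in sorted(range(len(train)), key=lambda i: scores[i]):
    print(index, train[index], round(scores[index], 3))
```

In this toy setup the mislabeled point receives the lowest score, which is the kind of signal that would guide data cleaning. The scores also sum to the accuracy of the model trained on all points, a property of the Shapley value. Exhaustive or sampled retraining like this scales poorly, which is consistent with the caveat about very large datasets.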

No commits in the last 6 months.

Use this if you need to quickly understand which specific training examples are most impacting your ML model's performance to guide data cleaning or acquisition efforts.

Not ideal if you are working with extremely large datasets (millions of examples) and require real-time importance calculations, or if you are not using scikit-learn compatible pipelines.

Tags: Machine Learning, Data Quality, MLOps, Data Debugging, Model Performance
Status: Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 13 / 25


Stars: 45
Forks: 6
Language: Python
License: MIT
Last pushed: Aug 26, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/easeml/datascope"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
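The same request can be made from Python. The sketch below uses only the standard library and makes two assumptions the page does not confirm: that the endpoint returns JSON, and that other projects follow the same `category/owner/repo` path layout as the URL above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality endpoint URL (path layout assumed from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record, assuming the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# record = fetch_quality("ml-frameworks", "easeml", "datascope")
# print(record)
```

Anonymous access is limited to 100 requests/day, so a script that polls many projects would need to cache results or use a key.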