SJTU-DMTai/awesome-ml-data-quality-papers

Papers about training data quality management for ML models.

Score: 28 / 100 (Experimental)

This resource is a curated list of research papers on improving the quality of training data for machine learning models. It helps data scientists understand and apply strategies for refining their datasets, leading to more robust and reliable AI systems. The papers cover identifying problematic data, assessing its impact, and selecting and debugging data to improve model performance, fairness, and robustness.

112 stars. No commits in the last 6 months.

Use this if you are a data scientist who regularly builds and deploys machine learning models and often traces poor performance or unexpected behavior back to the quality of your training data.

Not ideal if you are looking for ready-to-use software tools or libraries for immediate implementation rather than academic research and theoretical foundations.

Machine Learning Engineering · Data Science · ML Model Debugging · Data Quality Management · AI System Development
No License · Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 9 / 25
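
The four category scores above add up to the overall 28 out of 100. The short sketch below only checks that arithmetic. Treating the total as a plain sum of four categories capped at 25 each is an assumption read off the numbers shown here, not a documented formula.

```python
# Category scores as listed on this page; each is out of 25.
category_scores = {
    "Maintenance": 2,
    "Adoption": 9,
    "Maturity": 8,
    "Community": 9,
}
MAX_PER_CATEGORY = 25  # assumption: every category shares the same cap

# Assumption: the overall score is the plain sum of the category scores.
total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)
summary = f"{total} / {maximum}"
print(summary)
```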


Stars: 112
Forks: 7
Language: (not listed)
License: none
Last pushed: Oct 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/SJTU-DMTai/awesome-ml-data-quality-papers"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
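
The same request can be made from Python using only the standard library, as a minimal sketch. The helper names `build_url` and `fetch_quality` are hypothetical, and the response format is not documented on this page, so the code tries to parse JSON and falls back to the raw text. How an API key is supplied is also not stated here, so the sketch uses the keyless tier only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request; return parsed JSON if possible, else the raw text.

    Assumption: the endpoint answers a plain GET with no headers or key,
    as the keyless curl example suggests.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("ml-frameworks", "SJTU-DMTai", "awesome-ml-data-quality-papers")` issues the same request as the curl command. Given the 100 requests/day limit without a key, cache the result rather than polling.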