Data-Centric-AI-Community/ydata-quality
Data Quality assessment with one line of code
This tool helps data professionals quickly check the quality of their datasets before using them for analysis or machine learning. You provide your raw or transformed dataset, and it automatically flags issues like duplicate entries, highly correlated features, missing values, or erroneous data. It's designed for data scientists, machine learning engineers, and data analysts who need to ensure data reliability.
Use this if you need a quick, comprehensive overview of potential quality problems in your tabular datasets that could impact downstream analysis or model performance.
Not ideal if you need to build custom, complex data validation rules that go beyond standard quality checks, or if you're not comfortable working with Python.
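To make the checks described above concrete, here is a minimal pandas sketch of three of them: duplicate rows, missing values, and highly correlated numeric features. It is an illustration only and does not use ydata-quality or reproduce its API; the helper name `quick_quality_report`, the 0.9 correlation threshold, and the sample data are all assumptions made for this example.

```python
import pandas as pd


def quick_quality_report(df: pd.DataFrame, corr_threshold: float = 0.9) -> dict:
    """Hypothetical helper (not part of ydata-quality) that flags basic issues."""
    # Count rows that exactly repeat an earlier row.
    duplicate_rows = int(df.duplicated().sum())

    # Missing values per column, keeping only columns that have at least one.
    missing = df.isna().sum()
    missing_by_column = {col: int(n) for col, n in missing.items() if n > 0}

    # Pairs of numeric columns whose absolute Pearson correlation
    # meets or exceeds the (assumed) threshold.
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    correlated_pairs = [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] >= corr_threshold
    ]

    return {
        "duplicate_rows": duplicate_rows,
        "missing_by_column": missing_by_column,
        "correlated_pairs": correlated_pairs,
    }


# Made-up sample data: one repeated row, one missing city,
# and two columns that encode the same measurement in different units.
df = pd.DataFrame({
    "height_cm": [170.0, 180.0, 160.0, 170.0, 175.0],
    "height_in": [66.9, 70.9, 63.0, 66.9, 68.9],
    "city": ["Lisbon", "Porto", None, "Lisbon", "Braga"],
})
report = quick_quality_report(df)
print(report)
```

A dedicated tool covers more ground than this (the description also mentions erroneous data), but the sketch shows the kind of signal each check produces on a tabular dataset.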
Stars: 454
Forks: 56
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Mar 02, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Data-Centric-AI-Community/ydata-quality"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
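The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, the idea that other repositories follow the same owner/repo URL pattern is inferred from the single curl example above, and the body is returned as plain text because the response format is not stated here. No API key is sent, since how a key would be supplied is not described.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (assumed owner/repo pattern)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Send a keyless GET request and return the raw response body as text."""
    with urlopen(quality_url(owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


url = quality_url("Data-Centric-AI-Community", "ydata-quality")
print(url)
# Calling fetch_quality("Data-Centric-AI-Community", "ydata-quality")
# performs the network request and counts against the daily limit.
```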
Related frameworks
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.