Data-Centric-AI-Community/ydata-quality

Data Quality assessment with one line of code

Established

This tool helps data professionals quickly check the quality of their datasets before using them for analysis or machine learning. You provide your raw or transformed dataset, and it automatically flags issues like duplicate entries, highly correlated features, missing values, or erroneous data. It's designed for data scientists, machine learning engineers, and data analysts who need to ensure data reliability.
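To make the kinds of checks listed above concrete, here is a minimal standard-library sketch of three of them: exact duplicate rows, missing values per column, and highly correlated numeric columns. It is not ydata-quality's API or implementation. The function names, the sample rows, and the 0.9 correlation threshold are all assumptions chosen for illustration.

```python
from math import sqrt


def find_duplicate_rows(rows):
    """Return indices of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # order-independent row fingerprint
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def missing_counts(rows, columns):
    """Count None values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(rows, columns, threshold=0.9):
    """Return column pairs whose absolute correlation meets the threshold.

    The threshold is an illustrative assumption, not a library default.
    Rows with a missing value in either column are skipped for that pair.
    """
    pairs = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            complete = [
                (r[a], r[b])
                for r in rows
                if r.get(a) is not None and r.get(b) is not None
            ]
            if len(complete) < 2:
                continue
            xs, ys = zip(*complete)
            if abs(pearson(xs, ys)) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical sample data: one duplicated row, one missing value,
# and two columns that measure the same thing in different units.
rows = [
    {"height_cm": 170.0, "height_in": 66.9, "age": 31},
    {"height_cm": 182.0, "height_in": 71.7, "age": 25},
    {"height_cm": 170.0, "height_in": 66.9, "age": 31},    # repeats row 0
    {"height_cm": 158.0, "height_in": 62.2, "age": None},  # missing age
    {"height_cm": 175.0, "height_in": 68.9, "age": 47},
]
columns = ["height_cm", "height_in", "age"]

report = {
    "duplicate_rows": find_duplicate_rows(rows),
    "missing": missing_counts(rows, columns),
    "correlated": correlated_pairs(rows, columns),
}
print(report)
```

On this sample the sketch flags the repeated row, the missing `age` value, and the `height_cm` / `height_in` pair. A real tool would run on a dataframe and cover many more cases, so treat this only as an outline of the idea.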

454 stars.

Use this if you need a quick, comprehensive overview of potential quality problems in your tabular datasets that could impact downstream analysis or model performance.

Not ideal if you need to build custom, complex data validation rules that go beyond standard quality checks, or if you're not comfortable working with Python.

data-quality-assurance data-preparation machine-learning-engineering data-analysis data-governance

No Package No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25

Stars 454

Language Jupyter Notebook

License MIT

Related frameworks

skrub-data/skrub

Machine learning with dataframes

biolab/orange3

🍊 📊 💡 Orange: Interactive data analysis

root-project/root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

cleanlab/cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and...

drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.
