AutoViML/pandas_dq

Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.

/ 100

Established

This tool helps data analysts and scientists quickly identify and fix common issues in their datasets, ensuring data is clean and reliable for analysis or machine learning. You input a raw dataset, and it provides detailed reports on data quality problems like missing values, outliers, and duplicates, then outputs a cleaned, ready-to-use dataset. It's ideal for anyone who regularly works with tabular data and needs to prepare it for further steps.
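The report-then-clean workflow described above can be sketched in plain pandas. This is only an illustration of the kinds of checks involved (missing values, duplicate rows, outliers). It does not use pandas_dq's own API, which this page does not show. The function names `quality_report` and `basic_clean`, the 1.5 × IQR outlier rule, and the median fill are all assumptions chosen for the example.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarize missing values, duplicate rows, and IQR outliers (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return {
        "missing": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "outliers": outlier_mask.sum().to_dict(),
    }


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill numeric gaps with the column median."""
    cleaned = df.drop_duplicates().reset_index(drop=True)
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
    return cleaned


# Small made-up dataset with one duplicate row, two missing cells, and one extreme value.
raw = pd.DataFrame(
    {
        "age": [25, 27, 27, 26, None, 24, 250],
        "city": ["Oslo", "Rome", "Rome", "Lima", "Pune", None, "Kyiv"],
    }
)
report = quality_report(raw)
cleaned = basic_clean(raw)
print(report)
print(cleaned)
```

In this sketch the extreme `age` value is reported but left in place, and the missing `city` value is not filled. A real cleaning tool would typically let you choose how to handle both.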

135 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly assess the quality of your tabular data and automatically clean it to improve your data analysis or machine learning model performance.

Not ideal if you work with data formats beyond standard tables or need manual, fine-grained control over every data cleaning step.

data-preparation data-cleaning data-quality-assessment exploratory-data-analysis machine-learning-engineering

Stale 6m

Maintenance 0 / 25

Adoption 11 / 25

Maturity 25 / 25

Community 14 / 25

Stars: 135

Language: Python

License: Apache-2.0

Related frameworks

skrub-data/skrub

Machine learning with dataframes

biolab/orange3

🍊 📊 💡 Orange: Interactive data analysis

root-project/root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

cleanlab/cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and...

drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.
