AutoViML/pandas_dq

Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.

50 / 100 (Established)

This tool helps data analysts and scientists quickly identify and fix common issues in their datasets before analysis or machine learning. You input a raw dataset, and it reports data quality problems such as missing values, outliers, and duplicates, then outputs a cleaned dataset. It suits anyone who regularly works with tabular data and needs to prepare it for downstream analysis or modeling.
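To make the kinds of checks described above concrete, here is a minimal sketch in plain pandas. It does not use or reproduce the pandas_dq API; the function names `quality_report` and `basic_clean`, the 1.5 × IQR outlier rule, and the median fill are all illustrative assumptions, not the package's actual behavior.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, iqr_factor: float = 1.5) -> dict:
    """Hypothetical helper: count missing values, duplicate rows, and IQR outliers."""
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    # A value is flagged when it lies outside [q1 - k*IQR, q3 + k*IQR].
    outside = (numeric < q1 - iqr_factor * iqr) | (numeric > q3 + iqr_factor * iqr)
    return {
        "missing": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "outliers": outside.sum().to_dict(),
    }


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop exact duplicate rows, fill numeric gaps with the median."""
    cleaned = df.drop_duplicates().copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
    return cleaned.reset_index(drop=True)


# Small made-up dataset with one duplicate row, two missing cells, and one extreme age.
raw = pd.DataFrame({
    "age": [25, 30, 30, None, 28, 400],
    "city": ["Oslo", "Rome", "Rome", "Lima", None, "Oslo"],
})
report = quality_report(raw)
cleaned = basic_clean(raw)
```

A dedicated tool covers many more cases than this sketch, but the report-then-clean shape is the same: inspect the report first, then decide which fixes to apply.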

135 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly assess the quality of your tabular data and automatically clean it to improve your data analysis or machine learning model performance.

Not ideal if you need to work with highly specialized data formats beyond standard tables or require deeply manual, fine-grained control over every data cleaning step.

data-preparation data-cleaning data-quality-assessment exploratory-data-analysis machine-learning-engineering
Stale 6m
Maintenance 0 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 14 / 25


Stars: 135
Forks: 15
Language: Python
License: Apache-2.0
Last pushed: Dec 13, 2023
Commits (30d): 0
Dependencies: 3
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AutoViML/pandas_dq"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
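The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example; the split into category and repo is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository, e.g. 'ml-frameworks', 'AutoViML/pandas_dq'."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data; assumes the endpoint returns a JSON body."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "AutoViML/pandas_dq")
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling the endpoint on every run.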