AutoViML/pandas_dq
Find data quality issues and clean your data in a single line of code with a scikit-learn-compatible transformer.
This tool helps data analysts and data scientists quickly find and fix common issues in their datasets, so the data is clean and reliable for analysis or machine learning. You pass in a raw dataset; it reports data quality problems such as missing values, outliers, and duplicate rows, and then outputs a cleaned dataset ready for use. It suits anyone who regularly works with tabular data and needs to prepare it for modeling or further analysis.
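To make "scikit-learn-compatible transformer" concrete, here is a minimal sketch of the fit/transform pattern the description refers to. It is a hypothetical class (`SimpleDQCleaner`), not pandas_dq's actual API, and it assumes only that pandas is installed: `fit` learns per-column medians and IQR outlier bounds from the training data, `report` counts missing values, duplicate rows, and out-of-bounds values, and `transform` drops exact duplicates, fills missing numeric values, and clips outliers.

```python
import pandas as pd


class SimpleDQCleaner:
    """Hypothetical illustration of a fit/transform data cleaner.

    This is NOT the pandas_dq API; it only mimics the transformer pattern.
    """

    def __init__(self, iqr_factor=1.5):
        # Multiplier applied to the interquartile range when flagging outliers
        self.iqr_factor = iqr_factor

    def fit(self, X, y=None):
        # Learn statistics from the numeric columns of the training data only
        numeric = X.select_dtypes(include="number")
        self.medians_ = numeric.median()
        q1 = numeric.quantile(0.25)
        q3 = numeric.quantile(0.75)
        iqr = q3 - q1
        self.lower_ = q1 - self.iqr_factor * iqr
        self.upper_ = q3 + self.iqr_factor * iqr
        return self

    def report(self, X):
        # Summarize issues without modifying X
        numeric = X[self.medians_.index]
        outliers = (numeric < self.lower_) | (numeric > self.upper_)
        return {
            "missing": {c: int(n) for c, n in X.isna().sum().items()},
            "duplicate_rows": int(X.duplicated().sum()),
            "outliers": {c: int(n) for c, n in outliers.sum().items()},
        }

    def transform(self, X):
        # Drop exact duplicate rows, then fill and clip numeric columns
        out = X.drop_duplicates().copy()
        cols = list(self.medians_.index)
        out[cols] = out[cols].fillna(self.medians_)
        out[cols] = out[cols].clip(lower=self.lower_, upper=self.upper_, axis=1)
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Usage on a tiny made-up dataset: one missing value, one outlier, one duplicate row
raw = pd.DataFrame({
    "a": [1.0, 2.0, None, 3.0, 100.0, 2.0],
    "b": ["x", "y", "z", "w", "v", "y"],
})
cleaner = SimpleDQCleaner().fit(raw)
print(cleaner.report(raw))
clean = cleaner.transform(raw)
print(clean)
```

Learning the medians and bounds in `fit` and only applying them in `transform` keeps statistics from unseen data out of the cleaning step. Two caveats for this sketch: dropping rows inside `transform` changes the row count, so in a supervised setting the target would need the same rows removed, and full scikit-learn compatibility (for example `clone` and grid search) would call for subclassing `BaseEstimator` and `TransformerMixin`. The checks and fixes pandas_dq actually performs are described in its own documentation.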
135 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly assess the quality of your tabular data and automatically clean it to improve your data analysis or machine learning model performance.
Not ideal if you need to work with specialized data formats beyond standard tables, or if you need fine-grained manual control over every data cleaning step.
Stars: 135
Forks: 15
Language: Python
License: Apache-2.0
Last pushed: Dec 13, 2023
Commits (30d): 0
Dependencies: 3
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AutoViML/pandas_dq"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Related frameworks
skrub-data/skrub: Machine learning with dataframes
biolab/orange3: 🍊 📊 💡 Orange: Interactive data analysis
root-project/root: The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab: Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon: A command-line tool to easily add an ethics checklist to your data science projects.