ELHoussineT/AutoDataCleaner
Simple, automatic data cleaning in one line of code. It performs one-hot encoding, casts date and time columns to the datetime dtype, detects binary columns, safely converts non-numeric columns to numeric dtypes, cleans dirty or empty values, normalizes values, and removes unwanted columns. Get your data ready for model training and fitting quickly.
This project helps data analysts and scientists quickly prepare a raw dataset for machine learning model training. It takes messy tabular data (such as a pandas DataFrame) with mixed text, numbers, and dates, and automatically transforms it into a clean, structured format by handling missing values, encoding categories, and standardizing numerical scales. The output is a ready-to-use dataset, which cuts down the manual effort of data preprocessing.
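The steps described above can be sketched directly in pandas. This is a minimal illustration only, not AutoDataCleaner's implementation or API: the function name `clean_sketch` and its parameters are hypothetical, and the specific choices (mean imputation, z-score scaling, treating any two-valued text column as binary) are assumptions made for the example.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def clean_sketch(df, datetime_columns=(), drop_columns=()):
    """Illustrative cleaning pass (hypothetical; not AutoDataCleaner's code)."""
    out = df.drop(columns=list(drop_columns)).copy()

    # Cast the named columns to datetime; unparseable values become NaT.
    for col in datetime_columns:
        out[col] = pd.to_datetime(out[col], errors="coerce")

    # Convert a text column to numeric only if every non-missing value parses.
    for col in out.columns:
        if is_numeric_dtype(out[col]) or is_datetime64_any_dtype(out[col]):
            continue
        converted = pd.to_numeric(out[col], errors="coerce")
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted

    numeric_cols = [c for c in out.columns if is_numeric_dtype(out[c])]
    text_cols = [
        c for c in out.columns
        if c not in numeric_cols and not is_datetime64_any_dtype(out[c])
    ]

    # Fill missing numbers with the column mean, then apply z-score scaling.
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].mean())
        std = out[col].std()
        if std > 0:
            out[col] = (out[col] - out[col].mean()) / std

    # Two-valued text columns become 0/1; the rest are one-hot encoded.
    one_hot_cols = []
    for col in text_cols:
        values = sorted(out[col].dropna().unique(), key=str)
        if len(values) == 2:
            out[col] = out[col].map({values[0]: 0, values[1]: 1})
        else:
            one_hot_cols.append(col)
    return pd.get_dummies(out, columns=one_hot_cols, dtype=int)


# Example input with a numeric-as-text column, a binary column,
# a categorical column, and a date column containing one bad value.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": ["10", "20", "30"],
    "member": ["yes", "no", "yes"],
    "city": ["Oslo", "Rome", "Lima"],
    "joined": ["2021-01-05", "2021-02-10", "not a date"],
})
clean = clean_sketch(df, datetime_columns=["joined"], drop_columns=["id"])
```

Here `price` is converted and scaled, `member` is mapped to 0/1, `city` is expanded into one indicator column per value, and the unparseable date becomes a missing value. A real cleaning library would typically expose each of these steps as an option rather than applying them unconditionally.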
No commits in the last 6 months.
Use this if you need to rapidly clean and preprocess a dataset for machine learning without writing extensive data manipulation code.
Not ideal if you require highly custom, granular control over each data cleaning step or need to apply specific, non-standard preprocessing techniques.
Stars: 20
Forks: 4
Language: Python
License: none listed
Category: none listed
Last pushed: May 22, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/ELHoussineT/AutoDataCleaner"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
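The same request can be made from Python using only the standard library. This is a rough sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the `{category}/{owner}/{repo}` path layout is inferred from the single curl example above, and the response is returned as raw text because its format is not documented here.

```python
import urllib.request

# Base path taken from the curl example; the layout after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL following the pattern of the curl example."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text; no response format is assumed."""
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


url = quality_url("ml-frameworks", "ELHoussineT/AutoDataCleaner")
```

Calling `fetch_quality("ml-frameworks", "ELHoussineT/AutoDataCleaner")` performs the actual request, which counts against the daily request limit.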
Higher-rated alternatives
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.