rhiever/datacleaner
A Python tool that automatically cleans data sets and readies them for analysis.
This tool helps data analysts and scientists quickly prepare raw tabular datasets for further analysis. It takes your CSV or similar file, identifies common issues like missing values and text-based categories, and outputs a cleaned version where these issues are addressed, making it ready for statistical models or machine learning. It's designed for anyone who regularly works with structured data and needs to streamline their data preparation.
1,078 stars. No commits in the last 6 months. Available on PyPI.
Use this if you routinely deal with datasets containing missing values or non-numerical categorical features that need to be transformed for analysis.
Not ideal if your data is unstructured text, images, or requires complex domain-specific parsing before it can be represented in a table.
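The kind of cleaning described above can be sketched with the standard library alone. This is a minimal illustration, not datacleaner's own implementation. It assumes that numeric gaps are filled with the column median, that text gaps are filled with the most frequent label, and that text labels are then mapped to integer codes. The function `clean_table`, the `MISSING` marker set, and the sample data are all hypothetical.

```python
import csv
import io
from statistics import median, mode

# Assumed markers for a missing cell; real data may use others.
MISSING = {None, "", "NA", "NaN", "nan"}


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def clean_table(rows):
    """Return a cleaned copy of rows (a list of dicts holding strings)."""
    if not rows:
        return []
    cleaned = [dict(row) for row in rows]
    for column in rows[0]:
        present = [row[column] for row in rows if row[column] not in MISSING]
        if not present:
            continue  # no observed values to infer a fill from; leave as-is
        if all(_is_number(value) for value in present):
            # Numeric column: fill gaps with the median of observed values.
            fill = median(float(value) for value in present)
            for row in cleaned:
                value = row[column]
                row[column] = fill if value in MISSING else float(value)
        else:
            # Text column: fill gaps with the most frequent label,
            # then replace each label with an integer code.
            fill = mode(present)
            codes = {label: i for i, label in enumerate(sorted(set(present)))}
            for row in cleaned:
                value = row[column]
                row[column] = codes[fill if value in MISSING else value]
    return cleaned


# Hypothetical sample: one missing age and one missing city.
raw_csv = """age,city
34,Oslo
,Lima
29,
41,Oslo
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))
cleaned = clean_table(rows)
```

Here the missing age becomes the median of the observed ages, the missing city becomes the most common city, and every city label ends up as an integer, so the table is fully numeric afterwards.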
Stars: 1,078
Forks: 206
Language: Python
License: MIT
Category:
Last pushed: May 22, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/rhiever/datacleaner"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
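The same request can be made from Python. The sketch below rests on assumptions the page does not confirm: that the path segments after `quality/` are a category, a repository owner, and a repository name, and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the segment meanings are assumed."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming it is a JSON document."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Reproduces the URL used in the curl command above.
url = quality_url("ml-frameworks", "rhiever", "datacleaner")
```

Calling `fetch_quality("ml-frameworks", "rhiever", "datacleaner")` performs the network request. Building the URL separately keeps that step easy to check without going online.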
Related frameworks
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.