HoloClean/holoclean
A Machine Learning System for Data Enrichment.
HoloClean is a system designed for data practitioners and scientists who need to ensure their datasets are accurate and complete. It takes in raw, potentially messy data along with any existing quality rules or reference information. HoloClean then identifies and corrects errors, fills in missing values, and enriches the data, providing a clean, complete dataset ready for reliable analysis or machine learning tasks.
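The summary above describes a workflow: read raw data plus quality rules, flag cells that break those rules, then repair flagged or missing values. HoloClean's own pipeline learns a probabilistic model and is not reproduced here. The sketch below is a much simpler stand-in, shown only to make the idea of rule-driven detection and repair concrete. It uses one functional dependency (zip determines city) and a majority-vote fix. The toy data, the helper name `repair_fd`, and the majority-vote rule are all hypothetical and are not part of HoloClean's API.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, for illustration only (not HoloClean input format).
rows = [
    {"zip": "60612", "city": "Chicago"},
    {"zip": "60612", "city": "Chicago"},
    {"zip": "60612", "city": "Chicgo"},  # typo that violates zip -> city
    {"zip": "60612", "city": None},      # missing value
    {"zip": "53703", "city": "Madison"},
]


def repair_fd(rows, lhs, rhs):
    """Hypothetical helper: enforce the functional dependency lhs -> rhs.

    Within each group of rows sharing an lhs value, any rhs value that differs
    from the group's most common non-missing value (including None) is flagged
    and replaced by that majority value. Returns (repaired_rows, flagged_indices)
    and leaves the input rows unmodified.
    """
    repaired = [dict(r) for r in rows]  # copy so the input is not mutated
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        groups[r[lhs]].append(i)

    flagged = []
    for idxs in groups.values():
        observed = [rows[i][rhs] for i in idxs if rows[i][rhs] is not None]
        if not observed:
            continue  # no evidence in this group to repair from
        majority = Counter(observed).most_common(1)[0][0]
        for i in idxs:
            if rows[i][rhs] != majority:
                flagged.append(i)
                repaired[i][rhs] = majority
    return repaired, flagged


repaired, flagged = repair_fd(rows, "zip", "city")
print(flagged)  # the typo row and the missing-value row
```

Majority vote is the crudest possible repair signal. A system like HoloClean combines many signals (rules, co-occurrence statistics, external references) rather than trusting a single count, which is why it can outperform this kind of hand-written fix.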
533 stars. No commits in the last 6 months.
Use this if you spend significant time manually cleaning and preparing large datasets and want to automate this process using a machine learning-driven approach.
Not ideal if you need a simple, rule-based data cleaning tool for small datasets, or if you are not comfortable setting up PostgreSQL and a Python environment.
Stars: 533
Forks: 131
Language: Python
License: Apache-2.0
Category:
Last pushed: Jul 20, 2023
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/HoloClean/holoclean"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related frameworks
treeverse/dvc
🦉 Data Versioning and ML Experiments
runpod/runpod-python
🐍 | Python library for RunPod API and serverless worker SDK.
microsoft/vscode-jupyter
VS Code Jupyter extension
4paradigm/OpenMLDB
OpenMLDB is an open-source machine learning database that provides a feature platform computing...
uber/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning...