HoloClean/holoclean

A Machine Learning System for Data Enrichment.

Score: 51 / 100 (Established)

HoloClean is a system designed for data practitioners and scientists who need to ensure their datasets are accurate and complete. It takes in raw, potentially messy data along with any existing quality rules or reference information. HoloClean then identifies and corrects errors, fills in missing values, and enriches the data, providing a clean, complete dataset ready for reliable analysis or machine learning tasks.

533 stars. No commits in the last 6 months.

Use this if you spend significant time manually cleaning and preparing large datasets and want to automate this process using a machine learning-driven approach.

Not ideal if you need a simple, rule-based data cleaning tool for small datasets or if you are not comfortable with PostgreSQL and Python environments.

data-quality data-preparation data-enrichment data-curation data-science-workflow
Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25

Stars: 533
Forks: 131
Language: Python
License: Apache-2.0
Last pushed: Jul 20, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/HoloClean/holoclean"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
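
The same request can be made from a script. The following is a minimal Python sketch, not an official client. It assumes that the URL follows a category/owner/repo pattern (inferred from the single curl example above) and that the response body is JSON. Neither is confirmed by this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow a <category>/<owner>/<repo> pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed format).

    Raises urllib.error.HTTPError on a non-2xx status, for example if
    the daily request limit has been reached.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "HoloClean", "holoclean")
# print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes the pattern easy to check offline and keeps unnecessary requests from counting against the 100 requests/day keyless limit.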