datacleaner/DataCleaner
The premier open source Data Quality solution
This tool helps businesses, analysts, and data professionals ensure their data is accurate and reliable. You input raw, messy datasets, and it helps you identify inconsistencies, correct errors, and enrich information to produce clean, high-quality data. It's used by anyone who needs to trust their data for reporting, analysis, or operational processes.
647 stars. Actively maintained with 5 commits in the last 30 days.
Use this if you need a versatile solution for ad-hoc data analysis, recurring data cleansing tasks, or managing master data effectively.
Not ideal if you require active, ongoing feature development or a project with a very large, rapidly growing community.
Stars: 647
Forks: 183
Language: Java
License: LGPL-3.0
Category:
Last pushed: Mar 14, 2026
Commits (30d): 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/datacleaner/DataCleaner"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
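The same request can be made from a script. The following is a minimal sketch, not an official client: the function names `build_quality_url` and `fetch_quality` are hypothetical, the parameter names (`category`, `owner`, `repo`) are only a guess at what the three path segments in the curl URL mean, and the response is assumed, without confirmation from this page, to be JSON. How an API key would be passed is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL using the path layout of the curl example.

    The meaning of each segment is an assumption inferred from that URL.
    """
    # Percent-encode each segment so stray slashes or spaces cannot alter the path.
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/{path}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record and decode it, assuming (unverified) a JSON body."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the network request.
    print(build_quality_url("data-engineering", "datacleaner", "DataCleaner"))
```

With the keyless limit of 100 requests/day, a script that polls this endpoint would likely want to cache results locally rather than refetch on every run.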
Related tools
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
koopjs/koop
Transform, query, and download geospatial data on the web.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.