pyjanitor-devs/pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
This tool helps data analysts and scientists transform messy raw datasets into clean, usable formats for analysis or modeling. It takes common tabular data, like spreadsheets or database exports, and processes it by renaming columns, handling missing values, or restructuring information to produce a tidied dataset. Anyone working with data that requires preparation before it can be used effectively will find this project beneficial.
1,484 stars. Actively maintained with 10 commits in the last 30 days. Available on PyPI.
Use this if you regularly spend significant time manually cleaning and preparing data using pandas, and want a more efficient, readable, and consistent way to perform common data cleaning tasks.
Not ideal if you primarily work with data that is already perfectly structured and clean, or if you prefer to build all your data manipulation logic from scratch without relying on extended libraries.
Stars: 1,484
Forks: 182
Language: Python
License: MIT
Category:
Last pushed: Mar 15, 2026
Commits (30d): 10
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/pyjanitor-devs/pyjanitor"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Related tools
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
koopjs/koop
Transform, query, and download geospatial data on the web.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.