skrub-data/skrub
Machine learning with dataframes
skrub helps data professionals prepare messy, real-world dataframes for machine learning. It takes raw tabular data, often containing inconsistent text or categories, and transforms it into a clean, structured format ready for analysis. It is aimed at data scientists, machine learning engineers, and data analysts who want to streamline their data preparation.
1,578 stars. Used by 1 other package. Actively maintained with 40 commits in the last 30 days. Available on PyPI.
Use this if you regularly work with dataframes that have 'dirty' categorical features, typos, or inconsistent text entries and need to quickly clean and encode them for machine learning models.
Not ideal if your primary task involves image, audio, or unstructured text data, as this tool is specifically designed for tabular dataframes.
Stars: 1,578
Forks: 209
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Mar 10, 2026
Commits (30d): 40
Dependencies: 8
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/skrub-data/skrub"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
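The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions not stated on this page: that the URL path follows a category/owner/repository pattern (inferred from the single example above) and that the endpoint returns JSON. The names `quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration. Because the page does not say how an API key is supplied, the sketch covers keyless access only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from
    the one example URL shown on this page.
    """
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository and decode it.

    Assumption: the response body is JSON. The schema is not documented
    here, so the decoded object is returned unchanged.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the URL needs no network access; call fetch_quality(...)
    # with the same arguments to perform the actual request.
    print(quality_url("ml-frameworks", "skrub-data", "skrub"))
```

With the keyless limit of 100 requests/day, it is worth caching the decoded result locally rather than calling `fetch_quality` repeatedly for the same repository.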
Related frameworks
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.
deepnote/deepnote
Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and...