skrub-data/skrub

Machine learning with dataframes

Score: 78 / 100 (Verified)

This tool helps data professionals prepare messy, real-world dataframes for machine learning tasks. It takes your raw tabular data, often containing inconsistent text or categories, and transforms it into a clean, structured format ready for analysis. Data scientists, machine learning engineers, and data analysts will find this useful for streamlining their data preparation.

1,578 stars. Used by 1 other package. Actively maintained with 40 commits in the last 30 days. Available on PyPI.

Use this if you regularly work with dataframes that have 'dirty' categorical features, typos, or inconsistent text entries and need to quickly clean and encode them for machine learning models.

Not ideal if your primary task involves image, audio, or unstructured text data, as this tool is specifically designed for tabular dataframes.

Tags: data-preparation, feature-engineering, data-cleaning, machine-learning-workflows, tabular-data

Maintenance 20 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 22 / 25
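
The four category scores listed above add up to the overall 78 / 100. The page does not state the formula, so the following is only a quick arithmetic check under the assumption that the overall score is a plain sum of the four categories:

```python
# Category scores as listed on this page (each out of 25).
category_scores = {
    "Maintenance": 20,
    "Adoption": 11,
    "Maturity": 25,
    "Community": 22,
}

# Assumption: the overall score is the unweighted sum of the categories.
overall = sum(category_scores.values())
max_overall = 25 * len(category_scores)

print(f"{overall} / {max_overall}")
```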

Stars: 1,578
Forks: 209
Language: Python
License: BSD-3-Clause
Last pushed: Mar 10, 2026
Commits (30d): 40
Dependencies: 8
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/skrub-data/skrub"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
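
The same request as the curl command can be sketched in Python with only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical. The category/owner/repo URL pattern is inferred from the single example URL above, and the response is assumed to be JSON, since the page does not document the schema.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL with the same shape as the curl example.

    Assumption: other projects follow the same category/owner/repo pattern.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one project (hypothetical helper).

    Assumption: the endpoint returns a JSON body; its fields are not
    documented on this page, so the parsed result is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "skrub-data", "skrub")` would issue the same request as the curl command and count against the daily request limit mentioned above.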