elisim/hydra-sklearn-pipelines

Code accompanying the blog post "Creating Configurable Data Pre-Processing Pipelines by Combining Hydra and Sklearn" by Eli Simhayev & Benjamin Bodner.


Experimental

This project helps machine learning engineers and data scientists quickly configure and run different data preprocessing workflows. It takes raw or structured datasets and applies a sequence of cleaning, transformation, and feature engineering steps, producing a ready-to-model dataset. This is ideal for practitioners who need to experiment with various data preparation strategies before training a machine learning model.
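The core idea is to describe each preprocessing step in configuration and then instantiate it into a scikit-learn `Pipeline`. It can be sketched as follows. This is a minimal, self-contained sketch, not the repository's code. The `instantiate` helper is a hypothetical stand-in for Hydra's `hydra.utils.instantiate` that reproduces only its `_target_` convention. The config is a Python dict rather than a YAML file, and `build_pipeline` and the step names are likewise illustrative.

```python
import importlib
from typing import Any

import numpy as np
from sklearn.pipeline import Pipeline


def _locate(dotted_path: str) -> Any:
    """Import 'package.module.Name' and return the Name attribute."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


def instantiate(node: Any) -> Any:
    """Hypothetical stand-in for hydra.utils.instantiate: recursively build
    objects from dicts that carry a '_target_' key; other values pass through."""
    if isinstance(node, dict):
        kwargs = {k: instantiate(v) for k, v in node.items() if k != "_target_"}
        if "_target_" in node:
            return _locate(node["_target_"])(**kwargs)
        return kwargs
    if isinstance(node, list):
        return [instantiate(v) for v in node]
    return node


def build_pipeline(cfg: dict) -> Pipeline:
    """Turn an ordered mapping of step name -> step config into a Pipeline."""
    steps = [(name, instantiate(step_cfg)) for name, step_cfg in cfg["steps"].items()]
    return Pipeline(steps)


# Hypothetical experiment config; in a Hydra project this would live in YAML.
cfg = {
    "steps": {
        "impute": {"_target_": "sklearn.impute.SimpleImputer", "strategy": "median"},
        "scale": {"_target_": "sklearn.preprocessing.StandardScaler"},
    }
}

pipe = build_pipeline(cfg)
X = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 6.0]])
X_ready = pipe.fit_transform(X)  # NaN replaced by the column median, then standardized
```

With this setup, swapping `StandardScaler` for another transformer only requires editing the config, not the pipeline code. The real Hydra `instantiate` supports features this sketch omits, for example `_args_` for positional arguments and `_partial_` for partial instantiation.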

No commits in the last 6 months.

Use this if you are a machine learning engineer or data scientist who needs a structured and repeatable way to define and execute different data preprocessing pipelines for your experiments.
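Running the same code under different preprocessing choices usually comes down to overriding individual config values for each experiment. Hydra does this with command-line overrides of the form `key.path=value`. The following stdlib-only sketch illustrates that idea on a plain nested dict. `apply_override` is a hypothetical helper, not Hydra's override parser, which supports a richer grammar that includes adding and deleting keys.

```python
import ast
import copy
from typing import Any


def _parse_value(raw: str) -> Any:
    """Parse Python literals such as 3, 0.5 or True; otherwise keep the string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_override(cfg: dict, override: str) -> dict:
    """Hypothetical helper: return a deep copy of cfg with one dotted
    'key.path=value' override applied. The base config is left untouched."""
    path, sep, raw_value = override.partition("=")
    if not sep or not path:
        raise ValueError(f"expected 'key.path=value', got {override!r}")
    keys = path.split(".")
    new_cfg = copy.deepcopy(cfg)
    node = new_cfg
    for key in keys[:-1]:
        node = node[key]  # raises KeyError if the path does not exist
    node[keys[-1]] = _parse_value(raw_value)
    return new_cfg


base_cfg = {
    "steps": {
        "impute": {"_target_": "sklearn.impute.SimpleImputer", "strategy": "median"},
        "scale": {"_target_": "sklearn.preprocessing.StandardScaler"},
    }
}

# Two hypothetical experiments derived from the same base config.
exp_minmax = apply_override(base_cfg, "steps.scale._target_=sklearn.preprocessing.MinMaxScaler")
exp_mean = apply_override(base_cfg, "steps.impute.strategy=mean")
```

Because each override works on a deep copy, the base config stays reusable across runs. Each experiment differs from the base by exactly the keys it overrides.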

Not ideal if you are looking for a general-purpose data transformation tool for business intelligence or simple data cleaning that doesn't involve machine learning.

data-preprocessing machine-learning-engineering data-science-workflows feature-engineering model-preparation

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 8 / 25

Community 12 / 25


Language: Jupyter Notebook

License: none

Higher-rated alternatives

scverse/anndata

Annotated data.

koaning/scikit-lego

Extra blocks for scikit-learn pipelines.

googleapis/python-bigquery-dataframes

BigQuery DataFrames (also known as BigFrames)

bigmlcom/python

Python bindings for BigML.io

posit-dev/orbital

Turn SciKitLearn pipelines into SQL
