spotify/pythonflow

Dataflow programming for Python.

Score: 55 / 100 (Established)

This tool helps machine learning engineers and data scientists build and manage complex data preprocessing pipelines. You define a series of steps for transforming raw data, and Pythonflow takes that definition to produce cleaned, prepared datasets ready for model training. It's designed for anyone preparing large datasets for machine learning applications.
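The define-then-execute idea described above can be illustrated with a minimal standard-library sketch. This is not Pythonflow's API: the `Step` class and the sample pipeline are hypothetical names invented for illustration, and the sketch only assumes that a pipeline is declared first and evaluated later, on request.

```python
from typing import Any, Callable, Dict


class Step:
    """Hypothetical deferred computation: a function plus the steps that feed it."""

    def __init__(self, func: Callable[..., Any], *inputs: "Step") -> None:
        self.func = func
        self.inputs = inputs

    def run(self, cache: Dict[int, Any]) -> Any:
        # Evaluate upstream steps first; memoize so a shared step runs only once.
        key = id(self)
        if key not in cache:
            args = [step.run(cache) for step in self.inputs]
            cache[key] = self.func(*args)
        return cache[key]


# Declare the pipeline; nothing is computed at this point.
raw = Step(lambda: [" 3", "1 ", "", "2"])
stripped = Step(lambda rows: [r.strip() for r in rows], raw)
non_empty = Step(lambda rows: [r for r in rows if r], stripped)
as_ints = Step(lambda rows: sorted(int(r) for r in rows), non_empty)

# Execution happens only when a result is requested.
result = as_ints.run({})
print(result)
```

Separating the definition from the run makes each intermediate step individually inspectable, which is the property the description emphasizes for debugging multi-step preparation workflows.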

292 stars. No commits in the last 6 months. Available on PyPI.

Use this if you are building machine learning models and need an efficient way to define, debug, and execute multi-step data preparation workflows, especially when dealing with computationally intensive tasks or distributed processing.

Not ideal if you are looking for a general-purpose data transformation tool for simpler analytics or ETL processes outside of machine learning, or if you prefer visual pipeline builders over code-based definitions.

Tags: machine-learning-engineering, data-preprocessing, ML-pipelines, distributed-computing, data-science
Status: stale (6 months), no dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 20 / 25

Stars: 292
Forks: 47
Language: Python
License: Apache-2.0
Last pushed: May 23, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/spotify/pythonflow"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
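The curl request above could be issued from Python along the following lines. This is a sketch under stated assumptions: the path is assumed to follow a category/owner/repository pattern (inferred only from the single URL shown), the response is assumed to be JSON, and the function names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch the quality record without an API key; assumes a JSON body."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access.
url = build_quality_url("ml-frameworks", "spotify", "pythonflow")
print(url)
```

Calling `fetch_quality("ml-frameworks", "spotify", "pythonflow")` performs the actual request and counts against the daily request limit.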