spotify/pythonflow
🐍 Dataflow programming for Python.
This tool helps machine learning engineers and data scientists build and manage complex data preprocessing pipelines. You define the steps that transform raw data as a dataflow graph, and Pythonflow evaluates that graph to produce cleaned, prepared datasets ready for model training. It's designed for anyone preparing large datasets for machine learning applications.
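To make the dataflow idea concrete, here is a minimal pure-Python sketch of lazy graph evaluation. It is not Pythonflow's API: the `Node` class, the `run` helper, and the preprocessing steps are hypothetical and only illustrate the general technique. Defining nodes runs nothing; requesting a result runs only the steps that result depends on, and a step shared by several branches runs once per evaluation.

```python
from typing import Any, Callable, Dict, List, Tuple


class Node:
    """A lazily evaluated step (hypothetical, not Pythonflow's API)."""

    def __init__(self, name: str, func: Callable[..., Any], *inputs: "Node") -> None:
        self.name = name
        self.func = func
        self.inputs = inputs

    def evaluate(self, cache: Dict["Node", Any], log: List[str]) -> Any:
        # A node already computed during this evaluation is reused, not rerun.
        if self in cache:
            return cache[self]
        args = [node.evaluate(cache, log) for node in self.inputs]
        log.append(self.name)  # record which steps actually ran, in order
        cache[self] = self.func(*args)
        return cache[self]


def run(target: Node) -> Tuple[Any, List[str]]:
    """Evaluate only the part of the graph that `target` depends on."""
    cache: Dict[Node, Any] = {}
    log: List[str] = []
    return target.evaluate(cache, log), log


# Defining the graph runs nothing yet (hypothetical preprocessing steps).
raw = Node("raw", lambda: [" 3", "x", "5 ", "", "2"])
stripped = Node("stripped", lambda rows: [r.strip() for r in rows], raw)
numeric = Node("numeric", lambda rows: [int(r) for r in rows if r.isdigit()], stripped)
scaled = Node("scaled", lambda xs: [x / max(xs) for x in xs], numeric)
total = Node("total", sum, numeric)
report = Node("report", lambda s, t: (len(s), t), scaled, total)

print(run(total))   # (10, ['raw', 'stripped', 'numeric', 'total'])
print(run(report))  # ((3, 10), ['raw', 'stripped', 'numeric', 'scaled', 'total', 'report'])
```

Requesting `total` skips the `scaled` step entirely, and requesting `report` runs `numeric` only once even though two branches consume it. Being able to fetch any intermediate node on its own is also what makes a pipeline defined this way easy to debug step by step.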
292 stars. No commits in the last 6 months. Available on PyPI.
Use this if you are building machine learning models and need an efficient way to define, debug, and execute multi-step data preparation workflows, especially when dealing with computationally intensive tasks or distributed processing.
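A dataflow graph also makes explicit which steps are independent of each other and can therefore run concurrently, which is where computationally intensive pipelines gain. The sketch below is a hypothetical scheduler, again not Pythonflow's API: it executes the graph in waves, submitting every node whose inputs are ready to a thread pool. Threads keep the example simple; for CPU-bound Python code a `ProcessPoolExecutor` (with picklable, module-level functions instead of lambdas) is the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical graph format: node name -> (function, names of its input nodes).
Graph = Dict[str, Tuple[Callable[..., Any], List[str]]]


def run_parallel(graph: Graph, max_workers: int = 4) -> Tuple[Dict[str, Any], List[List[str]]]:
    """Run every node, executing nodes whose inputs are ready concurrently."""
    results: Dict[str, Any] = {}
    waves: List[List[str]] = []  # names of the nodes that ran together, per wave
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # A node is ready once all of its inputs have results.
            ready = sorted(
                name for name, (_, deps) in pending.items()
                if all(dep in results for dep in deps)
            )
            if not ready:
                raise ValueError("graph has a cycle or references a missing node")
            # Submit the whole wave before waiting on any of it.
            futures = {
                name: pool.submit(pending[name][0], *[results[d] for d in pending[name][1]])
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
            waves.append(ready)
    return results, waves


graph: Graph = {
    "raw": (lambda: list(range(1, 7)), []),
    "squares": (lambda xs: [x * x for x in xs], ["raw"]),
    "evens": (lambda xs: [x for x in xs if x % 2 == 0], ["raw"]),
    "summary": (lambda sq, ev: (sum(sq), len(ev)), ["squares", "evens"]),
}

results, waves = run_parallel(graph)
print(results["summary"])  # (91, 3)
print(waves)               # [['raw'], ['evens', 'squares'], ['summary']]
```

Here `squares` and `evens` both depend only on `raw`, so they run in the same wave, and `summary` waits for both.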
Not ideal if you are looking for a general-purpose data transformation tool for simpler analytics or ETL processes outside of machine learning, or if you prefer visual pipeline builders over code-based definitions.
Stars: 292
Forks: 47
Language: Python
License: Apache-2.0
Last pushed: May 23, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/spotify/pythonflow"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.