getyourguide/DDataFlow
A tool for testing and developing PySpark code with sampled and local data
DDataFlow helps you develop and test PySpark machine learning models and data pipelines more efficiently. It takes your full production data sources, samples them down for faster processing, and writes results to a test location, preventing accidental changes to live data. Data scientists and data engineers working with PySpark will find it useful in their daily development and testing workflows.
Available on PyPI.
Use this if you are a data scientist or engineer building PySpark-based machine learning or data pipelines and need a way to develop and test your code quickly and safely with realistic, sampled data.
Not ideal if you need to run tests against your full production dataset or if you are not working with PySpark for your data pipelines.
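The sample-and-redirect workflow described above can be illustrated with a minimal, library-free sketch. This is not DDataFlow's API: `DevSettings`, `read_source`, `resolve_output_path`, the paths, and the 10% sample fraction are all hypothetical names and values chosen for illustration. In real PySpark code, the sampling step would roughly correspond to `DataFrame.sample(fraction=..., seed=...)`.

```python
import random
from dataclasses import dataclass


@dataclass
class DevSettings:
    """Hypothetical settings object; not part of DDataFlow's API."""
    enabled: bool = False                 # when False, production behaviour is unchanged
    sample_fraction: float = 0.1          # share of rows to keep while developing
    seed: int = 42                        # fixed seed so samples are reproducible
    test_prefix: str = "/tmp/dev_output"  # where development writes are redirected


def read_source(rows, settings):
    """Return all rows in production mode, or a reproducible sample in dev mode."""
    if not settings.enabled:
        return list(rows)
    rng = random.Random(settings.seed)
    return [row for row in rows if rng.random() < settings.sample_fraction]


def resolve_output_path(path, settings):
    """Keep the production path, or redirect it under the test prefix in dev mode."""
    if not settings.enabled:
        return path
    return settings.test_prefix.rstrip("/") + "/" + path.lstrip("/")


# Stand-in for a production table (assumed data, for illustration only).
production_rows = [{"id": i, "clicks": i % 7} for i in range(1000)]

# Development mode: smaller input, output redirected to a test location.
dev = DevSettings(enabled=True)
sampled_rows = read_source(production_rows, dev)
dev_path = resolve_output_path("/warehouse/features/clicks", dev)

# Production mode: full input, original output path.
prod = DevSettings(enabled=False)
full_rows = read_source(production_rows, prod)
prod_path = resolve_output_path("/warehouse/features/clicks", prod)
```

In this sketch, a single switch controls both reads and writes, so the pipeline code itself stays identical between development and production runs.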
Stars: 15
Forks: 1
Language: HTML
License: Apache-2.0
Category:
Last pushed: Feb 05, 2026
Commits (30d): 0
Dependencies: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/getyourguide/DDataFlow"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.