adipolak/scaling-machine-learning-course
Scaling Machine Learning in Three Weeks: a course in collaboration with O'Reilly, following Adi Polak's book Scaling Machine Learning with Spark
This course material helps data scientists and machine learning engineers learn how to build and deploy scalable machine learning workflows. It takes you through using PySpark and MLflow to handle large datasets and manage experiments. You'll gain hands-on experience transforming raw data into reproducible, large-scale machine learning models.
No commits in the last 6 months.
Use this if you are a data scientist or ML engineer looking to move beyond small datasets and build robust, scalable machine learning solutions in a production environment.
Not ideal if you are looking for an introduction to machine learning concepts or prefer working with smaller datasets that don't require distributed computing.
Stars: 24
Forks: 17
Language: Jupyter Notebook
License: none listed
Category: (not shown)
Last pushed: May 12, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/adipolak/scaling-machine-learning-course"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
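The same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the `<category>/<owner>/<repo>` path pattern is an assumption generalized from the single URL shown above. The response format is not documented on this page, so the body is returned as raw text and left unparsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper)."""
    # quote() leaves "/" unescaped by default, so "owner/repo" stays intact.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Send a keyless GET request and return the response body as text."""
    with urlopen(build_quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("ml-frameworks", "adipolak/scaling-machine-learning-course"))
```

Keyless access is limited to 100 requests/day, so a script that polls many repositories may want to cache responses.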
Higher-rated alternatives
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.