spark-py-notebooks and Spark-with-Python

Both are tutorial repositories for learning PySpark, so they compete directly in the "fundamentals of Spark with Python" niche, though a learner could draw on both for broader coverage.

spark-py-notebooks
Score: 51 (Established)
Maintenance: 0/25
Adoption: 10/25
Maturity: 16/25
Community: 25/25
Stars: 1,663
Forks: 911
Downloads:
Commits (30d): 0
Language: Jupyter Notebook
License:
Stale 6m, No Package, No Dependents

Spark-with-Python
Score: 51 (Established)
Maintenance: 0/25
Adoption: 10/25
Maturity: 16/25
Community: 25/25
Stars: 362
Forks: 271
Downloads:
Commits (30d): 0
Language: Jupyter Notebook
License: MIT
Stale 6m, No Package, No Dependents
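
For both repositories, the four category scores listed above (each out of 25) add up to the headline score of 51. The sketch below checks that arithmetic. It assumes the overall score is a plain sum of the four categories, which is an inference from the numbers shown here rather than a documented formula, and the names `SUBSCORES` and `total_score` are hypothetical.

```python
# Category scores as listed above; each category is out of 25.
SUBSCORES = {
    "spark-py-notebooks": {
        "Maintenance": 0,
        "Adoption": 10,
        "Maturity": 16,
        "Community": 25,
    },
    "Spark-with-Python": {
        "Maintenance": 0,
        "Adoption": 10,
        "Maturity": 16,
        "Community": 25,
    },
}


def total_score(categories):
    """Sum the category scores.

    Assumption: the headline score is the unweighted sum of the four
    categories. The page itself does not state the formula.
    """
    return sum(categories.values())


totals = {repo: total_score(cats) for repo, cats in SUBSCORES.items()}

for repo, total in totals.items():
    print(f"{repo}: {total}/100")
```

Under this assumption, the Maintenance score of 0/25 is what holds both totals down, while Community is already at its maximum.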

About spark-py-notebooks

jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

This project provides step-by-step guides using Jupyter notebooks to help data scientists and big data engineers learn how to analyze large datasets and build machine learning models with Apache Spark and Python. It takes raw data, like network interaction logs, and shows you how to process, explore, and build predictive models for tasks such as anomaly detection or recommendation engines. This is for professionals who need to work with massive datasets and leverage Spark's distributed computing power.

Big Data Analysis, Machine Learning, Data Science Training, Distributed Computing, Predictive Modeling

About Spark-with-Python

tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

If you're a data professional, this project offers practical code examples and setup guidance for using Apache Spark with Python (PySpark). The examples show how to process large volumes of data with Spark for big data analytics and machine learning. It is aimed at data scientists, data engineers, and machine learning engineers who need to work with large, distributed datasets.

big-data-analytics distributed-computing machine-learning-engineering data-processing data-engineering

Scores updated daily from GitHub, PyPI, and npm data.