learning-apache-spark and spark-py-notebooks
About learning-apache-spark
MingChen0919/learning-apache-spark
Notes on Apache Spark (pyspark)
These notes help data professionals process and analyze very large datasets efficiently with Apache Spark. They cover common data manipulation and analysis tasks, showing how to turn raw data into usable insights or cleaned datasets ready for further use. Data engineers, data scientists, and analysts working with big data will find them useful.
About spark-py-notebooks
jadianes/spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
This project provides step-by-step Jupyter notebooks that teach data scientists and big data engineers how to analyze large datasets and build machine learning models with Apache Spark and Python. Starting from raw data such as network interaction logs, the notebooks show how to process and explore the data and then build predictive models for tasks such as anomaly detection or recommendation engines. It is aimed at professionals who need to work with massive datasets and use Spark's distributed computing power.