tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
This project offers practical code examples and setup guidance for data professionals using Apache Spark with Python (PySpark). The examples show how to process large datasets efficiently with Spark, a framework for big data analytics and machine learning. It is aimed at data scientists, data engineers, and machine learning engineers who work with large, distributed datasets.
362 stars. No commits in the last 6 months.
Use this if you need to perform data manipulation, analysis, or apply machine learning algorithms on very large datasets that don't fit into a single computer's memory.
Not ideal if your datasets are small enough to be handled by a single machine using tools like Pandas or scikit-learn, as the overhead of Spark might be unnecessary.
Stars: 362
Forks: 271
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Oct 29, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/tirthajyoti/Spark-with-Python"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
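The same request can be made from Python. The sketch below uses only the standard library and takes the base URL from the curl example above. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path layout is inferred from that single example, and the response format is not documented here, so the code assumes JSON and falls back to raw text.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL copied from the curl example; everything after it is path segments.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL shown on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the body is JSON. If it does not parse, the raw text is
    returned so the caller can inspect it.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to issue the request.
    print(build_quality_url("ml-frameworks", "tirthajyoti", "Spark-with-Python"))
```

Calling `fetch_quality("ml-frameworks", "tirthajyoti", "Spark-with-Python")` issues one request, which counts against the daily limit stated above.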
Related frameworks
lensacom/sparkit-learn
PySpark + Scikit-learn = Sparkit-learn
Angel-ML/angel
A Flexible and Powerful Parameter Server for large-scale machine learning
flink-extended/dl-on-flink
Deep Learning on Flink aims to integrate Flink and deep learning frameworks (e.g. TensorFlow,...
MingChen0919/learning-apache-spark
Notes on Apache Spark (pyspark)
mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book