tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

Established

If you're a data professional, this project offers practical code examples and setup guidance for using Apache Spark with Python (PySpark). The examples show how to process large datasets on Spark's distributed engine for big data analytics and machine learning. It is aimed at data scientists, data engineers, and machine learning engineers who work with large, distributed datasets.

362 stars. No commits in the last 6 months.

Use this if you need to perform data manipulation, analysis, or apply machine learning algorithms on very large datasets that don't fit into a single computer's memory.

Not ideal if your datasets are small enough to be handled by a single machine using tools like Pandas or scikit-learn, as the overhead of Spark might be unnecessary.

big-data-analytics, distributed-computing, machine-learning-engineering, data-processing, data-engineering

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25

Stars: 362

Forks: 271

Language: Jupyter Notebook

License: MIT

Related frameworks

lensacom/sparkit-learn

PySpark + Scikit-learn = Sparkit-learn

Angel-ML/angel

A Flexible and Powerful Parameter Server for large-scale machine learning

flink-extended/dl-on-flink

Deep Learning on Flink aims to integrate Flink and deep learning frameworks (e.g. TensorFlow,...

MingChen0919/learning-apache-spark

Notes on Apache Spark (pyspark)

mahmoudparsian/data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book
