tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

Score: 51 / 100 (Established)

This project offers practical code examples and setup guidance for using Apache Spark with Python (PySpark). The examples show how to process large volumes of data with Spark for big data analytics and machine learning. It is aimed at data scientists, data engineers, and machine learning engineers who work with large, distributed datasets.

362 stars. No commits in the last 6 months.

Use this if you need to perform data manipulation, analysis, or apply machine learning algorithms on very large datasets that don't fit into a single computer's memory.

Not ideal if your datasets are small enough to be handled on a single machine with tools like Pandas or scikit-learn, where the overhead of Spark is likely unnecessary.

big-data-analytics distributed-computing machine-learning-engineering data-processing data-engineering
Stale 6m, No Package, No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 25 / 25

Stars: 362
Forks: 271
Language: Jupyter Notebook
License: MIT
Last pushed: Oct 29, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/tirthajyoti/Spark-with-Python"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
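
The same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint returns JSON, which the listing does not state, and the helper names `build_quality_url` and `fetch_quality` are hypothetical, introduced here for illustration. The URL layout (category, owner, repository) is inferred from the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay valid.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumes the response body is JSON; adjust the parsing if it is not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "tirthajyoti", "Spark-with-Python")` issues the same GET request as the curl command. Each call counts against the 100 requests/day keyless limit, so caching the result locally is a reasonable choice when polling several repositories.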