MingChen0919/learning-apache-spark

Notes on Apache Spark (pyspark)

51 / 100 (Established)

These notes help data professionals understand how to process and analyze very large datasets efficiently using Apache Spark. They cover common data manipulation and analysis tasks, showing how to transform raw data into actionable insights or cleaned datasets ready for further use. Data engineers, data scientists, and analysts working with big data will find this resource useful.

299 stars. No commits in the last 6 months.

Use this if you need to learn Apache Spark's PySpark API for big data processing and analysis.

Not ideal if you are looking for an in-depth guide on Apache Spark's Scala API or advanced distributed systems architecture.

big-data-processing data-engineering data-analysis data-science large-scale-etl
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25


Stars: 299
Forks: 186
Language: HTML
License: MIT
Last pushed: Mar 03, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/MingChen0919/learning-apache-spark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
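
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_quality_url` and `fetch_quality` are hypothetical, reading the URL path as category / owner / repository is an assumption based on the curl example, and the response is assumed to be JSON (its schema is not described here).

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the three path segments are category, owner, and repository.
    """
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumption: the body is JSON. No API key is sent, so the
    100 requests/day keyless limit mentioned above applies.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "MingChen0919", "learning-apache-spark")` would issue the same GET request as the curl command. Keeping URL construction in its own function lets it be checked without touching the network or spending the daily quota.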