zaratsian/Spark
Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References
This collection gathers practical Apache Spark code snippets and scripts for data engineers and data scientists who process and analyze large datasets. Most examples are in PySpark, with others in Scala and SparkR, and they target common Spark problems that come up in data processing work.
No commits in the last 6 months.
Use this if you are a data engineer or data scientist looking for ready-to-use Spark code examples to jumpstart your big data projects or troubleshoot specific issues.
Not ideal if you are new to Spark and seeking a comprehensive introductory tutorial or a conceptual guide, as this repository focuses on practical code rather than foundational learning.
Stars: 69
Forks: 37
Language: Jupyter Notebook
License: —
Category:
Last pushed: Jan 21, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/zaratsian/Spark"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
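The same request can be made from Python. The following is a minimal sketch, not an official client: the base URL is copied from the curl command above, while the `category/owner/repo` path layout, the helper names `quality_url` and `fetch_quality`, and the assumption that the endpoint returns JSON are all inferred from that single example rather than from any documented schema.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl command; everything after it is an assumed layout.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment (hypothetical helper)."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Request the URL and decode the body.

    Assumption: the response body is JSON. Its fields are not documented here,
    so the decoded object is returned as-is without interpreting it.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "zaratsian", "Spark")` performs a network request against the URL shown in the curl command, and unauthenticated calls count toward the daily request limit.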
Higher-rated alternatives
dipanjanS/text-analytics-with-python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment...
jonathandunn/text_analytics
Basic text analytics and natural language processing in Python
IBM/watson-document-co-relation
Correlate text content across documents using Watson NLU, Python NLTK and Watson Studio.
Clarifai/clarifai-pyspark
Interfaces for Unstructured data and ML pipelines with Databricks and Clarifai
umer7/Applied-Text-Mining-in-Python
Repo for Applied Text Mining in Python (coursera) by University of Michigan