zaratsian/Spark

Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References


Emerging

This collection provides practical Apache Spark code snippets and scripts that help data engineers and data scientists process and analyze large datasets. The examples are primarily in PySpark, along with some in Scala and SparkR. Users can look up solutions to common Spark problems and reuse them in their own data processing tasks.

No commits in the last 6 months.

Use this if you are a data engineer or data scientist looking for ready-to-use Spark code examples to jumpstart your big data projects or troubleshoot specific issues.

Not ideal if you are new to Spark and seeking a comprehensive introductory tutorial or a conceptual guide, as this repository focuses on practical code rather than foundational learning.

big-data-processing, data-engineering, data-analysis, machine-learning-engineering, distributed-computing

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 21 / 25


Language: Jupyter Notebook

License: —

Higher-rated alternatives

dipanjanS/text-analytics-with-python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment...

jonathandunn/text_analytics

Basic text analytics and natural language processing in Python

IBM/watson-document-co-relation

Correlate text content across documents using Watson NLU, Python NLTK and Watson Studio.

Clarifai/clarifai-pyspark

Interfaces for Unstructured data and ML pipelines with Databricks and Clarifai

umer7/Applied-Text-Mining-in-Python

Repo for Applied Text Mining in Python (coursera) by University of Michigan
