manuparra/taller_SparkR
SparkR workshop for the Jornadas de Usuarios de R (Spanish R Users Conference)
This workshop material helps data analysts and scientists process extremely large datasets using R and Apache Spark through the SparkR package. It takes raw, massive datasets (in formats such as CSV, JSON, and Parquet) and shows how to filter, aggregate, transform, and analyze them to produce insights, machine learning models, and visualizations. The material is designed for someone who works with data in R and needs to scale up to "big data" problems.
No commits in the last 6 months.
Use this if you are an R user struggling to analyze very large datasets that exceed the memory capacity of a single machine and need to leverage distributed computing.
Not ideal if you primarily work with smaller datasets that fit within your computer's memory or if you prefer programming languages other than R.
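The core workflow the workshop teaches — load a dataset too large for one machine's memory, then filter and aggregate it in a distributed fashion — can be sketched with SparkR. This is a minimal illustration, not taken from the workshop itself: it assumes a local Spark installation with the SparkR package available, and the file name (`flights.csv`) and column names (`dep_delay`, `carrier`) are hypothetical.

```r
# Minimal SparkR sketch (assumes Spark + the SparkR package are installed;
# file path and column names are illustrative, not from the workshop).
library(SparkR)

# Start a Spark session on all local cores
sparkR.session(master = "local[*]", appName = "taller_SparkR_demo")

# Read a large CSV as a distributed SparkDataFrame (data stays out of R's memory)
df <- read.df("flights.csv", source = "csv",
              header = "true", inferSchema = "true")

# Filter and aggregate on the cluster, then collect only the small result
delayed <- filter(df, df$dep_delay > 15)
by_carrier <- agg(groupBy(delayed, "carrier"),
                  avg_delay = avg(delayed$dep_delay))
head(arrange(by_carrier, desc(by_carrier$avg_delay)))

sparkR.session.stop()
```

The key design point is that `filter` and `agg` run inside Spark across partitions; only `head`/`collect` pull a small summary back into the local R session, which is what lets the workflow scale past single-machine memory.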
Stars: 13
Forks: 17
Language: HTML
License: —
Category: —
Last pushed: Nov 21, 2016
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/manuparra/taller_SparkR"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
lensacom/sparkit-learn
PySpark + Scikit-learn = Sparkit-learn
Angel-ML/angel
A Flexible and Powerful Parameter Server for large-scale machine learning
flink-extended/dl-on-flink
Deep Learning on Flink aims to integrate Flink and deep learning frameworks (e.g. TensorFlow,...
tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
jadianes/spark-py-notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython...