SparkLearning

Spark guide

A comprehensive resource for learning Apache Spark, covering its core concepts, components, and advanced topics.

A comprehensive Spark guide, collated from multiple sources, that can be used to learn more about Spark or as an interview refresher.

GitHub

649 stars
19 watching
73 forks
last commit: over 2 years ago
Linked from 1 awesome list

big-data, pyspark, spark

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957 |
| apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,002 |
| dotnet/spark | Provides high-performance APIs for using Apache Spark with .NET | 2,026 |
| tweag/sparkle | A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark | 447 |
| tubular/sparkly | A set of Python libraries and tools to simplify interactions with various data sources using Apache Spark | 60 |
| amplab-extras/sparkr-pkg | Provides a lightweight R interface to Apache Spark for data processing | 641 |
| kotlin/kotlin-spark-api | Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 463 |
| gorillalabs/sparkling | A Clojure API for interacting with Apache Spark | 448 |
| lensacom/sparkit-learn | A Python library that integrates PySpark and scikit-learn for distributed machine learning | 1,155 |
| sw1sh/frege-spark | An effort to integrate Apache Spark with the Frege programming language | 5 |
| tkych/cl-spark | A utility for generating simple, visually appealing data visualizations from numeric data sets | 96 |
| dmmiller612/sparktorch | A PyTorch implementation on Apache Spark for distributed deep learning model training and inference | 339 |
| nchammas/flintrock | A command-line tool for launching and managing Apache Spark clusters on AWS | 638 |
| ondra-m/ruby-spark | A Ruby wrapper around Apache Spark's functionality for large-scale data processing | 227 |
| sorenmacbeth/flambo | A Clojure-based interface to Apache Spark, enabling efficient data processing and manipulation in cluster computing environments | 606 |