SparkLearning

Spark guide

A comprehensive Apache Spark guide, collated from multiple sources, covering core concepts, components, and advanced topics. It can be used to learn more about Spark or as an interview refresher.

GitHub

655 stars

19 watching

74 forks

last commit: over 4 years ago

Linked from 1 awesome list

big-data, pyspark, spark

Backlinks from these awesome lists:

dopplerhq/awesome-interview-questions

Related projects:

Repository	Description	Stars
sparklyr/sparklyr	An R interface to Apache Spark for distributed data analysis and machine learning	955
apache/spark	An analytics engine designed to handle large-scale data processing and analysis	40,170
dotnet/spark	Provides high-performance APIs for using Apache Spark with .NET	2,032
tweag/sparkle	A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark	447
tubular/sparkly	A set of Python libraries and tools to simplify interactions with various data sources using Apache Spark	61
amplab-extras/sparkr-pkg	Provides a lightweight R interface to Apache Spark for data processing	641
kotlin/kotlin-spark-api	Provides compatibility and extensions between Kotlin and Apache Spark for big data processing	463
gorillalabs/sparkling	A Clojure API for interacting with Apache Spark	448
lensacom/sparkit-learn	A Python library that integrates PySpark and scikit-learn for distributed machine learning	1,154
sw1sh/frege-spark	An effort to integrate Apache Spark with the Frege programming language	5
tkych/cl-spark	A utility for generating simple, visually appealing data visualizations from numeric data sets	96
dmmiller612/sparktorch	A PyTorch implementation on Apache Spark for distributed deep learning model training and inference	339
nchammas/flintrock	A command-line tool for launching and managing Apache Spark clusters on AWS	637
ondra-m/ruby-spark	A Ruby wrapper around Apache Spark's functionality for large-scale data processing	227
sorenmacbeth/flambo	A Clojure-based interface to Apache Spark, enabling efficient data processing and manipulation in cluster computing environments	606