awesome-spark

Spark toolkit

A curated collection of packages and resources for working with Apache Spark, an open-source cluster-computing framework.


GitHub

2k stars

85 watching

331 forks

Language: Shell

last commit: 9 months ago

Linked from 3 awesome lists

apache-spark, awesome, pyspark, sparkr

Awesome Spark / Packages / Language Bindings
Kotlin for Apache Spark	463	about 1 year ago	Kotlin API bindings and extensions
.NET for Apache Spark	2,032	7 months ago	.NET bindings
sparklyr	955	9 months ago	An alternative R backend
sparkle	447	over 2 years ago	Haskell on Apache Spark
spark-connect-rs	91	9 months ago	Rust bindings
spark-connect-go	168	9 months ago	Golang bindings
spark-connect-csharp	1	over 1 year ago	C# bindings
Awesome Spark / Packages / Notebooks and IDEs
almond			A Scala kernel for Jupyter
Apache Zeppelin			Web-based notebook that enables interactive data analytics with pluggable backends, integrated plotting, and extensive Spark support out of the box
Polynote			An IDE-inspired polyglot notebook. It supports mixing multiple languages in one notebook and sharing data between them seamlessly. It encourages reproducible notebooks with its immutable data model
sparkmagic	1,334	8 months ago	Magics and kernels for working interactively with remote Spark clusters in Jupyter notebooks
Awesome Spark / Packages / General Purpose Libraries
itachi	56	almost 2 years ago	A library that brings useful functions from modern database management systems to Apache Spark
spark-daria	754	9 months ago	A Scala library with essential Spark functions and extensions to make you more productive
quinn	651	8 months ago	A native PySpark implementation of spark-daria
Apache DataFu	119	8 months ago	A library of general-purpose functions and UDFs
Joblib Apache Spark Backend	243	11 months ago	Joblib backend for running tasks on Spark clusters
Awesome Spark / Packages / SQL Data Sources
Spark XML	504	12 months ago	XML parser and writer
Spark Cassandra Connector	1,944	11 months ago	Cassandra support, including a data source, an API, and support for arbitrary queries
Mongo-Spark	713	11 months ago	Official MongoDB connector
Awesome Spark / Packages / Storage
Delta Lake	7,677	7 months ago	Storage layer with ACID transactions
Apache Hudi	5,498	7 months ago	Upserts, deletes, and incremental processing on big data
Apache Iceberg	6,621	7 months ago	Open table format for large analytic datasets
lakeFS			Integration with the lakeFS atomic versioned storage layer
Awesome Spark / Packages / Bioinformatics
ADAM	1,005	8 months ago	Set of tools designed to analyse genomics data
Hail	984	7 months ago	Genetic analysis framework
Awesome Spark / Packages / GIS
Apache Sedona	1,974	7 months ago	Cluster computing system for processing large-scale spatial data
Awesome Spark / Packages / Graph Processing
GraphFrames	1,007	8 months ago	Data frame based graph API
neo4j-spark-connector	313	8 months ago	Bolt-protocol-based Neo4j connector with RDD, DataFrame, and GraphX / GraphFrames support
Awesome Spark / Packages / Machine Learning Extension
Apache SystemML			Declarative machine learning framework on top of Spark
Mahout Spark Bindings			[status unknown] - linear algebra DSL and optimizer with R-like syntax
KeystoneML			Type-safe machine learning pipelines with RDDs
JPMML-Spark	94	over 3 years ago	PMML transformer library for Spark ML
ModelDB			A system to manage machine learning models
Sparkling Water	968	8 months ago	H2O interoperability layer
BigDL	6,801	7 months ago	Distributed Deep Learning library
MLeap	1,506	8 months ago	Execution engine and serialization format that supports deployment of models
Microsoft ML for Apache Spark	5,083	8 months ago	A distributed ML library with support for LightGBM, Vowpal Wabbit, OpenCV, Deep Learning, Cognitive Services, and Model Deployment
MLflow			Machine learning orchestration platform
Awesome Spark / Packages / Middleware
Livy	894	8 months ago	REST server with extensive language support (Python, R, Scala), ability to maintain interactive sessions and object sharing
spark-jobserver	2,839	7 months ago	Simple Spark-as-a-service that supports object sharing through so-called named objects. JVM only
Apache Toree	740	9 months ago	IPython-protocol-based middleware for interactive applications
Apache Kyuubi	2,116	7 months ago	A distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark
Awesome Spark / Packages / Monitoring
Data Mechanics Delight	344	about 1 year ago	Cross-platform monitoring tool (Spark UI / Spark History Server replacement)
Awesome Spark / Packages / Utilities
sparkly	61	about 2 years ago	Helpers & syntactic sugar for PySpark
Flintrock	637	7 months ago	A command-line tool for launching Spark clusters on EC2
Optimus	1,486	8 months ago	Data Cleansing and Exploration utilities with the goal of simplifying data cleaning
Awesome Spark / Packages / Natural Language Processing
spark-nlp	3,889	7 months ago	Natural language processing library built on top of Apache Spark ML
Awesome Spark / Packages / Streaming
Apache Bahir			Collection of the streaming connectors excluded from Spark 2.0 (Akka, MQTT, Twitter, ZeroMQ)
Awesome Spark / Packages / Interfaces
Apache Beam			Unified data processing engine supporting both batch and streaming applications. Apache Spark is one of the supported execution environments
Koalas	3,343	over 1 year ago	Pandas DataFrame API on top of Apache Spark
Awesome Spark / Packages / Data quality
deequ	3,324	10 months ago	Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets
python-deequ	734	9 months ago	Python API for Deequ
Awesome Spark / Packages / Testing
spark-testing-base	1,525	9 months ago	Collection of base test classes
spark-fast-tests	437	8 months ago	A lightweight and fast testing framework
chispa	632	9 months ago	PySpark test helpers with beautiful error messages
Awesome Spark / Packages / Web Archives
Archives Unleashed Toolkit	138	over 1 year ago	Open-source toolkit for analyzing web archives
Awesome Spark / Packages / Workflow Management
Cromwell	1,004	7 months ago	Workflow management system
Awesome Spark / Resources / Books
Learning Spark, 2nd Edition			Introduction to the Spark API, covering Spark 3.0. A good source of knowledge about basic concepts
Advanced Analytics with Spark			Useful collection of Spark processing patterns, with an accompanying GitHub repository
Mastering Apache Spark			Interesting compilation of notes focused on different aspects of Spark internals
Spark in Action			A book in Manning's "in Action" series with 400+ pages. Starts gently, proceeds step by step, and covers a large number of topics. A free excerpt covers how to bootstrap a new application using the provided Maven Archetype. An accompanying GitHub repository is available
Awesome Spark / Resources / Papers
Large-Scale Intelligent Microservices			Microsoft paper that presents an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing			Paper introducing a core distributed memory abstraction
Spark SQL: Relational Data Processing in Spark			Paper introducing relational underpinnings, code generation and Catalyst optimizer
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark			Paper introducing Structured Streaming, a high-level declarative streaming API based on automatically incrementalizing a static relational query
Awesome Spark / Resources / MOOCs
Data Science and Engineering with Apache Spark (edX XSeries)			Series of five courses covering different aspects of software engineering and data science. Python oriented
Big Data Analysis with Scala and Spark (Coursera)			Scala-oriented introductory course
Awesome Spark / Resources / Workshops
AMP Camp			Periodic training event. A source of useful exercises and recorded workshops covering a range of tools
Awesome Spark / Resources / Projects Using Spark
Oryx 2	1,787	almost 4 years ago	Platform built on Apache Spark, specialized for real-time, large-scale machine learning
Photon ML	793	almost 4 years ago	A machine learning library supporting classical Generalized Mixed Model and Generalized Additive Mixed Effect Model
PredictionIO			Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time
Crossdata	169	over 5 years ago	Data integration platform with extended DataSource API and multi-user environment
Awesome Spark / Resources / Docker Images
apache/spark			Apache Spark Official Docker images
jupyter/docker-stacks/pyspark-notebook	8,037	8 months ago	PySpark with Jupyter Notebook and Mesos client
sequenceiq/docker-spark	765	over 4 years ago	YARN images from SequenceIQ
datamechanics/spark			An easy-to-set-up Docker image for Apache Spark from Data Mechanics
Awesome Spark / Resources / Miscellaneous
Spark with Scala Gitter channel
Apache Spark User List and Developers List			Mailing lists dedicated to usage questions and development topics, respectively
