EMR_Spark_Automation

EMR automation tool

Automates deployment of an AWS EMR cluster and execution of Spark jobs

A repository for deploying an AWS EMR cluster and submitting Spark jobs to it. Bootstrapping includes pysparkling by default, so one can easily use H2O with Python and Spark.
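As a rough illustration of what such automation involves, the sketch below builds the request dictionary that could be passed to boto3's `run_job_flow` call to start a cluster with one bootstrap action and one Spark step. It is not taken from this repository: the function name `build_emr_request`, the S3 paths, the release label, and the instance settings are all assumptions chosen for the example, and the bootstrap script is a hypothetical one that would install pysparkling on each node.

```python
def build_emr_request(cluster_name, bootstrap_script, job_script,
                      release_label="emr-5.36.0",   # assumed release label
                      instance_type="m5.xlarge",    # assumed instance type
                      instance_count=3):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Only the dictionary is built here; no AWS call is made.
    """
    return {
        "Name": cluster_name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Bootstrap actions run on every node before applications start.
        "BootstrapActions": [
            {
                "Name": "install-pysparkling",
                "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
            }
        ],
        # A step that submits the Spark job through command-runner.jar.
        "Steps": [
            {
                "Name": "spark-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", job_script],
                },
            }
        ],
        # Default EMR role names; an account may use different ones.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical bucket and script names, for illustration only.
request = build_emr_request(
    cluster_name="spark-automation-demo",
    bootstrap_script="s3://example-bucket/bootstrap/install_pysparkling.sh",
    job_script="s3://example-bucket/jobs/my_job.py",
)
```

With valid AWS credentials and boto3 installed, the request could then be submitted with `boto3.client("emr").run_job_flow(**request)`, which returns a response containing the new cluster's `JobFlowId`.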

GitHub

8 stars

1 watching

5 forks

Language: Python

Last commit: about 9 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

h2oai/awesome-h2o

Related projects:

Repository	Description	Stars
nchammas/flintrock	A command-line tool for launching and managing Apache Spark clusters on AWS	637
amplab-extras/sparkr-pkg	Provides a lightweight R interface to Apache Spark for data processing	641
sparklyr/sparklyr	An R interface to Apache Spark for distributed data analysis and machine learning	955
joblib/joblib-spark	Enables parallelization of machine learning tasks on a distributed Spark cluster using the joblib library.	243
emcghee/payloadautomation	Automates payload development and deployment using Python classes to interact with Cobalt Strike and other tools	118
rocher/ob-ada-spark	Supports Ada and SPARK programming languages in Emacs org-babel for compiling, running, and formal verification of code	8
instaclustr/sample-sparkjobservercassandra	Demonstrates using Spark Jobserver to run Apache Spark analytics with Cassandra	2
jupyter-incubator/sparkmagic	An open source library that enables interactive development of applications using remote Spark clusters	1,334
mrpowers-io/spark-daria	A set of reusable tools to simplify Spark development in Scala	754
tubular/sparkly	A set of Python libraries and tools to simplify interactions with various data sources using Apache Spark.	61
ondra-m/ruby-spark	A Ruby wrapper around Apache Spark's functionality for large-scale data processing	227
mrpowers-io/spark-fast-tests	A testing helper library for Apache Spark applications.	437
flint-bot/sparky	Provides a NodeJS API to interact with the Cisco Spark platform	16
svenkreiss/pysparkling	A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets	262
jahstreetorg/spark-on-kubernetes-helm	A Helm chart repository providing infrastructure templates for setting up a fully functional Spark on Kubernetes cluster with integrated services.	200