EMR_Spark_Automation

EMR automation tool

Automates deployment of an AWS EMR cluster and execution of Spark jobs
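Automating deployment and job execution on EMR largely comes down to assembling the request that boto3's EMR `run_job_flow` call accepts. The sketch below is a rough illustration, not code taken from this repository. It builds such a request with one `spark-submit` step run through `command-runner.jar`. The helper names `spark_step` and `cluster_request`, the S3 URIs, the instance type and the release label are all hypothetical placeholders.

```python
"""Sketch: build an EMR RunJobFlow request that starts a Spark cluster
and queues one spark-submit step. Helper names, S3 URIs, instance type and
release label are hypothetical placeholders, not this repository's code."""

from typing import Any, Dict, List, Optional


def spark_step(name: str, script_uri: str,
               script_args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return one EMR step that runs spark-submit via command-runner.jar."""
    args = ["spark-submit", "--deploy-mode", "cluster", script_uri]
    args.extend(script_args or [])
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # move on to the next step if this one fails
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def cluster_request(name: str, log_uri: str,
                    steps: List[Dict[str, Any]],
                    release_label: str = "emr-5.36.0",  # placeholder release
                    instance_type: str = "m5.xlarge",   # placeholder type
                    instance_count: int = 3) -> Dict[str, Any]:
    """Return keyword arguments for boto3's EMR client.run_job_flow()."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Terminate the cluster once all queued steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",  # AWS default EMR role names
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


if __name__ == "__main__":
    request = cluster_request(
        name="spark-demo",                   # hypothetical
        log_uri="s3://my-bucket/emr-logs/",  # hypothetical
        steps=[spark_step("wordcount", "s3://my-bucket/jobs/wordcount.py",
                          ["s3://my-bucket/input/", "s3://my-bucket/output/"])],
    )
    print(request["Steps"][0]["HadoopJarStep"]["Args"])
    # With AWS credentials configured, the request could be sent with:
    #   import boto3
    #   boto3.client("emr").run_job_flow(**request)
```

Keeping request construction separate from the boto3 call makes the configuration easy to inspect and test without touching AWS.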

A repository for deploying an AWS EMR cluster and submitting Spark jobs to it. By default, bootstrapping installs PySparkling (the Python API of H2O Sparkling Water), so H2O can easily be used from Python together with Spark.
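On EMR, installing extra Python packages such as PySparkling on every node is normally done with a bootstrap action, which runs a script before the cluster's applications start. The sketch below adds one to an existing `run_job_flow` request dictionary. It assumes a hypothetical script stored in S3 that passes its arguments to `pip install`. The helper name `with_bootstrap_action`, the script URI and the package name are illustrative assumptions, not this repository's actual bootstrap setup.

```python
"""Sketch: attach a bootstrap action that installs Python packages (e.g.
PySparkling) on every node. Helper name, script URI and package name are
hypothetical; the script is assumed to run `pip install` on its arguments."""

import copy
from typing import Any, Dict, List


def with_bootstrap_action(request: Dict[str, Any], name: str,
                          script_uri: str, args: List[str]) -> Dict[str, Any]:
    """Return a copy of a run_job_flow request with one more bootstrap action."""
    updated = copy.deepcopy(request)  # leave the caller's dict untouched
    actions = updated.setdefault("BootstrapActions", [])
    actions.append({
        "Name": name,
        "ScriptBootstrapAction": {"Path": script_uri, "Args": list(args)},
    })
    return updated


if __name__ == "__main__":
    base = {"Name": "spark-demo", "ReleaseLabel": "emr-5.36.0"}  # hypothetical
    request = with_bootstrap_action(
        base,
        name="install-pysparkling",
        script_uri="s3://my-bucket/bootstrap/pip_install.sh",  # hypothetical
        # Assumed package name; choose the one matching the cluster's Spark version.
        args=["h2o_pysparkling_2.4"],
    )
    print(request["BootstrapActions"])
```

Copying the request instead of mutating it lets several variants (with and without PySparkling, for instance) be derived from one base configuration.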


8 stars
1 watching
5 forks
Language: Python
Last commit: over 7 years ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
nchammas/flintrock | A command-line tool for launching and managing Apache Spark clusters on AWS | 638
amplab-extras/sparkr-pkg | Provides a lightweight R interface to Apache Spark for data processing | 641
sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957
joblib/joblib-spark | Enables parallelization of machine learning tasks on a distributed Spark cluster using the joblib library | 242
emcghee/payloadautomation | Automates payload development and deployment using Python classes to interact with Cobalt Strike and other tools | 117
rocher/ob-ada-spark | Supports Ada and SPARK programming languages in Emacs org-babel for compiling, running, and formal verification of code | 8
instaclustr/sample-sparkjobservercassandra | Demonstrates using Spark Jobserver to run Apache Spark analytics with Cassandra | 2
jupyter-incubator/sparkmagic | An open source library that enables interactive development of applications using remote Spark clusters | 1,328
mrpowers-io/spark-daria | A set of reusable tools to simplify Spark development in Scala | 754
tubular/sparkly | A set of Python libraries and tools to simplify interactions with various data sources using Apache Spark | 60
ondra-m/ruby-spark | A Ruby wrapper around Apache Spark's functionality for large-scale data processing | 227
mrpowers-io/spark-fast-tests | A testing helper library for Apache Spark applications | 436
flint-bot/sparky | Provides a NodeJS API to interact with the Cisco Spark platform | 16
svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262
jahstreetorg/spark-on-kubernetes-helm | A Helm chart repository providing infrastructure templates for setting up a fully functional Spark on Kubernetes cluster with integrated services | 199