EMR_Spark_Automation
EMR automation tool
Automates deployment of an AWS EMR cluster and execution of Spark jobs
A repository for deploying an AWS EMR cluster and submitting Spark jobs to it. Bootstrapping includes pysparkling by default, so H2O can easily be used with Python and Spark.
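The deploy-then-submit workflow described above can be sketched with boto3's EMR client. This is a minimal illustration, not the repository's own code: the helper names (`build_spark_step`, `build_cluster_request`, `launch_cluster`), the S3 URIs, the release label, the instance settings, and the default IAM role names are all assumptions chosen for the example. The bootstrap script is assumed to be a shell script in S3 that installs the Python dependencies, such as pysparkling.

```python
from typing import Any, Dict, List, Optional


def build_spark_step(name: str, script_uri: str,
                     script_args: Optional[List[str]] = None) -> Dict[str, Any]:
    """Describe one Spark job as an EMR step run through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *(script_args or [])],
        },
    }


def build_cluster_request(name: str, bootstrap_uri: str,
                          steps: List[Dict[str, Any]],
                          release_label: str = "emr-5.36.0",  # assumed release
                          instance_type: str = "m5.xlarge",   # assumed size
                          instance_count: int = 3) -> Dict[str, Any]:
    """Build the keyword arguments for the EMR client's run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Terminate the cluster once all submitted steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "BootstrapActions": [{
            "Name": "install-python-deps",
            # Hypothetical script that installs pysparkling and other packages.
            "ScriptBootstrapAction": {"Path": bootstrap_uri, "Args": []},
        }],
        "Steps": steps,
        # Default EMR role names; assumed to already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request to AWS and return the new cluster (job flow) ID."""
    import boto3  # imported lazily so the builders work without AWS access

    client = boto3.client("emr", region_name=region)
    return client.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builders separate from `launch_cluster` means the configuration can be inspected or tested without AWS credentials. With credentials configured, a call such as `launch_cluster(build_cluster_request("demo", "s3://my-bucket/bootstrap.sh", [build_spark_step("job", "s3://my-bucket/job.py")]))` would start the cluster, where the bucket and file names are placeholders.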
8 stars
1 watching
5 forks
Language: Python
Last commit: over 7 years ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
nchammas/flintrock | A command-line tool for launching and managing Apache Spark clusters on AWS | 638 |
amplab-extras/sparkr-pkg | Provides a lightweight R interface to Apache Spark for data processing | 641 |
sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957 |
joblib/joblib-spark | Enables parallelization of machine learning tasks on a distributed Spark cluster using the joblib library. | 242 |
emcghee/payloadautomation | Automates payload development and deployment using Python classes to interact with Cobalt Strike and other tools | 117 |
rocher/ob-ada-spark | Supports Ada and SPARK programming languages in Emacs org-babel for compiling, running, and formal verification of code | 8 |
instaclustr/sample-sparkjobservercassandra | Demonstrates using Spark Jobserver to run Apache Spark analytics with Cassandra | 2 |
jupyter-incubator/sparkmagic | An open source library that enables interactive development of applications using remote Spark clusters | 1,328 |
mrpowers-io/spark-daria | A set of reusable tools to simplify Spark development in Scala | 754 |
tubular/sparkly | A set of Python libraries and tools to simplify interactions with various data sources using Apache Spark. | 60 |
ondra-m/ruby-spark | A Ruby wrapper around Apache Spark's functionality for large-scale data processing | 227 |
mrpowers-io/spark-fast-tests | A testing helper library for Apache Spark applications. | 436 |
flint-bot/sparky | Provides a NodeJS API to interact with the Cisco Spark platform | 16 |
svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
jahstreetorg/spark-on-kubernetes-helm | A Helm chart repository providing infrastructure templates for setting up a fully functional Spark on Kubernetes cluster with integrated services. | 199 |