joblib-spark

Task parallelizer

Enables parallelization of machine learning tasks on a distributed Spark cluster using the joblib library.

Joblib Apache Spark Backend
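
As a rough illustration of what "a joblib backend for Spark" means in practice, the sketch below runs a handful of independent tasks through joblib and selects the Spark backend when the `joblibspark` package is importable. It assumes the package exposes `register_spark()` and registers a backend named "spark", as the project's README describes. The helper name `run_tasks` and the fallbacks are hypothetical and exist only so the sketch can be tried on a machine without Spark.

```python
from math import sqrt


def run_tasks(values, n_jobs=2):
    """Apply sqrt to each value, using the Spark backend when it is available."""
    try:
        from joblib import Parallel, delayed, parallel_backend
    except ImportError:
        # joblib is not installed: run sequentially so the sketch still works.
        return [sqrt(v) for v in values]

    try:
        # Assumption: joblibspark provides register_spark(), which registers
        # a joblib backend under the name "spark".
        from joblibspark import register_spark
        register_spark()
        backend = "spark"
    except ImportError:
        # Hypothetical fallback for illustration: joblib's built-in threads.
        backend = "threading"

    # Every Parallel call inside this block uses the selected backend.
    with parallel_backend(backend, n_jobs=n_jobs):
        return Parallel()(delayed(sqrt)(v) for v in values)


if __name__ == "__main__":
    print(run_tasks([1, 4, 9, 16]))
```

Because scikit-learn uses joblib internally, the same `parallel_backend("spark", ...)` context should in principle also distribute calls such as cross-validation or grid search, provided a working Spark installation is present.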

243 stars
9 watching
26 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
shomali11/parallelizer | Simplifies creating multiple worker threads to execute tasks in parallel | 72
dmmiller612/sparktorch | A PyTorch implementation on Apache Spark for distributed deep learning model training and inference | 339
clin99/cpp-taskflow | A library providing a simple and expressive way to write parallel programs with complex task dependencies | 6
kcrandall/emr_spark_automation | Automates deployment of an AWS EMR cluster and execution of Spark jobs | 8
amplab/sparknet | Distributed neural network framework for Apache Spark | 604
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,066
tugdualsarazin/spark-clustering | Implementations of clustering algorithms using Spark in Scala | 18
lensacom/sparkit-learn | A Python library that integrates PySpark and scikit-learn for distributed machine learning | 1,154
instaclustr/sample-sparkjobservercassandra | Demonstrates using Spark Jobserver to run Apache Spark analytics with Cassandra | 2
yaooqinn/itachi | A library that brings useful functions from various modern database management systems to Apache Spark | 56
stevenjl/parex | An Elixir module that executes multiple processes in parallel to speed up slow computations | 63
janeliascicomp/nextflow-spark | Provides a reusable set of Nextflow subworkflows and processes for creating transient Apache Spark clusters on any infrastructure | 14
svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262
microsoft/mobius | Provides a C# API for interacting with Apache Spark | 941
kotlin/kotlin-spark-api | Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 464