joblib-spark
Task parallelizer
Provides an Apache Spark backend for the joblib library, so that joblib-parallelized machine learning tasks can run on a distributed Spark cluster.
Joblib Apache Spark Backend
243 stars
9 watching
26 forks
Language: Python
Last commit: about 1 year ago
Linked from 1 awesome list
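
As a rough illustration of what the description means by a joblib backend, the sketch below runs a small batch of tasks through joblib and only switches to Spark when asked. The helper name `run_square_roots` is hypothetical and exists only for this example. The `"spark"` branch assumes the `joblibspark` package and a working PySpark installation are available. The default `"threading"` backend is used so the sketch can be tried without a cluster.

```python
from math import sqrt


def run_square_roots(backend="threading", n_jobs=2):
    """Compute sqrt(i * i) for i in 0..4 through the named joblib backend.

    Hypothetical helper for illustration only; not part of joblib-spark.
    """
    # Imported lazily so that defining this function needs no third-party packages.
    from joblib import Parallel, delayed, parallel_backend

    if backend == "spark":
        # Assumption: joblibspark and PySpark are installed and configured.
        from joblibspark import register_spark

        register_spark()  # registers the backend under the name "spark"

    # Every Parallel call inside this block is dispatched to the chosen backend.
    with parallel_backend(backend, n_jobs=n_jobs):
        return Parallel()(delayed(sqrt)(i * i) for i in range(5))
```

Calling `run_square_roots()` uses local threads, while `run_square_roots("spark", n_jobs=3)` would hand the same tasks to Spark, assuming a Spark environment is present. The task code itself stays the same in both cases; only the backend name changes.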
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Simplifies creating multiple worker threads to execute tasks in parallel | 72 |
| | A PyTorch implementation on Apache Spark for distributed deep learning model training and inference | 339 |
| | A library providing a simple and expressive way to write parallel programs with complex task dependencies | 8 |
| | Automates deployment of an AWS EMR cluster and execution of Spark jobs | 8 |
| | Distributed neural network framework for Apache Spark | 604 |
| | An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
| | Implementations of clustering algorithms using Spark in Scala | 18 |
| | A Python library that integrates PySpark and scikit-learn for distributed machine learning | 1,154 |
| | Demonstrates using Spark Jobserver to run Apache Spark analytics with Cassandra | 2 |
| | A library that brings useful functions from various modern database management systems to Apache Spark | 56 |
| | An Elixir module that executes multiple processes in parallel to speed up slow computations | 63 |
| | Provides a reusable set of Nextflow subworkflows and processes for creating transient Apache Spark clusters on any infrastructure | 14 |
| | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
| | Provides a C# API for interacting with Apache Spark | 941 |
| | Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 463 |