spark-clustering

Clustering algorithms library

Implementations of clustering algorithms in Scala on Apache Spark.

GitHub statistics:

Stars: 18
Watching: 6
Forks: 8
Language: Scala
Last commit: about 6 years ago

Related projects:

Repository | Description | Stars
spark-clustering-notebook/g-stream | An implementation of data stream clustering algorithms using Spark Streaming | 3
irvingc/dbscan-on-spark | An implementation of the DBSCAN clustering algorithm on top of Apache Spark | 184
databricks/spark-xml | A library that parses and queries XML data in Apache Spark | 505
databricks/tensorframes | Enables manipulation of Apache Spark DataFrames using TensorFlow programs | 749
kotlin/kotlin-spark-api | Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 461
e-xpertsolutions/go-cluster | An implementation of the k-modes and k-prototypes clustering algorithms in Go | 43
dutrevis/spark-resources-metrics-plugin | A Spark plugin that registers metrics from operating system resources | 0
joblib/joblib-spark | Enables parallelization of machine learning tasks on a distributed Spark cluster using the joblib library | 242
emilbayes/clustering.js | Provides implementations of clustering algorithms in JavaScript | 30
sw1sh/frege-spark | An effort to integrate Apache Spark with the Frege programming language | 5
xuyxu/clustering | MATLAB implementations of clustering and subspace clustering algorithms, including K-means, ISODATA, Mean Shift, DBSCAN, Gaussian Mixture Model, LVQ, Subspace K-means, and Entropy-Weighting Subspace K-means | 224
iralabdisco/pso-clustering | An algorithm for unsupervised machine learning tasks that group similar data points into clusters | 68
youweiliang/multi-view_clustering | Provides implementations of various multi-view spectral clustering algorithms for data analysis and visualization | 85
databricks/spark-corenlp | Wraps Stanford CoreNLP annotators as Spark DataFrame functions for natural language processing tasks | 422
twosigma/flint | A highly optimized time series library for Apache Spark | 1,003