spark

Data processor

An analytics engine designed to handle large-scale data processing and analysis

Apache Spark - A unified analytics engine for large-scale data processing

GitHub

40k stars

2k watching

28k forks

Language: Scala

Last commit: over 1 year ago

Linked from 9 awesome lists

big-data, java, jdbc, python, r, scala, spark, sql

spark.apache.org/

Related projects:

Repository	Description	Stars
datastax/spark-cassandra-connector	A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis.	1,944
internetarchive/sparkling	A data processing library built on top of Apache Spark to handle temporal web data	11
dotnet/spark	Provides high-performance APIs for using Apache Spark with .NET	2,032
svenkreiss/pysparkling	A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets	262
spiritlab/spark	A research-focused implementation of Apache Spark with homomorphic encryption support	3
databricks/spark-xml	A library that parses and queries XML data in Apache Spark	504
instaclustr/sample-kafkasparkcassandra	An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra.	23
tweag/sparkle	A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark	447
apache/samza	A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees	817
kotlin/kotlin-spark-api	Provides compatibility and extensions between Kotlin and Apache Spark for big data processing	463
databricks/spark-csv	A library for parsing and querying CSV data with Apache Spark	1,052
irvingc/dbscan-on-spark	An implementation of the DBSCAN clustering algorithm on top of Apache Spark	184
microsoft/mobius	Provides a C# API for interacting with Apache Spark	941
helgeho/archivespark	A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats.	145
sparklyr/sparklyr	An R interface to Apache Spark for distributed data analysis and machine learning	955