spark

Data processor

An analytics engine designed to handle large-scale data processing and analysis

Apache Spark - A unified analytics engine for large-scale data processing

GitHub

40k stars
2k watching
28k forks
Language: Scala
Last commit: 6 days ago
Linked from 9 awesome lists

big-data, java, jdbc, python, r, scala, spark, sql

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| datastax/spark-cassandra-connector | A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis | 1,943 |
| internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
| dotnet/spark | Provides high-performance APIs for using Apache Spark with .NET | 2,023 |
| svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
| spiritlab/spark | A research-focused implementation of Apache Spark with homomorphic encryption support | 3 |
| databricks/spark-xml | A library that parses and queries XML data in Apache Spark | 505 |
| instaclustr/sample-kafkasparkcassandra | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra | 23 |
| tweag/sparkle | A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark | 447 |
| apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 820 |
| kotlin/kotlin-spark-api | Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 461 |
| databricks/spark-csv | A library for parsing and querying CSV data with Apache Spark | 1,053 |
| irvingc/dbscan-on-spark | An implementation of the DBSCAN clustering algorithm on top of Apache Spark | 184 |
| microsoft/mobius | Provides a C# API for interacting with Apache Spark | 942 |
| helgeho/archivespark | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats | 145 |
| sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957 |