spark
Data processor
An analytics engine designed to handle large-scale data processing and analysis
Apache Spark - A unified analytics engine for large-scale data processing
40k stars
2k watching
28k forks
Language: Scala
Last commit: 6 days ago
Linked from 9 awesome lists
big-data, java, jdbc, python, r, scala, spark, sql
Related projects:
| Repository | Description | Stars |
|---|---|---|
| datastax/spark-cassandra-connector | A library that enables integration between Apache Spark and Apache Cassandra for fast data processing and analysis | 1,943 |
| internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
| dotnet/spark | Provides high-performance APIs for using Apache Spark with .NET | 2,023 |
| svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
| spiritlab/spark | A research-focused implementation of Apache Spark with homomorphic encryption support | 3 |
| databricks/spark-xml | A library that parses and queries XML data in Apache Spark | 505 |
| instaclustr/sample-kafkasparkcassandra | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra | 23 |
| tweag/sparkle | A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark | 447 |
| apache/samza | A distributed stream processing framework for handling high-volume data streams with fault tolerance and durability guarantees | 820 |
| kotlin/kotlin-spark-api | Provides compatibility and extensions between Kotlin and Apache Spark for big data processing | 461 |
| databricks/spark-csv | A library for parsing and querying CSV data with Apache Spark | 1,053 |
| irvingc/dbscan-on-spark | An implementation of the DBSCAN clustering algorithm on top of Apache Spark | 184 |
| microsoft/mobius | Provides a C# API for interacting with Apache Spark | 942 |
| helgeho/archivespark | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats | 145 |
| sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957 |