Sparkling
Web data processor
A data processing library built on top of Apache Spark to handle temporal web data
Internet Archive's Sparkling Data Processing Library
11 stars
20 watching
2 forks
Language: Scala
Last commit: 21 days ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
uscdatascience/sparkler | A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real time | 411 |
internetarchive/arch | A distributed compute analysis system for web archive collections | 15 |
helgeho/archivespark | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats | 145 |
gorillalabs/sparkling | A Clojure API for interacting with Apache Spark | 448 |
1000ch/webponize | A macOS app that converts PNG, JPEG, and animated GIF images to WebP | 7 |
databricks/spark-csv | A library for parsing and querying CSV data with Apache Spark | 1,052 |
svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
h2oai/sparkling-water | Integrates H2O's machine learning capabilities with Apache Spark for big data processing and analytics | 968 |
sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 362 |
instaclustr/sample-kafkasparkcassandra | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra | 23 |
tweag/sparkle | A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark | 447 |
juliasilge/tidytext | Provides tools and data to convert text into tidy data formats for natural language processing tasks | 1,182 |
sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 955 |
weblyzard/streaming-sparql | Provides robust, incremental processing of streaming results from SPARQL servers | 6 |