Sparkling

Web data processor

A data processing library built on Apache Spark for handling temporal web data

Internet Archive's Sparkling Data Processing Library

GitHub

11 stars
20 watching
2 forks
Language: Scala
Last commit: about 1 month ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| apache/spark | An analytics engine designed to handle large-scale data processing and analysis | 39,916 |
| uscdatascience/sparkler | A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real-time | 410 |
| internetarchive/arch | A distributed compute analysis system for web archive collections | 15 |
| helgeho/archivespark | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats | 145 |
| gorillalabs/sparkling | A Clojure API for interacting with Apache Spark | 448 |
| 1000ch/webponize | A Sparkle update project for web application management and automation | 7 |
| databricks/spark-csv | A library for parsing and querying CSV data with Apache Spark | 1,053 |
| svenkreiss/pysparkling | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
| h2oai/sparkling-water | Integrates H2O's machine learning capabilities with Apache Spark for big data processing and analytics | 968 |
| sparklingpandas/sparklingpandas | Enables distributed data analysis using PySpark and Pandas APIs | 361 |
| instaclustr/sample-kafkasparkcassandra | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra | 23 |
| tweag/sparkle | A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark | 447 |
| juliasilge/tidytext | Provides tools and data to convert text into tidy data formats for natural language processing tasks | 1,180 |
| sparklyr/sparklyr | An R interface to Apache Spark for distributed data analysis and machine learning | 957 |
| weblyzard/streaming-sparql | Provides a robust, incremental processing of streaming results from SPARQL servers | 6 |