Sparkling

Web data processor

A data processing library built on top of Apache Spark to handle temporal web data

Internet Archive's Sparkling Data Processing Library

11 stars

20 watching

2 forks

Language: Scala

Last commit: over 1 year ago

Linked from 1 awesome list

Backlinks from these awesome lists:

iipc/awesome-web-archiving

Related projects:

Repository	Description	Stars
apache/spark	An analytics engine designed to handle large-scale data processing and analysis	40,170
uscdatascience/sparkler	A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real time	411
internetarchive/arch	A distributed compute analysis system for web archive collections	15
helgeho/archivespark	A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats.	145
gorillalabs/sparkling	A Clojure API for interacting with Apache Spark	448
1000ch/webponize	A Sparkle update project for web application management and automation	7
databricks/spark-csv	A library for parsing and querying CSV data with Apache Spark	1,052
svenkreiss/pysparkling	A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets	262
h2oai/sparkling-water	Integrates H2O's machine learning capabilities with Apache Spark for big data processing and analytics	968
sparklingpandas/sparklingpandas	Enables distributed data analysis using PySpark and Pandas APIs	362
instaclustr/sample-kafkasparkcassandra	An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra	23
tweag/sparkle	A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark	447
juliasilge/tidytext	Provides tools and data to convert text into tidy data formats for natural language processing tasks	1,182
sparklyr/sparklyr	An R interface to Apache Spark for distributed data analysis and machine learning	955
weblyzard/streaming-sparql	Provides robust, incremental processing of streaming results from SPARQL servers	6