Sparkling
Web data processor
A data processing library built on top of Apache Spark to handle temporal web data
Internet Archive's Sparkling Data Processing Library
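"Temporal web data" here means web captures that each carry a timestamp, such as the records in a web archive index. The snippet below is a minimal sketch of that kind of processing, assuming the common space-separated CDX layout (`urlkey timestamp original mimetype statuscode ...`) with 14-digit `yyyyMMddHHmmss` timestamps. It is not Sparkling's API. `Capture`, `CaptureStats`, `parse` and `capturesPerYear` are hypothetical names, and the code uses plain Scala collections so it runs without a Spark cluster.

```scala
// Hypothetical sketch, NOT Sparkling's API: parse CDX-style capture lines and
// count successful captures per URL key per year.
// Assumed field layout: "urlkey timestamp original mimetype statuscode ...".
final case class Capture(urlKey: String, timestamp: String, original: String, status: String) {
  // 14-digit timestamp yyyyMMddHHmmss; the first four digits are the year
  def year: Int = timestamp.take(4).toInt
}

object CaptureStats {
  // Returns None for lines with too few fields or a malformed timestamp
  def parse(line: String): Option[Capture] = {
    val fields = line.trim.split("\\s+")
    if (fields.length >= 5 && fields(1).length == 14 && fields(1).forall(_.isDigit))
      Some(Capture(fields(0), fields(1), fields(2), fields(4)))
    else None
  }

  // Keep HTTP 200 captures only, then count them grouped by (urlKey, year)
  def capturesPerYear(lines: Seq[String]): Map[(String, Int), Int] =
    lines.flatMap(parse)
      .filter(_.status == "200")
      .groupBy(c => (c.urlKey, c.year))
      .map { case (key, cs) => key -> cs.size }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "org,example)/ 20100101120000 http://example.org/ text/html 200 AAAA - - 1000 0 a.warc.gz",
      "org,example)/ 20100615080000 http://example.org/ text/html 200 BBBB - - 1000 0 a.warc.gz",
      "org,example)/ 20110101000000 http://example.org/ text/html 404 CCCC - - 500 0 b.warc.gz",
      "not a cdx line"
    )
    // Print each ((urlKey, year), count) entry in sorted order
    capturesPerYear(sample).toSeq.sorted.foreach(println)
  }
}
```

On a Spark cluster, the same pipeline would typically run over an RDD of lines: `flatMap(parse)`, `filter`, then `map(c => ((c.urlKey, c.year), 1))` followed by `reduceByKey(_ + _)` instead of the local `groupBy`.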
11 stars
20 watching
2 forks
Language: Scala
last commit: 3 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | An analytics engine designed to handle large-scale data processing and analysis | 40,170 |
| | A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real time | 411 |
| | A distributed compute analysis system for web archive collections | 15 |
| | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats | 145 |
| | A Clojure API for interacting with Apache Spark | 448 |
| | A Sparkle update project for web application management and automation | 7 |
| | A library for parsing and querying CSV data with Apache Spark | 1,052 |
| | A lightweight Python implementation of Spark's RDD and DStream interfaces for improved performance on small datasets | 262 |
| | Integrates H2O's machine learning capabilities with Apache Spark for big data processing and analytics | 968 |
| | Enables distributed data analysis using PySpark and Pandas APIs | 362 |
| | An introductory Scala app using Apache Spark Streaming to process data from Kafka and write summaries to Cassandra | 23 |
| | A tool for creating resilient, scalable analytics applications with Haskell on top of Apache Spark | 447 |
| | Provides tools and data to convert text into tidy data formats for natural language processing tasks | 1,182 |
| | An R interface to Apache Spark for distributed data analysis and machine learning | 955 |
| | Provides robust, incremental processing of streaming results from SPARQL servers | 6 |