ArchiveSpark

An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.

GitHub

143 stars
14 watching
19 forks
Language: Scala
Last commit: 9 days ago
Linked from 1 awesome list

archivespark, internet-archive, spark, spark-framework, warc, web-archiving, webarchive

Backlinks from these awesome lists: