ArchiveSpark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
143 stars
14 watching
19 forks
Language: Scala
last commit: 9 days ago
Linked from 1 awesome list
archivespark, internet-archive, spark, spark-framework, warc, web-archiving, webarchive