arch

Archive processor

A distributed compute analysis system for web archive collections

Web application for distributed compute analysis of Archive-It web archive collections.

GitHub

15 stars
21 watching
4 forks
Language: Scala
Last commit: 3 months ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
| richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |
| archivesspace/archivesspace | A web-based application for managing and providing access to archives and cultural heritage collections | 353 |
| helgeho/archivespark | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats. | 145 |
| ukwa/webarchive-discovery | Tools for indexing and discovering archived web content | 116 |
| bellingcat/auto-archiver | Automates archiving of online content from various sources into local storage or cloud services | 570 |
| nla/outbackcdx | A RocksDB-based server for managing and replicating capture indexes used in web archiving | 32 |
| archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
| jjjake/internetarchive | A command-line and Python interface to access Archive.org's services | 1,625 |
| gonearewe/sevenz4s | A Scala library providing an API to create, update and extract archives of various formats using the 7-Zip compression engine. | 44 |
| netarchivesuite/jwat | A toolkit for analyzing and extracting data from legacy web archives in a structured format suitable for further analysis or reuse | 3 |
| ssshake/retro-computing-internet-resources | A collection of services and projects to enable vintage computers to access the internet using compatible browsers or proxies. | 255 |
| wapmorgan/unifiedarchive | A library that provides a unified interface for managing archives of various formats, supporting multiple compression algorithms and file system operations. | 275 |
| webis-de/wasp | A containerized web archive and search system using Elastic Search | 26 |
| derfenix/webarchive | A web-based archive service that allows users to store and manage web pages in various formats. | 112 |