arch
Archive processor
A web application for distributed compute analysis of Archive-It web archive collections.
15 stars
21 watching
4 forks
Language: Scala
Last commit: 3 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
internetarchive/sparkling | A data processing library built on top of Apache Spark to handle temporal web data | 11 |
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |
archivesspace/archivesspace | A web-based application for managing and providing access to archives and cultural heritage collections | 353 |
helgeho/archivespark | A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats. | 145 |
ukwa/webarchive-discovery | Tools for indexing and discovering archived web content | 116 |
bellingcat/auto-archiver | Automates archiving of online content from various sources into local storage or cloud services | 570 |
nla/outbackcdx | A RocksDB-based server for managing and replicating capture indexes used in web archiving | 32 |
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
jjjake/internetarchive | A command-line and Python interface to access Archive.org's services | 1,625 |
gonearewe/sevenz4s | A Scala library providing an API to create, update and extract archives of various formats using the 7-Zip compression engine. | 44 |
netarchivesuite/jwat | A toolkit for analyzing and extracting data from legacy web archives in a structured format suitable for further analysis or reuse | 3 |
ssshake/retro-computing-internet-resources | A collection of services and projects to enable vintage computers to access the internet using compatible browsers or proxies. | 255 |
wapmorgan/unifiedarchive | A library that provides a unified interface for managing archives of various formats, supporting multiple compression algorithms and file system operations. | 275 |
webis-de/wasp | A containerized web archive and search system using Elasticsearch | 26 |
derfenix/webarchive | A web-based archive service that allows users to store and manage web pages in various formats. | 112 |