arch
Archive processor
A distributed compute analysis system for web archive collections
Web application for distributed compute analysis of Archive-It web archive collections.
15 stars
21 watching
4 forks
Language: Scala
Last commit: 6 months ago
Linked from 1 awesome list
Related projects:
| Description | Stars |
|---|---|
| A data processing library built on top of Apache Spark to handle temporal web data. | 11 |
| Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |
| A web-based application for managing and providing access to archives and cultural heritage collections. | 355 |
| A framework for efficient data processing and extraction from archival collections, enabling the transformation of raw data into more accessible formats. | 145 |
| Tools for indexing and discovering archived web content. | 117 |
| Automates archiving of online content from various sources into local storage or cloud services. | 585 |
| A RocksDB-based server for managing and replicating capture indexes used in web archiving. | 33 |
| Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
| A command-line and Python interface to access Archive.org's services. | 1,643 |
| A Scala library providing an API to create, update and extract archives of various formats using the 7-Zip compression engine. | 44 |
| A toolkit for analyzing and extracting data from legacy web archives in a structured format suitable for further analysis or reuse. | 3 |
| A collection of services and projects to enable vintage computers to access the internet using compatible browsers or proxies. | 258 |
| A library that provides a unified interface for managing archives of various formats, supporting multiple compression algorithms and file system operations. | 274 |
| A containerized web archive and search system using Elasticsearch. | 27 |
| A web-based archive service that allows users to store and manage web pages in various formats. | 115 |