warcworker

Web archiver

A web archiving tool that preserves websites with high fidelity.

A dockerized, queued, high-fidelity web archiver based on Squidwarc.

Stars: 57
Watching: 6
Forks: 9
Language: Python
Last commit: 5 months ago
Linked from 1 awesome list

archiving, high-fidelity-preservation, preservation, webarchives, webarchiving

Related projects:

Repository | Description | Stars
webrecorder/pywb | A toolkit for archiving and replaying web content accurately and efficiently | 1,418
n0tan3rd/squidwarc | An archival crawler built on top of Chrome or Chromium to preserve the web in a high-fidelity, user-scriptable manner | 170
webrecorder/archiveweb.page | A high-fidelity web archiving system for storing and replaying interactive web pages in browsers | 903
internetarchive/warcprox | An HTTP proxy designed to capture and archive web traffic, including encrypted HTTPS connections | 389
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 59
machawk1/wail | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools | 353
wabarc/wayback | A tool for capturing and preserving web content and making it accessible in the future | 1,839
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files | 1,406
wabarc/cairn | A tool for archiving web pages as single HTML files | 45
oduwsdl/archivenow | A tool to automate archiving of web resources into public archives | 409
webrecorder/har2warc | Converts HTTP Archive (HAR) format to Web Archive (WARC) format | 48
oduwsdl/ipwb | A system for dispersing and replaying archived web content using peer-to-peer technology | 617
internetarchive/warctools | Tools for working with archived web content | 153
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation | 20
webrecorder/warcio | A fast streaming library for working with WARC format web archival data | 391