warcworker

Web archiver

A web archiving tool for high-fidelity preservation of websites.

A dockerized, queued, high-fidelity web archiver based on Squidwarc.

GitHub

57 stars

6 watching

9 forks

Language: Python

Last commit: about 2 years ago

Linked from 1 awesome list

Topics: archiving, high-fidelity-preservation, preservation, webarchives, webarchiving

Backlinks from these awesome lists:

iipc/awesome-web-archiving

Related projects:

Repository	Description	Stars
webrecorder/pywb	A toolkit for archiving and replaying web content accurately and efficiently	1,418
n0tan3rd/squidwarc	An archival crawler built on top of Chrome or Chromium to preserve the web in a high-fidelity, user-scriptable manner	170
webrecorder/archiveweb.page	A high-fidelity web archiving system for storing and replaying interactive web pages in browsers.	903
internetarchive/warcprox	An HTTP proxy designed to capture and archive web traffic, including encrypted HTTPS connections.	389
turicas/crau	A command-line tool for archiving and playing back websites in WARC format	59
machawk1/wail	A graphical user interface layer for preserving and replaying web pages using multiple archiving tools.	353
wabarc/wayback	A tool for capturing and preserving web content and making it accessible in the future.	1,839
archiveteam/grab-site	A web crawler designed to backup websites by recursively crawling and writing WARC files.	1,406
wabarc/cairn	A tool for archiving web pages as single HTML files	45
oduwsdl/archivenow	A tool to automate archiving of web resources into public archives.	409
webrecorder/har2warc	Converts HTTP Archive format to Web Archive format	48
oduwsdl/ipwb	A system for dispersing and replaying archived web content using peer-to-peer technology.	617
internetarchive/warctools	Tools for working with archived web content	153
richardlehane/webarchive	Provides tools for reading and parsing web archive formats used in digital preservation.	20
webrecorder/warcio	A fast streaming library for working with WARC format web archival data	391