warcprox

Web archiver

An HTTP proxy designed to capture and archive web traffic, including encrypted HTTPS connections.

WARC-writing MITM HTTP/S proxy
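As a rough illustration of how a client might route its traffic through an archiving proxy of this kind, here is a minimal sketch using only the Python standard library. The proxy address `http://localhost:8000`, the helper name `build_proxy_opener`, and the `ca_file` parameter are assumptions made for this example, not details taken from the project. Check the project's own documentation for its actual listening address and CA certificate location.

```python
import ssl
import urllib.request

# Assumption: an archiving proxy is listening at this address.
PROXY_URL = "http://localhost:8000"


def build_proxy_opener(proxy_url=PROXY_URL, ca_file=None):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url.

    ca_file is a hypothetical path to the proxy's CA certificate (PEM).
    A MITM proxy re-signs HTTPS responses with its own CA, so the client
    has to trust that CA for certificate verification to succeed.
    """
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    # With ca_file=None the system's default trust store is used.
    context = ssl.create_default_context(cafile=ca_file)
    https_handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy_handler, https_handler)
```

With a proxy running at the assumed address, `build_proxy_opener().open("http://example.com/")` would send the request through it, so the proxy gets the chance to record the exchange.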

GitHub

389 stars
39 watching
55 forks
Language: Python
Last commit: 4 days ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
internetarchive/warctools | Tools for working with archived web content | 153
peterk/warcworker | A web archiving tool that archives websites with high-fidelity preservation capabilities | 57
webrecorder/har2warc | Converts HTTP Archive (HAR) format to Web Archive (WARC) format | 48
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 59
ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared URL index | 43
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files | 1,406
webrecorder/warcio | A fast streaming library for working with WARC format web archival data | 391
chfoo/warcat | Tool for handling Web Archive files | 152
webrecorder/archiveweb.page | A high-fidelity web archiving system for storing and replaying interactive web pages in browsers | 903
florents-tselai/warcdb | A library for storing and querying web crawl data in a compact, easily sharable format | 397
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation | 20
n0tan3rd/squidwarc | An archival crawler built on top of Chrome or Chromium to preserve the web in a high-fidelity, user-scriptable manner | 170
wabarc/wayback | A tool for capturing and preserving web content and making it accessible in the future | 1,839
ukwa/webarchive-discovery | Tools for indexing and discovering archived web content | 117
helgeho/warcpartitioner | Tool for partitioning and merging Web archive files by MIME type and year | 1