httrack2warc

WARC crawler

Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs
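The description above says the tool rebuilds requests and responses from HTTrack's logs before writing them out. As a rough illustration of the output side only, here is a minimal Java sketch (Java being the project's listed language). It wraps an already-reconstructed HTTP response in a single WARC/1.0 `response` record, following the record layout in the WARC specification. The class `WarcSketch` and the method `responseRecord` are hypothetical names for this example, not httrack2warc's API. The sketch also assumes the status line, headers, and body have already been recovered from the HTTrack mirror.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

public class WarcSketch {

    // Hypothetical helper (not httrack2warc's API): wraps a reconstructed HTTP
    // response (status line + headers + body) in one WARC/1.0 "response" record.
    static byte[] responseRecord(String targetUri, Instant fetchTime,
                                 String httpHead, byte[] body) {
        // The record block is the raw HTTP message: head bytes followed by body bytes.
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        byte[] head = httpHead.getBytes(StandardCharsets.ISO_8859_1);
        block.write(head, 0, head.length);
        block.write(body, 0, body.length);
        byte[] blockBytes = block.toByteArray();

        // WARC named fields; Content-Length counts the block bytes only.
        String warcHead = "WARC/1.0\r\n"
                + "WARC-Type: response\r\n"
                + "WARC-Record-ID: <urn:uuid:" + UUID.randomUUID() + ">\r\n"
                + "WARC-Date: " + fetchTime.truncatedTo(ChronoUnit.SECONDS) + "\r\n"
                + "WARC-Target-URI: " + targetUri + "\r\n"
                + "Content-Type: application/http;msgtype=response\r\n"
                + "Content-Length: " + blockBytes.length + "\r\n"
                + "\r\n";

        ByteArrayOutputStream record = new ByteArrayOutputStream();
        byte[] warcHeadBytes = warcHead.getBytes(StandardCharsets.UTF_8);
        record.write(warcHeadBytes, 0, warcHeadBytes.length);
        record.write(blockBytes, 0, blockBytes.length);
        // Each WARC record ends with two CRLFs.
        byte[] trailer = "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        record.write(trailer, 0, trailer.length);
        return record.toByteArray();
    }

    public static void main(String[] args) {
        // Example values standing in for data recovered from an HTTrack mirror.
        byte[] body = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        String httpHead = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        byte[] rec = responseRecord("http://example.com/",
                Instant.parse("2020-01-01T00:00:00Z"), httpHead, body);
        System.out.print(new String(rec, StandardCharsets.UTF_8));
    }
}
```

A full converter would also need to emit matching `request` records and would usually gzip-compress each record. A WARC library such as iipc/jwarc, listed under related projects, could handle those details instead of hand-built headers.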

30 stars
20 watching
6 forks
Language: Java
Last commit: 4 months ago
Linked from 1 awesome list

web-archiving

Related projects:

Repository | Description | Stars
webrecorder/har2warc | Converts HTTP Archive format to Web Archive format | 46
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files | 1,398
iipc/warc2html | Converts WARC files to static HTML with relative link rewriting and renaming | 39
internetarchive/warctools | Tools for working with archived web content | 152
iipc/jwarc | A Java library for reading and writing WARC files with a typed API | 47
n0tan3rd/node-warc | A tool for parsing and generating Web Archive files in JavaScript using Node.js | 94
helgeho/web2warc | A web crawler that creates custom archives in WARC/CDX format | 24
webrecorder/warcio | A fast streaming library for working with WARC format web archival data | 385
chfoo/warcat | Tool for handling Web Archive files | 150
steffenfritz/html2warc | Converts offline data into a standard archival format | 18
helgeho/warcpartitioner | Tool for partitioning and merging Web archive files by MIME type and year | 1
ukwa/webarchive-discovery | Tools for indexing and discovering archived web content | 116
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 57
nlnwa/gowarcserver | A tool for indexing and serving the contents of WARC files | 14
ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared URL index | 42