httrack2warc
WARC crawler
Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs
30 stars
20 watching
6 forks
Language: Java
Last commit: 4 months ago
Linked from 1 awesome list
web-archiving
Related projects:
Repository | Description | Stars |
---|---|---|
webrecorder/har2warc | Converts HTTP Archive (HAR) files to the Web Archive (WARC) format | 46 |
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files | 1,398 |
iipc/warc2html | Converts WARC files to static HTML with relative link rewriting and renaming | 39 |
internetarchive/warctools | Tools for working with archived web content | 152 |
iipc/jwarc | A Java library for reading and writing WARC files with a typed API | 47 |
n0tan3rd/node-warc | A tool for parsing and generating Web Archive files in JavaScript using Node.js | 94 |
helgeho/web2warc | A web crawler that creates custom archives in WARC/CDX format | 24 |
webrecorder/warcio | A fast streaming library for working with WARC format web archival data | 385 |
chfoo/warcat | Tool for handling Web Archive files | 150 |
steffenfritz/html2warc | Converts offline data into a standard archival format | 18 |
helgeho/warcpartitioner | Tool for partitioning and merging web archive files by MIME type and year | 1 |
ukwa/webarchive-discovery | Tools for indexing and discovering archived web content | 116 |
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 57 |
nlnwa/gowarcserver | A tool for indexing and serving the contents of WARC files | 14 |
ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared URL index | 42 |