node-warc

WARC parser

A tool for parsing and generating Web Archive files in JavaScript using Node.js

Parse and create Web ARChive (WARC) files with Node.js
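For readers unfamiliar with the format, the sketch below shows roughly what parsing a WARC record involves: a version line, a block of named header fields, a blank line, and then a content block whose size is given by Content-Length. It is a minimal illustration only and does not use node-warc's API. The helper name parseWarcRecord and the sample record are hypothetical, and details such as folded header lines, gzip-compressed records, and multi-record files are left out.

```javascript
// Minimal sketch: parse the first WARC record held in a Buffer.
// NOTE: parseWarcRecord is a hypothetical helper, not part of node-warc.
function parseWarcRecord(buf) {
  // The header block ends at the first empty line (CRLF CRLF).
  const headerEnd = buf.indexOf('\r\n\r\n');
  if (headerEnd === -1) throw new Error('WARC header block is not terminated');

  const lines = buf.subarray(0, headerEnd).toString('utf8').split('\r\n');
  const version = lines[0];
  if (!version.startsWith('WARC/')) {
    throw new Error(`Not a WARC record: ${version}`);
  }

  // Collect "Name: value" fields; names are lower-cased for easy lookup.
  const headers = {};
  for (const line of lines.slice(1)) {
    const sep = line.indexOf(':');
    if (sep === -1) continue; // malformed lines are ignored in this sketch
    headers[line.slice(0, sep).trim().toLowerCase()] = line.slice(sep + 1).trim();
  }

  // Content-Length gives the size of the content block in bytes.
  const length = Number.parseInt(headers['content-length'], 10);
  if (Number.isNaN(length)) throw new Error('Missing or invalid Content-Length');

  const bodyStart = headerEnd + 4;
  const content = buf.subarray(bodyStart, bodyStart + length);
  return { version, headers, content };
}

// Hypothetical sample record, built inline so the sketch is self-contained.
const sample = Buffer.from(
  'WARC/1.0\r\n' +
  'WARC-Type: resource\r\n' +
  'WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n' +
  'WARC-Date: 2020-01-01T00:00:00Z\r\n' +
  'WARC-Target-URI: http://example.com/\r\n' +
  'Content-Type: text/plain\r\n' +
  'Content-Length: 5\r\n' +
  '\r\n' +
  'hello' +
  '\r\n\r\n'
);

const record = parseWarcRecord(sample);
console.log(record.headers['warc-type'], record.content.toString('utf8'));
```

Slicing the content by byte length on a Buffer, rather than by character count on a string, matters because Content-Length counts bytes and archived payloads are often binary.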

GitHub

94 stars
9 watching
20 forks
Language: JavaScript
Last commit: almost 2 years ago
Linked from 1 awesome list

chrome-remote-interface, pupeteer, warc, warc-files, web-archives, web-archiving, webarchive, webarchiving

Related projects:

Repository | Description | Stars
internetarchive/warctools | Tools for working with archived web content | 152
nla/httrack2warc | Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs | 30
webrecorder/warcio | A fast streaming library for working with WARC format web archival data | 385
helgeho/warcpartitioner | Tool for partitioning and merging Web archive files by MIME type and year | 1
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation | 20
peterk/warcworker | A web archiving tool that archives websites with high-fidelity preservation capabilities | 55
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files | 1,398
n0tan3rd/node-cdxj | A Node.js library for parsing CDXJ files produced by Pywb | 0
webrecorder/har2warc | Converts HTTP Archive format to Web Archive format | 46
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 57
chfoo/warcat | Tool for handling Web Archive files | 150
n0tan3rd/squidwarc | An archival crawler built on top of Chrome or Chromium to preserve the web in a high-fidelity, user-scriptable manner | 169
ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared URL index | 42
wabarc/cairn | A tool for archiving web pages as single HTML files | 43
nlnwa/gowarcserver | A tool for indexing and serving contents of WARC files | 14