webarchive

Web archive parser

Provides tools for reading and parsing web archive formats used in digital preservation.

Go readers for the ARC and WARC web archive formats.

GitHub

Stars: 20
Watching: 7
Forks: 2
Language: Go
Last commit: over 1 year ago
Linked from 1 awesome list


Related projects:

Repository | Description | Stars
ukwa/webarchive-discovery | Tools for indexing and discovering archived web content | 116
webrecorder/archiveweb.page | A high-fidelity web archiving system for storing and replaying interactive web pages in browsers | 862
derfenix/webarchive | A web-based archive service that allows users to store and manage web pages in various formats | 112
helgeho/warcpartitioner | Tool for partitioning and merging Web archive files by MIME type and year | 1
peterk/warcworker | A web archiving tool that archives websites with high-fidelity preservation capabilities | 55
go-shiori/obelisk | Archives a web page as a single HTML file with embedded resources | 263
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 57
n0tan3rd/node-warc | A tool for parsing and generating Web Archive files in JavaScript using Node.js | 94
wabarc/rivet | A tool for archiving webpages to IPFS | 12
internetarchive/warctools | Tools for working with archived web content | 152
ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared URL index | 42
wabarc/wayback | A tool for capturing and preserving web content and making it accessible in the future | 1,818
machawk1/wail | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools | 350
jarofghosts/memento-client | Provides a simple JavaScript interface to access historical web pages via the Wayback Machine | 14
webrecorder/har2warc | Converts HTTP Archive format to Web Archive format | 46