warcio

WARC library

A fast streaming library for working with WARC format web archival data

Streaming WARC/ARC library for fast web archive IO

GitHub

385 stars
22 watching
58 forks
Language: Python
Last commit: 9 days ago
Linked from 1 awesome list

python, pywb, warc, web-archives, web-archiving

Related projects:

Repository | Description | Stars
webrecorder/har2warc | Converts HTTP Archive format to Web Archive format | 46
chfoo/warcat | Tool for handling Web Archive files | 150
internetarchive/warctools | Tools for working with archived web content | 152
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 57
ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared url index | 42
webrecorder/pywb | A toolkit for archiving and replaying web content accurately and efficiently | 1,407
webrecorder/archiveweb.page | A high-fidelity web archiving system for storing and replaying interactive web pages in browsers | 862
helgeho/warcpartitioner | Tool for partitioning and merging Web archive files by MIME type and year | 1
peterk/warcworker | A web archiving tool that archives websites with high-fidelity preservation capabilities | 55
n0tan3rd/node-warc | A tool for parsing and generating Web Archive files in JavaScript using Node.js | 94
internetarchive/warcprox | An HTTP proxy designed to capture and archive web traffic, including encrypted HTTPS connections | 381
unt-libraries/py-wasapi-client | Downloads WARC files from a WASAPI access point | 14
nla/httrack2warc | Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs | 30
steffenfritz/html2warc | Converts offline data into a standard archival format | 18
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation | 20