warcio
WARC library
A fast streaming library for working with WARC format web archival data
Streaming WARC/ARC library for fast web archive IO
391 stars
22 watching
58 forks
Language: Python
Last commit: 2 months ago
Linked from 1 awesome list
python, pywb, warc, web-archives, web-archiving
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | Converts HTTP Archive format to Web Archive format | 48 |
| | Tool for handling Web Archive files | 152 |
| | Tools for working with archived web content | 153 |
| | A command-line tool for archiving and playing back websites in WARC format | 59 |
| | Tools for bulk indexing of WARC/ARC files to create a shared URL index | 43 |
| | A toolkit for archiving and replaying web content accurately and efficiently | 1,418 |
| | A high-fidelity web archiving system for storing and replaying interactive web pages in browsers | 903 |
| | Tool for partitioning and merging Web archive files by MIME type and year | 1 |
| | A web archiving tool that archives websites with high-fidelity preservation capabilities | 57 |
| | A tool for parsing and generating Web Archive files in JavaScript using Node.js | 95 |
| | An HTTP proxy designed to capture and archive web traffic, including encrypted HTTPS connections | 389 |
| | Downloads WARC files from a WASAPI access point | 15 |
| | Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs | 32 |
| | Converts offline data into a standard archival format | 18 |
| | Provides tools for reading and parsing web archive formats used in digital preservation | 20 |