html2warc
Data converter
Converts offline data into a standard archival format
A simple script that converts web resources into a single WARC file
18 stars
4 watching
2 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list
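To make the description above concrete, here is a minimal sketch of packing already-downloaded resources into one WARC stream. It is not html2warc's actual code or command-line interface, which this listing does not document. The function names `build_resource_record` and `write_warc`, and the `http://example.com/` URLs, are hypothetical and chosen only for illustration. The sketch uses the standard library alone and emits WARC/1.0 `resource` records, which is one plausible way to store offline files.

```python
import io
import uuid
from datetime import datetime, timezone
from typing import BinaryIO, Mapping


def build_resource_record(uri: str, payload: bytes,
                          content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",                                    # version line comes first
        "WARC-Type: resource",
        f"WARC-Target-URI: {uri}",
        f"WARC-Date: {now}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",             # length of the block in bytes
    ]
    # Headers end with an empty line; each record ends with two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


def write_warc(resources: Mapping[str, bytes], out: BinaryIO) -> int:
    """Append one record per resource to `out`; return the record count."""
    count = 0
    for uri, payload in resources.items():
        out.write(build_resource_record(uri, payload))
        count += 1
    return count


# Usage: two in-memory "pages" (assumed URLs) written to a single buffer.
pages = {
    "http://example.com/": b"<html></html>",
    "http://example.com/about": b"<html><body>About</body></html>",
}
buf = io.BytesIO()
written = write_warc(pages, buf)
```

For real archives, an established library such as webrecorder/warcio (listed below) is likely a better choice, since it also handles compression, digests, and HTTP response records.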
Related projects:
Repository | Description | Stars |
---|---|---|
webrecorder/har2warc | Converts HTTP Archive format to Web Archive format | 48 |
iipc/warc2html | Converts WARC files to static HTML with relative link rewriting and renaming | 41 |
nla/httrack2warc | Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs | 32 |
webrecorder/warcio | A fast streaming library for working with WARC format web archival data | 391 |
internetarchive/warctools | Tools for working with archived web content | 153 |
chfoo/warcat | Tool for handling Web Archive files | 152 |
alir3z4/html2text | Converts HTML to plain text that can be easily read and formatted as Markdown | 1,862 |
deedy5/html2text_rs | Converts HTML to different formats | 4 |
arcalex/warcrefs | Tools to identify and convert duplicate records in archived web content | 6 |
n0tan3rd/node-warc | A tool for parsing and generating Web Archive files in JavaScript using Node.js | 95 |
samboy/woff | Converts TrueType font files to compressed Webfont formats for web use | 25 |
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 59 |
florents-tselai/warcdb | A library for storing and querying web crawl data in a compact, easily sharable format | 397 |
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation | 20 |
iipc/jwarc | A Java library for reading and writing WARC files with a typed API | 48 |