chatnoir-resiliparse
Web archiver
A toolkit for processing and analyzing web archive data
A robust web archive analytics toolkit
89 stars
9 watching
14 forks
Language: Cython
Last commit: 12 days ago
Linked from 1 awesome list
Topics: bigdata, cpp, cython, extraction, html, parser, python, warc, web, webarchive
Related projects:
Repository | Description | Stars |
---|---|---|
wabarc/cairn | A tool for archiving web pages as single HTML files | 45 |
webrecorder/archiveweb.page | A high-fidelity web archiving system for storing and replaying interactive web pages in browsers. | 903 |
bellingcat/auto-archiver | Automates archiving of online content from various sources into local storage or cloud services | 585 |
recrm/archivetools | A collection of tools for extracting and analyzing data from web archives | 71 |
webrecorder/pywb | A toolkit for archiving and replaying web content accurately and efficiently | 1,418 |
peterk/warcworker | A web archiving tool that archives websites with high-fidelity preservation capabilities. | 57 |
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 59 |
archiveteam/grab-site | A web crawler designed to backup websites by recursively crawling and writing WARC files. | 1,406 |
machawk1/wail | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools. | 353 |
jarofghosts/memento-client | Provides a simple JavaScript interface to access historical web pages via the Wayback Machine | 14 |
archiveteam/wpull | Downloads and crawls web pages, allowing for the archiving of websites. | 556 |
n0tan3rd/squidwarc | An archival crawler built on top of Chrome or Chromium to preserve the web in a high-fidelity, user-scriptable manner | 170 |
wabarc/wayback | A tool for capturing and preserving web content and making it accessible in the future. | 1,839 |
chfoo/warcat | Tool for handling Web Archive files | 152 |