wasp

Web archiver

A containerized web archive and search system using Elasticsearch

GitHub

26 stars
13 watching
4 forks
Language: Java
Last commit: about 2 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| sul-dlss/wasapi-downloader | An application to download archives of web archiving projects | 6 |
| ukwa/webarchive-discovery | Tools for indexing and discovering archived web content | 116 |
| webrecorder/archiveweb.page | A high-fidelity web archiving system for storing and replaying interactive web pages in browsers | 862 |
| derfenix/webarchive | A web-based archive service that allows users to store and manage web pages in various formats | 112 |
| webrecorder/pywb | A toolkit for archiving and replaying web content accurately and efficiently | 1,407 |
| vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web | 454 |
| oduwsdl/ipwb | A system for dispersing and replaying archived web content using peer-to-peer technology | 617 |
| jarofghosts/memento-client | Provides a simple JavaScript interface to access historical web pages via the Wayback Machine | 14 |
| internetarchive/arch | A distributed compute analysis system for web archive collections | 15 |
| florents-tselai/warcdb | A library for storing and querying web crawl data in a compact, easily sharable format | 394 |
| archiveteam/grab-site | A web crawler designed to backup websites by recursively crawling and writing WARC files | 1,402 |
| ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared url index | 42 |
| elastic/elasticsearch | A distributed search and analytics engine for scalable data storage and real-time search capabilities | 1,332 |
| netarchivesuite/jwat | A toolkit for analyzing and extracting data from legacy web archives in a structured format suitable for further analysis or reuse | 3 |
| stevepolitodesign/my_site_archive | A simple Rails application for archiving websites | 27 |