webarchive-discovery
Web archive indexer
Tools for indexing and discovering archived web content
WARC and ARC indexing and discovery tools.
116 stars
24 watching
25 forks
Language: Java
last commit: 4 months ago
Linked from 1 awesome list
Related projects:
Repository | Description | Stars |
---|---|---|
ukwa/shine | A web archive exploration UI built on top of the Solr search engine and the webarchive-discovery indexer. | 43 |
richardlehane/webarchive | Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |
ikreymer/webarchive-indexing | Tools for bulk indexing of WARC/ARC files to create a shared URL index. | 42 |
internetarchive/warctools | Tools for working with archived web content | 152 |
netarchivesuite/jwat | Java Web Archive Toolkit: libraries for reading, writing, and validating WARC, ARC, and GZip files. | 3 |
netarchivesuite/solrwayback | A web-based search interface and Wayback machine for browsing archived web pages using an index of WARC files. | 102 |
webis-de/wasp | A containerized web archive and search system using Elasticsearch. | 26 |
helgeho/warcpartitioner | Tool for partitioning and merging Web archive files by MIME type and year | 1 |
peterk/warcworker | A web archiving tool for high-fidelity capture of websites. | 55 |
nla/httrack2warc | Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs | 30 |
machawk1/wail | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools. | 350 |
nla/outbackcdx | A RocksDB-based server for managing and replicating capture indexes used in web archiving | 32 |
turicas/crau | A command-line tool for archiving and playing back websites in WARC format | 57 |
jarofghosts/memento-client | Provides a simple JavaScript interface to access historical web pages via the Wayback Machine | 14 |
wabarc/wayback | A tool for capturing and preserving web content and making it accessible in the future. | 1,818 |