webarchive-discovery
Web archive indexer
Tools for indexing and discovering archived web content
WARC and ARC indexing and discovery tools.
117 stars
24 watching
25 forks
Language: Java
Last commit: 7 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A web archive exploration UI built on top of the Solr search engine and warc-discovery indexer. | 43 |
| | Provides tools for reading and parsing web archive formats used in digital preservation. | 20 |
| | Tools for bulk indexing of WARC/ARC files to create a shared URL index. | 43 |
| | Tools for working with archived web content. | 153 |
| | A toolkit for analyzing and extracting data from legacy web archives in a structured format suitable for further analysis or reuse. | 3 |
| | A search interface and archival tool for browsing historical web pages. | 102 |
| | A containerized web archive and search system using Elasticsearch. | 27 |
| | Tool for partitioning and merging web archive files by MIME type and year. | 1 |
| | A web archiving tool that archives websites with high-fidelity preservation capabilities. | 57 |
| | Converts HTTrack crawls to WARC files by reconstructing requests and responses from logs. | 32 |
| | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools. | 353 |
| | A RocksDB-based server for managing and replicating capture indexes used in web archiving. | 33 |
| | A command-line tool for archiving and playing back websites in WARC format. | 59 |
| | Provides a simple JavaScript interface to access historical web pages via the Wayback Machine. | 14 |
| | A tool for capturing and preserving web content and making it accessible in the future. | 1,839 |