ArchiveBox

Preservation tool

Automated preservation of internet content in durable formats

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

GitHub

22k stars
174 watching
1k forks
Language: Python
Last commit: 5 days ago
Linked from 1 awesome list

Tags: archivebox, backups, bookmark-archiver, browser-bookmarks, chromium, digipres, firefox, headless-browser, internet-archiving, pinboard, pocket, python, rss, self-hosted, singlefile, warc, wayback-machine, web-archiving, wget, youtube-dl

Related projects:

Repository | Description | Stars
browserbox/browserbox | A browser that runs on a remote server and provides isolated access to web content for security, compliance, and other purposes. | 3,454
bellingcat/auto-archiver | Automates archiving of online content from various sources into local storage or cloud services | 570
oduwsdl/archivenow | A tool to automate archiving of web resources into public archives. | 410
go-shiori/obelisk | Archives a web page as a single HTML file with embedded resources. | 263
stevepolitodesign/my_site_archive | A simple Rails application for archiving websites | 27
machawk1/wail | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools. | 350
wabarc/wayback | A tool for capturing and preserving web content and making it accessible in the future. | 1,811
tubearchivist/tubearchivist | A tool to organize and search archived YouTube videos | 5,246
googlechrome/workbox | A suite of tools and strategies for efficiently caching and serving web assets | 12,366
peterk/warcworker | A web archiving tool that archives websites with high-fidelity preservation capabilities. | 55
jjjake/internetarchive | A command-line and Python interface to access Archive.org's services | 1,625
mholt/archiver | A multi-format archive utility and Go library that provides a generic replacement for platform-specific or format-specific archive utilities. | 4,442
kovah/linkace | A tool to collect and manage links to websites and other online resources for long-term archiving. | 2,643
derfenix/webarchive | A web-based archive service that allows users to store and manage web pages in various formats. | 112
apache/pdfbox | A Java library for working with PDF documents. | 2,675