wpull

Website scraper

A Wget-compatible web downloader and crawler that downloads and crawls web pages, allowing entire websites to be archived.
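Because wpull mirrors wget's command-line interface, an archiving run can be driven from a short script. The sketch below is a minimal example that shells out to the wpull executable from Python; it assumes wpull is installed and on PATH, and uses the wget-style --recursive and --warc-file options. Verify the exact flags against `wpull --help` for your installed version.

```python
# Minimal sketch: drive a wpull crawl from Python and write the result to a WARC file.
# Assumes the `wpull` executable is on PATH; flag names follow wget conventions.
import subprocess

def archive_site(url: str, warc_name: str) -> int:
    """Recursively crawl `url` and record the capture as a WARC archive."""
    cmd = [
        "wpull",
        url,
        "--recursive",             # follow links within the site
        "--warc-file", warc_name,  # write output to <warc_name>.warc.gz
    ]
    # Return the process exit code so callers can detect failed crawls.
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    archive_site("https://example.com/", "example-com")
```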

GitHub

557 stars
23 watching
77 forks
Language: HTML
Last commit: 7 months ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling them and writing WARC files. | 1,400
karust/gogetcrawl | A Go tool and package for extracting web archive data from sources such as the Wayback Machine and Common Crawl. | 149
vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 456
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315
p3gleg/pwnback | Generates a sitemap of a website using the Wayback Machine. | 225
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 265
machawk1/wail | A graphical user interface for preserving and replaying web pages using multiple archiving tools. | 351
a11ywatch/crawler | A high-performance web page crawler. | 50
internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 673
stevepolitodesign/my_site_archive | A simple Rails application for archiving websites. | 27
internetarchive/warctools | Tools for working with archived web content. | 152
bellingcat/auto-archiver | Automates archiving of online content from various sources into local storage or cloud services. | 583
turicas/crau | A command-line tool for archiving and playing back websites in WARC format. | 58
amoilanen/js-crawler | A Node.js module for crawling websites and scraping their content. | 254
oduwsdl/archivenow | A tool to automate archiving of web resources into public archives. | 409