wpull

Website scraper

A Wget-compatible web downloader and crawler for recursively downloading and archiving websites.
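
As a quick illustration of the Wget-style workflow described above, the Python sketch below shells out to the wpull command line to crawl a site and write a WARC archive. This is a minimal sketch under stated assumptions: it assumes wpull is installed and on PATH, and the wget-style flags shown (--recursive, --level, --warc-file) should be confirmed against wpull's own --help output. The target URL and archive name are placeholders.

# Minimal sketch: drive the wpull CLI from Python via subprocess.
# Assumes `wpull` is installed and on PATH; the wget-style options
# (--recursive, --level, --warc-file) are assumptions based on wpull's
# wget-compatible interface; verify with `wpull --help`.
import subprocess

def archive_site(url: str, warc_name: str) -> int:
    """Recursively crawl `url` and write the pages into a WARC archive."""
    cmd = [
        "wpull", url,
        "--recursive",             # follow links within the site
        "--level", "3",            # limit crawl depth (illustrative value)
        "--warc-file", warc_name,  # write output as <warc_name>.warc.gz
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    code = archive_site("https://example.com/", "example-archive")
    print(f"wpull exited with status {code}")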

GitHub

556 stars
23 watching
77 forks
Language: HTML
Last commit: 7 months ago
Linked from 1 awesome list

Related projects:

Repository | Description | Stars
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files. | 1,398
karust/gogetcrawl | A Go tool and package for extracting web archive data from popular sources such as the Wayback Machine and Common Crawl. | 147
vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 454
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315
p3gleg/pwnback | Generates a sitemap of a website using the Wayback Machine. | 225
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 263
machawk1/wail | A graphical user interface layer for preserving and replaying web pages using multiple archiving tools. | 350
a11ywatch/crawler | A high-performance web page crawler. | 49
internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 671
stevepolitodesign/my_site_archive | A simple Rails application for archiving websites. | 27
internetarchive/warctools | Tools for working with archived web content. | 152
bellingcat/auto-archiver | Automates archiving of online content from various sources into local storage or cloud services. | 570
turicas/crau | A command-line tool for archiving and playing back websites in WARC format. | 57
amoilanen/js-crawler | A Node.js module for crawling websites and scraping their content. | 253
oduwsdl/archivenow | A tool to automate archiving of web resources into public archives. | 410