wpull
Website scraper
A Wget-compatible web downloader and crawler that downloads and crawls web pages, allowing websites to be archived.
556 stars
23 watching
77 forks
Language: HTML
Last commit: 7 months ago
Linked from 1 awesome list
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files. | 1,398 |
| karust/gogetcrawl | A Go tool and package for extracting web archive data from popular sources such as the Wayback Machine and Common Crawl. | 147 |
| vida-nyu/ache | A web crawler designed to efficiently collect and prioritize relevant content from the web. | 454 |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| p3gleg/pwnback | Generates a sitemap of a website using the Wayback Machine. | 225 |
| s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 263 |
| machawk1/wail | A graphical user interface for preserving and replaying web pages using multiple archiving tools. | 350 |
| a11ywatch/crawler | A high-performance web page crawler. | 49 |
| internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 671 |
| stevepolitodesign/my_site_archive | A simple Rails application for archiving websites. | 27 |
| internetarchive/warctools | Tools for working with archived web content. | 152 |
| bellingcat/auto-archiver | Automates archiving of online content from various sources into local storage or cloud services. | 570 |
| turicas/crau | A command-line tool for archiving and replaying websites in WARC format. | 57 |
| amoilanen/js-crawler | A Node.js module for crawling websites and scraping their content. | 253 |
| oduwsdl/archivenow | A tool to automate archiving of web resources into public archives. | 410 |