spidy

Web crawler

A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling

The simple, easy to use command line web crawler.

GitHub

340 stars

23 watching

69 forks

Language: Python

last commit: about 1 year ago

Linked from 1 awesome list

crawlercrawlingpythonpython3web-crawlerweb-spider

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
postmodern/spidr	A Ruby web crawling library that provides flexible and customizable methods to crawl websites	809
stewartmckee/cobweb	A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner	226
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.	380
jmg/crawley	A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options.	188
manning23/mspider	A Python-based tool for web crawling and data collection from various websites	348
twiny/spidy	Tools to crawl websites and collect domain names with availability status	151
spider-rs/spider	A tool for web data extraction and processing using Rust	1,234
webrecorder/browsertrix-crawler	A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner.	677
mvdbos/php-spider	A flexible PHP web crawler with configurable traversal algorithms and filters.	1,336
cocrawler/cocrawler	A versatile web crawler built with modern tools and concurrency to handle various crawl tasks	188
internetarchive/brozzler	A distributed web crawler that fetches and extracts links from websites using a real browser.	678
s0rg/crawley	A utility for systematically extracting URLs from web pages and printing them to the console.	268
joenorton/rubyretriever	A Ruby-based tool for web crawling and data extraction, aiming to be a replacement for paid software in the SEO space.	143
archiveteam/grab-site	A web crawler designed to backup websites by recursively crawling and writing WARC files.	1,406
hominee/dyer	A fast and flexible web crawling tool with features like asynchronous I/O and event-driven design.	135