spidy
Web crawler
A simple command-line web crawler that automatically extracts links from web pages and can run crawl threads in parallel for efficient crawling (a minimal sketch of the approach appears below).
The simple, easy-to-use command-line web crawler.
340 stars
23 watching
69 forks
Language: Python
Last commit: 4 months ago
Linked from 1 awesome list
Tags: crawler, crawling, python, python3, web-crawler, web-spider
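The description above boils down to two ideas: fetch a page, pull its links out of the HTML, and fan the fetching out across several workers. The sketch below is illustrative only and is not spidy's actual implementation; it assumes nothing beyond the Python standard library (`urllib`, `html.parser`, `concurrent.futures`), and the `fetch_links`/`crawl` helpers, parameter names, and seed URL are invented for the example.

```python
"""Minimal sketch of a parallel link-extracting crawler (illustrative only;
not spidy's actual code). Standard library only."""
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_links(url):
    """Download one page and return the absolute URLs it links to."""
    try:
        with urlopen(url, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
    except OSError:
        return []  # network errors just yield no links in this sketch
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(url, link) for link in parser.links]


def crawl(seed_urls, max_urls=50, workers=4):
    """Breadth-first crawl: fetch batches of pages in parallel and queue new links.

    max_urls caps the number of URLs discovered, not pages fetched."""
    seen, queue = set(seed_urls), list(seed_urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while queue and len(seen) < max_urls:
            batch, queue = queue[:workers], queue[workers:]
            for links in pool.map(fetch_links, batch):
                for link in links:
                    if link.startswith("http") and link not in seen:
                        seen.add(link)
                        queue.append(link)
    return seen


if __name__ == "__main__":
    for url in sorted(crawl(["https://example.com"], max_urls=20)):
        print(url)
```

Running it prints every URL discovered from the seed, capped by `max_urls`; a full-featured crawler such as spidy layers persistence, politeness rules, and error handling on top of this skeleton.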
Related projects:
Repository | Description | Stars |
---|---|---|
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites. | 806 |
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner. | 226 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
manning23/mspider | A Python-based tool for web crawling and data collection from various websites. | 348 |
twiny/spidy | Tools to crawl websites and collect domain names with availability status. | 149 |
spider-rs/spider | A web crawler and scraper written in Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 652 |
mvdbos/php-spider | A flexible PHP web crawler with configurable traversal algorithms and filters. | 1,332 |
cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks. | 187 |
internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 671 |
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 265 |
joenorton/rubyretriever | A Ruby-based tool for web crawling and data extraction, aiming to be a replacement for paid software in the SEO space. | 143 |
archiveteam/grab-site | A web crawler designed to backup websites by recursively crawling and writing WARC files. | 1,402 |
hominee/dyer | A fast and flexible web crawling tool with features like asynchronous I/O and event-driven design. | 133 |