supercrawler
Web Crawler
A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
380 stars
11 watching
62 forks
Language: JavaScript
last commit: about 2 years ago
Linked from 1 awesome list
crawler, distributed-crawler, robot, sitemap, web-crawler
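To make the description above concrete, here is a minimal, dependency-free sketch of the three ideas it names: a robots.txt check, a rate limit combined with a concurrency limit, and content handlers keyed by content type. It is an illustration only and does not use or reproduce supercrawler's actual API. Every name in it (`MiniCrawler`, `parseDisallows`, `fetchPage`, `demo`) is hypothetical. The robots.txt parsing is deliberately simplified to `Disallow` prefixes in the `User-agent: *` group, and pages come from an in-memory object rather than the network.

```javascript
// Hypothetical sketch: none of these names come from supercrawler's API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simplified robots.txt handling: collect Disallow prefixes from the
// "User-agent: *" group only (no Allow rules, wildcards, or crawl delays).
function parseDisallows(robotsTxt) {
  const disallows = [];
  let inStarGroup = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim();
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") inStarGroup = value === "*";
    else if (field === "disallow" && inStarGroup && value) disallows.push(value);
  }
  return disallows;
}

class MiniCrawler {
  constructor({ interval, concurrency, fetchPage, robotsTxt = "" }) {
    this.interval = interval;       // minimum ms between request starts
    this.concurrency = concurrency; // maximum requests in flight (assumed >= 1)
    this.fetchPage = fetchPage;     // injected: async (url) => ({ contentType, body })
    this.disallows = parseDisallows(robotsTxt);
    this.handlers = [];
    this.queue = [];
    this.seen = new Set();
  }

  // Register a handler for responses whose content type starts with `contentType`.
  addHandler(contentType, fn) {
    this.handlers.push({ contentType, fn });
  }

  // Queue a URL unless robots.txt disallows it or it was already queued.
  enqueue(url) {
    const path = new URL(url).pathname;
    const blocked = this.disallows.some((prefix) => path.startsWith(prefix));
    if (blocked || this.seen.has(url)) return false;
    this.seen.add(url);
    this.queue.push(url);
    return true;
  }

  // Fetch one page, run matching handlers, and queue any links they return.
  async processUrl(url) {
    const page = await this.fetchPage(url);
    for (const { contentType, fn } of this.handlers) {
      if (page.contentType.startsWith(contentType)) {
        const links = (await fn({ url, body: page.body })) || [];
        links.forEach((link) => this.enqueue(link));
      }
    }
  }

  // Start at most one request per `interval`, with at most `concurrency` in flight.
  async run() {
    const inFlight = new Set();
    while (this.queue.length > 0 || inFlight.size > 0) {
      if (this.queue.length > 0 && inFlight.size < this.concurrency) {
        const task = this.processUrl(this.queue.shift()).catch(() => {});
        inFlight.add(task);
        task.finally(() => inFlight.delete(task));
        await sleep(this.interval);
      } else {
        await Promise.race(inFlight);
      }
    }
  }
}

// Usage with an in-memory "site": each URL maps to the links found on that page.
async function demo() {
  const site = {
    "http://example.com/": ["http://example.com/a", "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/"],
  };
  const visited = [];
  const crawler = new MiniCrawler({
    interval: 10,
    concurrency: 2,
    robotsTxt: "User-agent: *\nDisallow: /private/",
    fetchPage: async (url) => ({ contentType: "text/html", body: site[url] || [] }),
  });
  crawler.addHandler("text/html", ({ url, body }) => {
    visited.push(url);
    return body; // links to follow next
  });
  crawler.enqueue("http://example.com/");
  await crawler.run();
  return visited; // /private/x is skipped because of the Disallow rule
}

demo().then((visited) => console.log(visited));
```

Injecting `fetchPage` keeps the sketch runnable offline. A real crawler would replace it with an HTTP client and fetch each host's robots.txt before queueing URLs for that host.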
Related projects:
| Repository | Description | Stars |
|---|---|---|
| | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks | 188 |
| | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,036 |
| | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
| | A scalable and versatile web crawling framework based on Apache Storm | 895 |
| | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226 |
| | A distributed web crawler that fetches and extracts links from websites using a real browser. | 678 |
| | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling | 340 |
| | A flexible web crawler that follows robots.txt policies and crawl delays. | 787 |
| | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 809 |
| | A modular, concurrent web crawler framework written in Go. | 1,827 |
| | A tool for web data extraction and processing using Rust | 1,234 |
| | A tool for recursively querying web servers by sending HTTP requests and analyzing responses to discover hidden content | 243 |
| | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55 |
| | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 677 |
| | A Scala-based DSL for programmatically accessing and interacting with web pages | 149 |