fetchbot
Crawler
A flexible web crawler that follows robots.txt policies and crawl delays.
787 stars
34 watching
95 forks
Language: Go
Last commit: over 3 years ago
Linked from 1 awesome list
Tags: crawler, robots-txt
Related projects:
Repository | Description | Stars |
---|---|---|
puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,036 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380 |
pjkelly/robocop | A middleware that adds a meta tag to HTTP responses to instruct search engines on how to crawl the content. | 3 |
hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,827 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
matteoredaelli/ebot | An Erlang-based web crawler designed to be scalable and highly configurable. | 330 |
archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files. | 1,406 |
fmpwizard/owlcrawler | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55 |
fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
rndinfosecguy/scavenger | An OSINT bot that crawls pastebin sites to search for sensitive data leaks. | 634 |
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 677 |
turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
chenjiandongx/github-spider | A Python-based web crawler for scraping GitHub user and repository data. | 264 |
a11ywatch/crawler | A high-performance web page crawler. | 51 |
c-sto/recursebuster | A tool for recursively querying web servers by sending HTTP requests and analyzing responses to discover hidden content. | 243 |