fetchbot

Crawler

A flexible web crawler that follows robots.txt policies and crawl delays.
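
The two politeness rules in that description can be sketched with the Go standard library alone. This is a minimal illustration, not fetchbot's implementation or API. The names `rules`, `parseRobots`, `addRule`, `hostGate`, and `reserve` are hypothetical. The sketch assumes exact (case-insensitive) user-agent matching and plain path prefixes without `*` or `$` wildcards. It resolves Allow/Disallow conflicts by the longest matching path, with Allow winning ties, which is the precedence RFC 9309 describes. It treats `Crawl-delay`, a common non-standard extension, as the minimum spacing between requests to one host.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// rules holds the robots.txt directives that apply to one user agent.
type rules struct {
	allow, disallow []string
	delay           time.Duration // Crawl-delay; zero if absent
}

// parseRobots returns agent's group (or "*"); exact agent names, prefix-only paths.
func parseRobots(body, agent string) rules {
	groups := map[string]*rules{}
	var current []string // agents named at the top of the current group
	inRules := false     // becomes true once the current group has a rule line
	for _, raw := range strings.Split(body, "\n") {
		line, _, _ := strings.Cut(raw, "#")  // drop comments
		key, val, _ := strings.Cut(line, ":") // lines without ':' match no case below
		key, val = strings.ToLower(strings.TrimSpace(key)), strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if inRules { // a User-agent line after rules opens a new group
				current, inRules = nil, false
			}
			name := strings.ToLower(val)
			current = append(current, name)
			if groups[name] == nil {
				groups[name] = &rules{}
			}
		case "allow", "disallow", "crawl-delay":
			inRules = true
			for _, name := range current {
				addRule(groups[name], key, val)
			}
		}
	}
	for _, name := range []string{strings.ToLower(agent), "*"} {
		if g, ok := groups[name]; ok {
			return *g
		}
	}
	return rules{}
}

// addRule records one Allow, Disallow, or Crawl-delay line in group g.
func addRule(g *rules, key, val string) {
	switch {
	case key == "crawl-delay":
		if secs, err := strconv.ParseFloat(val, 64); err == nil {
			g.delay = time.Duration(secs * float64(time.Second))
		}
	case val == "": // an empty Allow/Disallow path matches nothing
	case key == "allow":
		g.allow = append(g.allow, val)
	default: // "disallow"
		g.disallow = append(g.disallow, val)
	}
}

// allowed uses longest-prefix matching; on a tie, Allow wins.
func (r rules) allowed(path string) bool {
	best, ok := -1, true
	for _, p := range r.disallow {
		if strings.HasPrefix(path, p) && len(p) > best {
			best, ok = len(p), false
		}
	}
	for _, p := range r.allow {
		if strings.HasPrefix(path, p) && len(p) >= best {
			best, ok = len(p), true
		}
	}
	return ok
}

// hostGate maps each host to the earliest start of its next request (not goroutine-safe).
type hostGate map[string]time.Time

// reserve books host's next slot and returns how long to wait from now.
func (g hostGate) reserve(host string, now time.Time, delay time.Duration) time.Duration {
	start := now
	if t := g[host]; t.After(now) {
		start = t
	}
	g[host] = start.Add(delay)
	return start.Sub(now)
}

func main() {
	r := parseRobots("User-agent: *\nDisallow: /private/\nAllow: /private/public.html\nCrawl-delay: 2\n", "demobot")
	fmt.Println(r.allowed("/private/x"), r.allowed("/private/public.html"), r.delay) // false true 2s
	gate, now := hostGate{}, time.Now()
	fmt.Println(gate.reserve("example.com", now, r.delay), gate.reserve("example.com", now, r.delay)) // 0s 2s
}
```

In a real crawler, `reserve` would need a mutex or a single scheduling goroutine, and robots.txt would be fetched once per host before any of that host's URLs are queued.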

786 stars
34 watching
95 forks
Language: Go
last commit: over 3 years ago
Linked from 1 awesome list

crawler, robots-txt

Related projects (GitHub stars in parentheses):

- puerkitobio/gocrawl (2,038): A concurrent web crawler written in Go that allows flexible, polite crawling of websites.
- brendonboshell/supercrawler (378): A web crawler that obeys robots.txt rules, rate limits, and concurrency limits, with customizable content handlers for parsing and processing crawled pages.
- pjkelly/robocop (3): A middleware that adds a meta tag to HTTP responses to instruct search engines on how to crawl the content.
- hu17889/go_spider (1,826): A modular, concurrent web crawler framework written in Go.
- elliotgao2/gain (2,035): A Python web crawling framework that uses asyncio and aiohttp for efficient data extraction from websites.
- matteoredaelli/ebot (330): An Erlang-based web crawler designed to be scalable and highly configurable.
- archiveteam/grab-site (1,398): A web crawler for backing up websites by crawling them recursively and writing WARC files.
- fmpwizard/owlcrawler (55): A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus.
- fredwu/crawler (945): A high-performance web crawling and scraping solution with customizable settings and worker pooling.
- rndinfosecguy/scavenger (629): An OSINT bot that crawls pastebin sites to search for sensitive data leaks.
- webrecorder/browsertrix-crawler (652): A containerized, browser-based crawler for capturing web content with high fidelity and customizable behavior.
- turnersoftware/infinitycrawler (248): A .NET web crawling library with customizable crawling and throttling of websites.
- chenjiandongx/github-spider (264): A Python-based web crawler for scraping GitHub user and repository data.
- a11ywatch/crawler (49): A high-performance web page crawler.
- c-sto/recursebuster (242): A tool that recursively queries web servers, sending HTTP requests and analyzing the responses to discover hidden content.