fetchbot

Crawler

A flexible web crawler that follows robots.txt policies and crawl delays.

787 stars

34 watching

95 forks

Language: Go

last commit: about 5 years ago

Linked from 1 awesome list

crawler, robots-txt

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
puerkitobio/gocrawl	A concurrent web crawler written in Go that allows flexible and polite crawling of websites.	2,036
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.	380
pjkelly/robocop	A middleware that adds a meta tag to HTTP responses to instruct search engines on how to crawl the content.	3
hu17889/go_spider	A modular, concurrent web crawler framework written in Go.	1,827
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.	2,037
matteoredaelli/ebot	An Erlang-based web crawler designed to be scalable and highly configurable.	330
archiveteam/grab-site	A web crawler designed to backup websites by recursively crawling and writing WARC files.	1,406
fmpwizard/owlcrawler	A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus.	55
fredwu/crawler	A high-performance web crawling and scraping solution with customizable settings and worker pooling.	945
rndinfosecguy/scavenger	An OSINT bot that crawls pastebin sites to search for sensitive data leaks.	634
webrecorder/browsertrix-crawler	A containerized, browser-based crawler system for high-fidelity, customizable capture of web content.	677
turnersoftware/infinitycrawler	A web crawling library for .NET that allows customizable crawling and throttling of websites.	248
chenjiandongx/github-spider	A Python-based web crawler for scraping GitHub user and repository data.	264
a11ywatch/crawler	A high-performance web page crawler.	51
c-sto/recursebuster	A tool for recursively querying web servers by sending HTTP requests and analyzing responses to discover hidden content.	243