InfinityCrawler

Crawler library

A web crawling library for .NET that allows customizable crawling and throttling of websites.

A simple but powerful web crawler library for .NET

GitHub

248 stars
11 watching
36 forks
Language: C#
Last commit: about 1 year ago
Linked from 3 awesome lists

crawler, robots-txt, spider, web-crawler, web-crawling

Related projects:

Repository | Description | Stars
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380
crypto-crawler/crypto-crawler-rs | A Rust-based library for building and managing cryptocurrency crawlers. | 235
cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks. | 188
puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,036
fmpwizard/owlcrawler | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55
zhegexiaohuozi/seimicrawler | A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis. | 1,980
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites. | 809
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23
hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,827
fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 677
amoilanen/js-crawler | A Node.js module for crawling web sites and scraping their content. | 254
wspl/creeper | A framework for building cross-platform web crawlers using Go. | 780
shapecrawler/shapecrawler | A .NET library for creating and manipulating PowerPoint presentations using Open XML. | 307
qinxuye/cola | A high-level framework for building distributed data extractors from web pages. | 1,501