InfinityCrawler

Crawler library

A web crawling library for .NET that allows customizable crawling and throttling of websites.

A simple but powerful web crawler library for .NET

248 stars
11 watching
36 forks
Language: C#
Last commit: 11 months ago
Linked from 3 awesome lists

crawler, robots-txt, spider, web-crawler, web-crawling

Related projects:

Repository | Description | Stars
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378
crypto-crawler/crypto-crawler-rs | A Rust-based library for building and managing cryptocurrency crawlers | 232
cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks | 187
puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038
fmpwizard/owlcrawler | A distributed web crawler that coordinates crawling tasks across multiple worker processes using a message bus. | 55
zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support | 1,980
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 806
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23
hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826
fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945
webrecorder/browsertrix-crawler | A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner. | 652
amoilanen/js-crawler | A Node.js module for crawling web sites and scraping their content | 253
wspl/creeper | A framework for building cross-platform web crawlers using Go | 780
shapecrawler/shapecrawler | A .NET library for creating and manipulating PowerPoint presentations | 299
qinxuye/cola | A high-level framework for building distributed data extractors from web pages | 1,500