scrapy-cluster

Crawler cluster

A distributed scraping framework that scales crawling and prioritizes sites, utilizing Redis and Kafka for coordination.

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.

GitHub

1k stars

108 watching

323 forks

Language: Python

Last commit: over 2 years ago

Linked from 1 awesome list

Tags: distributed, kafka, python, redis, scraping, scrapy

Website for istresearch/scrapy-cluster:

scrapy-cluster.readthedocs.io/

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
scrapy/scrapely	A pure-Python library for extracting structured data from HTML pages.	1,865
dyweb/scrala	A web crawling framework written in Scala that lets users define the start URL and parse the response from it.	113
pjkelly/robocop	A middleware that adds a meta tag to HTTP responses to instruct search engines on how to crawl the content.	3
cuiweixie/lua-resty-redis-cluster	A client library for managing Redis clusters using Lua scripts in an OpenResty configuration.	100
efremidze/cluster	A map annotation clustering library that efficiently groups and displays geographic pins on an iOS map view.	1,274
postmodern/spidr	A Ruby web crawling library that provides flexible and customizable methods to crawl websites.	809
rndinfosecguy/scavenger	An OSINT bot that crawls pastebin sites to search for sensitive data leaks.	634
rusty1s/pytorch_cluster	A PyTorch extension library providing optimized graph cluster algorithms.	838
elixir-crawly/crawly	A framework for extracting structured data from websites.	994
holgerd77/django-dynamic-scraper	An app that allows you to manage Scrapy spiders through a Django admin interface.	1,155
howie6879/ruia	An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling.	1,753
needmorecowbell/giggity	A tool to scrape and store hierarchical data about GitHub organizations, users, or repositories.	127
malfrats/xeuledoc	A tool to fetch information about public Google documents from various services.	856
stewartmckee/cobweb	A flexible web crawler for extracting data from websites scalably and efficiently.	226
tidyverse/rvest	A package for extracting data from web pages using HTML parsing and CSS/XPath selectors.	1,495