web-bee

Crawler framework

A Java framework for building web crawlers, with features such as distributed crawling and proxy support.

🐝 Web vertical crawler framework for fun

GitHub

189 stars

23 watching

38 forks

Language: Java

Last commit: over 2 years ago

Linked from 1 awesome list

crawler, framework, java, java-8, webbee

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
zhegexiaohuozi/seimicrawler	A distributed crawler framework that simplifies building crawlers using Spring Boot and Redis	1,980
kiddyuchina/beanbun	A PHP framework for building distributed web crawlers with modular design and extensibility	1,249
hu17889/go_spider	A modular, concurrent web crawler framework written in Go	1,827
dyweb/scrala	A web crawling framework written in Scala that lets users define the start URL and parse the response from it	113
wspl/creeper	A framework for building cross-platform web crawlers using Go	780
crawlzone/crawlzone	A PHP framework for asynchronous internet crawling and web scraping	78
untwisted/sukhoi	A minimalist web crawler framework built on top of miners and structure-based data extraction	879
brendonboshell/supercrawler	A web crawler that obeys robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages	380
joncanning/skyscraper	A framework for building asynchronous web scrapers and crawlers using async/await and Reactive Extensions	59
apache/incubator-stormcrawler	A scalable and versatile web crawling framework based on Apache Storm	895
howie6879/ruia	An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling	1,753
jmg/crawley	A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options	188
xianhu/pspider	A Python web crawler framework with support for multi-threading and proxy usage	1,828
antchfx/antch	A framework for building fast and efficient web crawlers and scrapers in Go	261
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites	2,037