sukhoi

Web Crawler Framework

A minimalist web crawler framework built around miners and structure-based data extraction.

Minimalist and powerful Web Crawler.

879 stars

22 watching

49 forks

Language: Python

Last commit: over 5 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
jmg/crawley	A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options.	188
zhegexiaohuozi/seimicrawler	A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis	1,980
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.	2,037
howie6879/ruia	An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling	1,753
codesofun/web-bee	A Java framework for building web-based crawlers with features like distributed crawling and proxy support.	189
dyweb/scrala	A web crawling framework written in Scala that lets users define a start URL and parse the response from it	113
0x67757300/uhttp	A lightweight Pythonic web development framework with modular and flexible application design.	106
joncanning/skyscraper	A framework for building asynchronous web scrapers and crawlers using async/await and Reactive Extensions.	59
rivermont/spidy	A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling	340
stewartmckee/cobweb	A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner	226
xianhu/pspider	A Python web crawler framework with support for multi-threading and proxy usage.	1,828
crawlzone/crawlzone	A PHP framework for asynchronous internet crawling and web scraping	78
cocrawler/cocrawler	A versatile web crawler built with modern tools and concurrency to handle various crawl tasks	188
toastdriven/itty	A lightweight Python web framework with basic features for building small applications.	407
wspl/creeper	A framework for building cross-platform web crawlers using Go	780