sukhoi

Web Crawler Framework

A minimalist web crawler framework built on top of miners and structure-based data extraction.

Minimalist and powerful Web Crawler.

GitHub

879 stars
22 watching
49 forks
Language: Python
last commit: about 4 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
|---|---|---|
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 188 |
| zhegexiaohuozi/seimicrawler | A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis. | 1,980 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,037 |
| howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling. | 1,753 |
| codesofun/web-bee | A Java framework for building web-based crawlers with features like distributed crawling and proxy support. | 189 |
| dyweb/scrala | A web crawling framework written in Scala that lets users define a start URL and parse the response from it. | 113 |
| 0x67757300/uhttp | A lightweight Pythonic web development framework with modular and flexible application design. | 106 |
| joncanning/skyscraper | A framework for building asynchronous web scrapers and crawlers using async/await and Reactive Extensions. | 59 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling. | 340 |
| stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner. | 226 |
| xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage. | 1,828 |
| crawlzone/crawlzone | A PHP framework for asynchronous internet crawling and web scraping. | 78 |
| cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks. | 188 |
| toastdriven/itty | A lightweight Python web framework with basic features for building small applications. | 407 |
| wspl/creeper | A framework for building cross-platform web crawlers using Go. | 780 |