sukhoi

Web Crawler Framework

A minimalist web crawler framework built on top of miners and structure-based data extraction.

Minimalist and powerful Web Crawler.
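
As a rough illustration of what "miners" and "structure-based data extraction" could mean, the sketch below has each miner extract data from one page and embed the output of child miners, so the result mirrors how the pages link to each other. This is not sukhoi's actual API. The names `Miner`, `AuthorMiner`, `BioMiner`, `select` and the in-memory `PAGES` site are all hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "/": '<a class="author" href="/authors/ada">Ada</a>'
         '<a class="author" href="/authors/alan">Alan</a>',
    "/authors/ada": '<p class="bio">Wrote the first program.</p>',
    "/authors/alan": '<p class="bio">Defined computability.</p>',
}


class ClassTextParser(HTMLParser):
    """Collect (attrs, text) for every <tag> whose class equals css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.matches = []      # list of (attrs dict, text) pairs
        self._current = None   # attrs of the element being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == self.tag and attrs.get("class") == self.css_class:
            self._current, self._text = attrs, []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._current is not None:
            self.matches.append((self._current, "".join(self._text)))
            self._current = None


def select(html, tag, css_class):
    """Return (attrs, text) pairs for matching elements in html."""
    parser = ClassTextParser(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.matches


class Miner:
    """Hypothetical base class: load one page and return extracted data."""

    def __init__(self, url):
        self.url = url

    def fetch(self):
        # Stand-in for an HTTP request; reads from the in-memory site.
        return PAGES[self.url]

    def run(self):
        raise NotImplementedError


class BioMiner(Miner):
    def run(self):
        return [text for _, text in select(self.fetch(), "p", "bio")]


class AuthorMiner(Miner):
    def run(self):
        # Each author record embeds the result of a child miner, so the
        # output is nested the same way the pages link to each other.
        return [
            {"name": name, "bio": BioMiner(attrs["href"]).run()}
            for attrs, name in select(self.fetch(), "a", "author")
        ]


result = AuthorMiner("/").run()
```

Here `result` is a list with one dictionary per author link on the start page, each holding the author's name and the list of bio paragraphs mined from the linked page. A real crawler would replace `fetch` with HTTP requests and would typically run miners concurrently.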

GitHub

881 stars
22 watching
49 forks
Language: Python
Last commit: almost 4 years ago
Linked from 1 awesome list



Related projects:

Repository | Description | Stars
jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186
zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support | 1,980
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035
howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling | 1,752
codesofun/web-bee | A Java framework for building web-based crawlers with features like distributed crawling and proxy support. | 189
dyweb/scrala | A web crawling framework written in Scala that allows users to define the start URL and parse response from it | 113
0x67757300/uhttp | A lightweight Pythonic web development framework with modular and flexible application design. | 106
joncanning/skyscraper | A framework for building asynchronous web scrapers and crawlers using async/await and Reactive Extensions. | 58
rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling | 340
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226
xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage. | 1,827
crawlzone/crawlzone | A PHP framework for asynchronous internet crawling and web scraping | 77
cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks | 187
toastdriven/itty | A lightweight Python web framework with basic features for building small applications. | 408
wspl/creeper | A framework for building cross-platform web crawlers using Go | 780