crawley

Crawler

A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options.

A Pythonic crawling and scraping framework based on non-blocking I/O operations.

GitHub

186 stars
22 watching
33 forks
Language: Python
Last commit: over 1 year ago
Linked from 1 awesome list

Related projects:

| Repository | Description | Stars |
|---|---|---|
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
| stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner. | 226 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling. | 340 |
| hominee/dyer | A fast and flexible web crawling tool with features like asynchronous I/O and event-driven design. | 133 |
| manning23/mspider | A Python-based tool for web crawling and data collection from various websites. | 348 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
| puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
| untwisted/sukhoi | A minimalist web crawler framework built on top of miners and structure-based data extraction. | 881 |
| internetarchive/brozzler | A distributed web crawler that fetches and extracts links from websites using a real browser. | 671 |
| cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks. | 187 |
| iamstoxe/urlgrab | A tool to crawl websites by exploring links recursively with support for JavaScript rendering. | 330 |
| xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage. | 1,827 |
| zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support. | 1,980 |
| antchfx/antch | A framework for building fast and efficient web crawlers and scrapers in Go. | 260 |
| archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files. | 1,398 |