gain

crawler

A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.

Web crawling framework based on asyncio.

GitHub

2k stars
75 watching
207 forks
Language: Python
last commit: over 5 years ago
Linked from 2 awesome lists

Topics: aiohttp, asyncio, crawler, python, spider, uvloop

Related projects:

| Repository | Description | Stars |
|---|---|---|
| howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling. | 1,752 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
| chenjiandongx/github-spider | A Python-based web crawler for scraping GitHub user and repository data. | 264 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
| xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage. | 1,827 |
| puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |
| archiveteam/grab-site | A web crawler designed to back up websites by recursively crawling and writing WARC files. | 1,402 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
| a11ywatch/crawler | Performs web page crawling at high performance. | 49 |
| untwisted/sukhoi | A minimalist web crawler framework built on top of miners and structure-based data extraction. | 881 |
| fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
| cocrawler/cocrawler | A versatile web crawler built with modern tools and concurrency to handle various crawl tasks. | 187 |
| puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling. | 340 |
| qinxuye/cola | A high-level framework for building distributed data extractors from web pages. | 1,500 |