gain

crawler

A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.

Web crawling framework based on asyncio.
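The description names asyncio and aiohttp as the framework's building blocks. Below is a minimal, hypothetical sketch of that general pattern. It is not gain's own API: a fixed pool of worker tasks pulls URLs from an `asyncio.Queue`, fetches each one, and enqueues links that stay under the start URL. The names `crawl`, `crawl_with_aiohttp`, `max_pages` and `concurrency` are illustrative assumptions. Link extraction uses a naive regex rather than an HTML parser.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict
from urllib.parse import urldefrag, urljoin

# Hypothetical sketch of an asyncio crawl loop; NOT gain's actual API.
# Naive link extraction: a regex over href attributes (a real crawler
# would use an HTML parser).
HREF_RE = re.compile(r"""href=["']([^"']+)["']""", re.IGNORECASE)

Fetch = Callable[[str], Awaitable[str]]


async def crawl(start_url: str, fetch: Fetch,
                max_pages: int = 50, concurrency: int = 5) -> Dict[str, str]:
    """Breadth-first crawl of pages under start_url; returns {url: html}."""
    seen = {start_url}          # every URL ever enqueued (each is fetched once)
    pages: Dict[str, str] = {}
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait(start_url)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                html = await fetch(url)
                pages[url] = html
                for href in HREF_RE.findall(html):
                    # Resolve relative links and drop "#fragment" parts.
                    link, _ = urldefrag(urljoin(url, href))
                    # Stay under start_url and cap total fetches at max_pages.
                    if (link.startswith(start_url) and link not in seen
                            and len(seen) < max_pages):
                        seen.add(link)
                        queue.put_nowait(link)
            except Exception:
                pass  # a real crawler would log and possibly retry here
            finally:
                queue.task_done()

    # A fixed pool of worker tasks bounds how many fetches run at once.
    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()          # wait until every enqueued URL is processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return pages


async def crawl_with_aiohttp(start_url: str, **kwargs) -> Dict[str, str]:
    """Run crawl() over one shared aiohttp session (needs `pip install aiohttp`)."""
    import aiohttp  # imported lazily so crawl() itself has no third-party deps

    async with aiohttp.ClientSession() as session:
        async def fetch(url: str) -> str:
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.text()

        return await crawl(start_url, fetch, **kwargs)
```

To crawl a live site, you would call `asyncio.run(crawl_with_aiohttp("https://example.com/"))`. Passing `fetch` in as a parameter keeps the scheduling logic testable with a fake fetcher. The worker pool over a shared queue also caps concurrency without needing a separate semaphore.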


2k stars

75 watching

208 forks

Language: Python

Last commit: about 6 years ago

Linked from 2 awesome lists

Topics: aiohttp, asyncio, crawler, python, spider, uvloop


Related projects:

Repository	Description	Stars
howie6879/ruia	An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling.	1,753
jmg/crawley	A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options.	188
chenjiandongx/github-spider	A Python-based web crawler for scraping Github user and repository data.	264
feng19/spider_man	A high-level web crawling and scraping framework for Elixir.	23
xianhu/pspider	A Python web crawler framework with support for multi-threading and proxy usage.	1,828
puerkitobio/fetchbot	A flexible web crawler that follows robots.txt policies and crawl delays.	787
archiveteam/grab-site	A web crawler designed to backup websites by recursively crawling and writing WARC files.	1,406
hu17889/go_spider	A modular, concurrent web crawler framework written in Go.	1,827
a11ywatch/crawler	A high-performance web page crawler.	51
untwisted/sukhoi	A minimalist web crawler framework built on top of miners and structure-based data extraction.	879
fredwu/crawler	A high-performance web crawling and scraping solution with customizable settings and worker pooling.	945
cocrawler/cocrawler	A versatile web crawler built with modern tools and concurrency to handle various crawl tasks.	188
puerkitobio/gocrawl	A concurrent web crawler written in Go that allows flexible and polite crawling of websites.	2,036
rivermont/spidy	A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling.	340
qinxuye/cola	A high-level framework for building distributed data extractors from web pages.	1,501