cola

Crawler library

A high-level framework for building distributed data extractors from web pages

A high-level distributed crawling framework.

2k stars

166 watching

537 forks

Language: Python

Last commit: almost 4 years ago

Linked from 2 awesome lists


Related projects:

Repository	Description	Stars
chenjiandongx/github-spider	A Python-based web crawler for scraping GitHub user and repository data.	264
xianhu/pspider	A Python web crawler framework with support for multi-threading and proxy usage.	1,828
zhegexiaohuozi/seimicrawler	A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis.	1,980
crypto-crawler/crypto-crawler-rs	A Rust-based library for building and managing cryptocurrency crawlers.	235
feng19/spider_man	A high-level web crawling and scraping framework for Elixir.	23
howie6879/ruia	An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling.	1,753
turnersoftware/infinitycrawler	A web crawling library for .NET that allows customizable crawling and throttling of websites.	248
kiddyuchina/beanbun	A PHP framework for building distributed web crawlers with modular design and extensibility.	1,249
jmg/crawley	A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options.	188
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.	2,037
hu17889/go_spider	A modular, concurrent web crawler framework written in Go.	1,827
elixir-crawly/crawly	A framework for extracting structured data from websites.	994
puerkitobio/gocrawl	A concurrent web crawler written in Go that allows flexible and polite crawling of websites.	2,036
felipecsl/wombat	A Ruby-based web crawler and data extraction tool with an elegant DSL.	1,315
fredwu/crawler	A high-performance web crawling and scraping solution with customizable settings and worker pooling.	945