cola

Crawler library

A high-level framework for building distributed data extractors from web pages.

GitHub

2k stars
166 watching
537 forks
Language: Python
last commit: over 2 years ago
Linked from 2 awesome lists


Related projects:

| Repository | Description | Stars |
|---|---|---|
| chenjiandongx/github-spider | A Python-based web crawler for scraping GitHub user and repository data. | 264 |
| xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage. | 1,827 |
| zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support. | 1,980 |
| crypto-crawler/crypto-crawler-rs | A Rust-based library for building and managing cryptocurrency crawlers. | 232 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
| howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling. | 1,752 |
| turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
| kiddyuchina/beanbun | A PHP framework for building distributed web crawlers with modular design and extensibility. | 1,248 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
| elixir-crawly/crawly | A framework for extracting structured data from websites. | 987 |
| puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |