PSpider

Web Crawler Framework

A Python web crawler framework with support for multi-threading and proxy usage.

A simple, easy-to-use Python web crawler framework. QQ discussion group: 597510560

GitHub

2k stars
114 watching
503 forks
Language: Python
last commit: over 2 years ago
Linked from 1 awesome list

Topics: crawler, multi-threading, multiprocessing, proxies, python, python-spider, spider, web-crawler, web-spider

Related projects:

| Repository | Description | Stars |
|---|---|---|
| hightman/pspider | A parallel web crawler framework built using PHP and MySQLi | 266 |
| qinxuye/cola | A high-level framework for building distributed data extractors from web pages | 1,501 |
| chenjiandongx/github-spider | A Python-based web crawler for scraping GitHub user and repository data | 264 |
| manning23/mspider | A Python-based tool for web crawling and data collection from various websites | 348 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites | 2,037 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options | 188 |
| howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling | 1,753 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir | 23 |
| kiddyuchina/beanbun | A PHP framework for building distributed web crawlers with modular design and extensibility | 1,249 |
| dyweb/scrala | A web crawling framework written in Scala that allows users to define the start URL and parse the response from it | 113 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go | 1,827 |
| postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 809 |
| wspl/creeper | A framework for building cross-platform web crawlers using Go | 780 |
| zhegexiaohuozi/seimicrawler | A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis | 1,980 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling | 340 |