PSpider
Web Crawler Framework
A Python web crawler framework with support for multi-threading and proxy usage.
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
2k stars · 114 watching · 504 forks
Language: Python
Last commit: over 2 years ago
Tags: crawler, multi-threading, multiprocessing, proxies, python, python-spider, spider, web-crawler, web-spider
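PSpider's own API is not documented on this page, so the sketch below illustrates the general multi-threading-plus-proxies pattern the description refers to, using only `requests` and the standard library rather than PSpider itself. The seed URLs, proxy addresses, and the `fetch` helper are hypothetical placeholders, not part of PSpider.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

# Hypothetical inputs: pages to crawl and a pool of proxies to rotate through.
SEED_URLS = [f"https://example.com/page/{i}" for i in range(1, 6)]
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]

def fetch(url):
    """Fetch one URL through a randomly chosen proxy from the pool."""
    proxy = random.choice(PROXIES)
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
    return url, resp.status_code

# A thread pool stands in for the framework's fetcher threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, url) for url in SEED_URLS]
    for fut in as_completed(futures):
        try:
            print(*fut.result())
        except requests.RequestException as exc:
            print("fetch failed:", exc)
```

A real crawler framework layers more on top of this pattern: a deduplicating URL queue, retry logic, and health checks that evict dead proxies from the pool.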
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| hightman/pspider | A parallel web crawler framework built with PHP and MySQLi | 266 |
| qinxuye/cola | A high-level framework for building distributed data extractors from web pages | 1,500 |
| chenjiandongx/github-spider | A Python web crawler for scraping GitHub user and repository data | 264 |
| manning23/mspider | A Python tool for web crawling and data collection from various websites | 348 |
| elliotgao2/gain | A Python web crawling framework built on asyncio and aiohttp for efficient data extraction | 2,035 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options | 186 |
| howie6879/ruia | An async web scraping micro-framework built on asyncio and aiohttp to simplify URL crawling | 1,752 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir | 23 |
| kiddyuchina/beanbun | A PHP framework for building distributed web crawlers with a modular, extensible design | 1,248 |
| dyweb/scrala | A web crawling framework written in Scala that lets users define a start URL and parse the responses | 113 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go | 1,826 |
| postmodern/spidr | A Ruby web crawling library with flexible, customizable methods for crawling websites | 806 |
| wspl/creeper | A framework for building cross-platform web crawlers in Go | 780 |
| zhegexiaohuozi/seimicrawler | An agile, distributed crawler framework with Spring Boot support, designed to simplify and speed up web scraping | 1,980 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from pages and can run in parallel | 340 |