PSpider
Web Crawler Framework
A Python web crawler framework with support for multi-threading and proxy usage.
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
2k stars · 114 watching · 504 forks
Language: Python
Last commit: over 2 years ago
Tags: crawler, multi-threading, multiprocessing, proxies, python, python-spider, spider, web-crawler, web-spider
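PSpider's own API is not documented on this page, so the sketch below illustrates the general multi-threading-plus-proxies pattern the description refers to, using only `requests` and the standard library rather than PSpider itself. The seed URLs, proxy addresses, and the `fetch` helper are hypothetical placeholders, not part of PSpider.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

# Hypothetical inputs: pages to crawl and a pool of proxies to rotate through.
SEED_URLS = [f"https://example.com/page/{i}" for i in range(1, 6)]
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]

def fetch(url):
    """Fetch one URL through a randomly chosen proxy from the pool."""
    proxy = random.choice(PROXIES)
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
    return url, resp.status_code

# A thread pool stands in for the framework's fetcher threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, url) for url in SEED_URLS]
    for fut in as_completed(futures):
        try:
            print(*fut.result())
        except requests.RequestException as exc:
            print("fetch failed:", exc)
```

A real crawler framework layers more on top of this pattern: a deduplicating URL queue, retry logic, and health checks that evict dead proxies from the pool.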
Related projects:
| Repository | Description | Stars |
| --- | --- | --- |
| hightman/pspider | A parallel web crawler framework built with PHP and MySQLi | 266 |
| qinxuye/cola | A high-level framework for building distributed data extractors from web pages | 1,500 |
| chenjiandongx/github-spider | A Python web crawler for scraping GitHub user and repository data | 264 |
| manning23/mspider | A Python tool for web crawling and data collection from various websites | 348 |
| elliotgao2/gain | A Python web crawling framework built on asyncio and aiohttp for efficient data extraction | 2,035 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options | 186 |
| howie6879/ruia | An async web scraping micro-framework built on asyncio and aiohttp to simplify URL crawling | 1,752 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir | 23 |
| kiddyuchina/beanbun | A PHP framework for building distributed web crawlers with a modular, extensible design | 1,248 |
| dyweb/scrala | A web crawling framework written in Scala that lets users define a start URL and parse the responses | 113 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go | 1,826 |
| postmodern/spidr | A Ruby web crawling library with flexible, customizable methods for crawling websites | 806 |
| wspl/creeper | A framework for building cross-platform web crawlers in Go | 780 |
| zhegexiaohuozi/seimicrawler | An agile, distributed crawler framework with Spring Boot support, designed to simplify and speed up web scraping | 1,980 |
| rivermont/spidy | A simple command-line web crawler that automatically extracts links from pages and can run in parallel | 340 |