Github-spider
Crawler
A Python-based web crawler for scraping GitHub user and repository data.
A crawler for analyzing GitHub repositories and users
264 stars
15 watching
91 forks
Language: Python
last commit: over 7 years ago
Linked from 1 awesome list
Tags: crawler, github, scrapy
Related projects:
Repository | Description | Stars |
---|---|---|
qinxuye/cola | A high-level framework for building distributed data extractors from web pages | 1,500 |
hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
chenzixinn/spider_reverse | A collection of examples demonstrating reverse engineering of web scraping and API interactions in Python | 595 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage. | 1,827 |
puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
holgerd77/django-dynamic-scraper | An app that allows you to manage Scrapy spiders through a Django admin interface. | 1,153 |
spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
elixir-crawly/crawly | A framework for extracting structured data from websites | 987 |
jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
3nock/spidersuite | A cross-platform web spider/crawler tool for analyzing and mapping attack surfaces | 601 |
rndinfosecguy/scavenger | An OSINT bot that crawls pastebin sites to search for sensitive data leaks | 629 |
puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |