Github-spider

Crawler

A Python-based web crawler for scraping GitHub user and repository data.

A crawler for analyzing GitHub repositories and users

264 stars
15 watching
91 forks
Language: Python
Last commit: over 7 years ago
Linked from 1 awesome list

Tags: crawler, github, scrapy

Backlinks from these awesome lists:

Related projects:

| Repository | Description | Stars |
|---|---|---|
| qinxuye/cola | A high-level framework for building distributed data extractors from web pages | 1,500 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
| chenzixinn/spider_reverse | A collection of examples demonstrating reverse engineering of web scraping and API interactions in Python | 595 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
| feng19/spider_man | A high-level web crawling and scraping framework for Elixir. | 23 |
| xianhu/pspider | A Python web crawler framework with support for multi-threading and proxy usage. | 1,827 |
| puerkitobio/gocrawl | A concurrent web crawler written in Go that allows flexible and polite crawling of websites. | 2,038 |
| fredwu/crawler | A high-performance web crawling and scraping solution with customizable settings and worker pooling. | 945 |
| holgerd77/django-dynamic-scraper | An app that allows you to manage Scrapy spiders through a Django admin interface. | 1,153 |
| spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
| elixir-crawly/crawly | A framework for extracting structured data from websites | 987 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
| 3nock/spidersuite | A cross-platform web spider/crawler tool for analyzing and mapping attack surfaces | 608 |
| rndinfosecguy/scavenger | An OSINT bot that crawls pastebin sites to search for sensitive data leaks | 629 |
| puerkitobio/fetchbot | A flexible web crawler that follows robots.txt policies and crawl delays. | 786 |