Github-spider

Crawler

A Python-based web crawler for scraping GitHub user and repository data.

A crawler for analyzing GitHub repositories and users

264 stars

15 watching

91 forks

Language: Python

last commit: about 9 years ago

Linked from 1 awesome list

crawler, github, scrapy

Backlinks from these awesome lists:

antbranch/awesome-github

Related projects:

Repository	Description	Stars
qinxuye/cola	A high-level framework for building distributed data extractors from web pages	1,501
hu17889/go_spider	A modular, concurrent web crawler framework written in Go.	1,827
chenzixinn/spider_reverse	A collection of examples demonstrating reverse engineering of web scraping and API interactions in Python	617
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites.	2,037
feng19/spider_man	A high-level web crawling and scraping framework for Elixir.	23
xianhu/pspider	A Python web crawler framework with support for multi-threading and proxy usage.	1,828
puerkitobio/gocrawl	A concurrent web crawler written in Go that allows flexible and polite crawling of websites.	2,036
fredwu/crawler	A high-performance web crawling and scraping solution with customizable settings and worker pooling.	945
holgerd77/django-dynamic-scraper	An app that allows you to manage Scrapy spiders through a Django admin interface.	1,155
spider-rs/spider	A tool for web data extraction and processing using Rust	1,234
elixir-crawly/crawly	A framework for extracting structured data from websites	994
jmg/crawley	A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options.	188
3nock/spidersuite	A cross-platform web spider/crawler tool for analyzing and mapping attack surfaces	614
rndinfosecguy/scavenger	An OSINT bot that crawls pastebin sites to search for sensitive data leaks	634
puerkitobio/fetchbot	A flexible web crawler that follows robots.txt policies and crawl delays.	787