SeimiCrawler

Crawler framework

An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support

A simple, agile, distributed Java crawler framework with Spring Boot support.

GitHub

2k stars
176 watching
682 forks
Language: Java
Last commit: over 1 year ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| codesofun/web-bee | A Java framework for building web-based crawlers with features like distributed crawling and proxy support. | 189 |
| wspl/creeper | A framework for building cross-platform web crawlers using Go | 780 |
| crawlzone/crawlzone | A PHP framework for asynchronous internet crawling and web scraping | 77 |
| turnersoftware/infinitycrawler | A web crawling library for .NET that allows customizable crawling and throttling of websites. | 248 |
| kiddyuchina/beanbun | A PHP framework for building distributed web crawlers with modular design and extensibility | 1,248 |
| hu17889/go_spider | A modular, concurrent web crawler framework written in Go. | 1,826 |
| dyweb/scrala | A web crawling framework written in Scala that allows users to define the start URL and parse response from it | 113 |
| apache/incubator-stormcrawler | A collection of resources for building web crawlers on Apache Storm using Java | 891 |
| untwisted/sukhoi | A minimalist web crawler framework built on top of miners and structure-based data extraction | 881 |
| qinxuye/cola | A high-level framework for building distributed data extractors from web pages | 1,500 |
| jmg/crawley | A Pythonic framework for building high-speed web crawlers with flexible data extraction and storage options. | 186 |
| brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
| howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling | 1,752 |
| hominee/dyer | A fast and flexible web crawling tool with features like asynchronous I/O and event-driven design. | 133 |
| antchfx/antch | A framework for building fast and efficient web crawlers and scrapers in Go. | 260 |