gecco

Web Crawler Framework

A lightweight web crawler framework that extracts web page data with jQuery-like selectors and supports asynchronous requests and distributed crawling.

Easy-to-use lightweight web crawler

GitHub

3k stars

144 watching

891 forks

Language: Java

last commit: over 2 years ago

Linked from 1 awesome list

crawler, dynamic, fastjson, gecco, java, jsoup

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
yujiosaka/headless-chrome-crawler	A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites	5,534
code4craft/webmagic	A framework for building scalable web crawlers in Java	11,456
zhegexiaohuozi/seimicrawler	A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis	1,980
yasserg/crawler4j	A Java-based web crawler for extracting and processing web page content	4,563
geziyor/geziyor	A fast and flexible web crawling and scraping framework for extracting structured data from websites	2,646
apache/incubator-stormcrawler	A scalable and versatile web crawling framework based on Apache Storm	895
mozilla/geckodriver	An HTTP API proxy for interacting with Gecko-based browsers like Firefox	7,223
apify/crawlee	A tool for building reliable web scraping and browser automation pipelines in Node.js	16,081
stewartmckee/cobweb	A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner	226
unclecode/crawl4ai	A web crawling tool designed to extract structured data from the web for use in AI applications	18,541
matteoredaelli/ebot	An Erlang-based web crawler designed to be scalable and highly configurable	330
hakluke/hakrawler	A tool for automatically discovering and crawling web application endpoints and assets	4,528
howie6879/ruia	An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling	1,753
elliotgao2/gain	A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites	2,037
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages	380