gecco

Web Crawler Framework

A lightweight web crawler framework that extracts web page data with jQuery-like selectors and supports asynchronous requests and distributed crawling.

An easy-to-use, lightweight web crawler

GitHub

3k stars
144 watching
891 forks
Language: Java
Last commit: 9 months ago
Linked from 1 awesome list

crawler, dynamic, fastjson, gecco, java, jsoup

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites | 5,527 |
| code4craft/webmagic | A scalable framework for building web crawlers in Java | 11,432 |
| zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support | 1,980 |
| yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content | 4,555 |
| geziyor/geziyor | A fast and flexible web crawling and scraping framework for extracting structured data from websites | 2,629 |
| apache/incubator-stormcrawler | A collection of resources for building web crawlers on Apache Storm using Java | 891 |
| mozilla/geckodriver | An HTTP API proxy for interacting with Gecko-based browsers like Firefox | 7,193 |
| apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js | 15,604 |
| stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226 |
| unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models | 16,180 |
| matteoredaelli/ebot | An Erlang-based web crawler designed to be scalable and highly configurable | 330 |
| hakluke/hakrawler | A tool for automatically discovering and crawling web application endpoints and assets | 4,502 |
| howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling | 1,752 |
| elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites | 2,035 |
| brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages | 378 |