gecco
Web Crawler Framework
A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling.
An easy-to-use, lightweight web crawler
3k stars
144 watching
891 forks
Language: Java
last commit: 9 months ago
Linked from 1 awesome list
crawler, dynamic, fastjson, gecco, java, jsoup
Related projects:
Repository | Description | Stars |
---|---|---|
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites | 5,527 |
code4craft/webmagic | A scalable framework for building web crawlers in Java. | 11,432 |
zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support | 1,980 |
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content | 4,555 |
geziyor/geziyor | A fast and flexible web crawling and scraping framework for extracting structured data from websites. | 2,629 |
apache/incubator-stormcrawler | A collection of resources for building web crawlers on Apache Storm using Java | 891 |
mozilla/geckodriver | An HTTP API proxy for interacting with Gecko-based browsers like Firefox | 7,193 |
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,604 |
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226 |
unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180 |
matteoredaelli/ebot | An Erlang-based web crawler designed to be scalable and highly configurable | 330 |
hakluke/hakrawler | A tool for automatically discovering and crawling web application endpoints and assets | 4,502 |
howie6879/ruia | An async web scraping micro-framework built with asyncio and aiohttp to simplify URL crawling | 1,752 |
elliotgao2/gain | A Python web crawling framework utilizing asyncio and aiohttp for efficient data extraction from websites. | 2,035 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |