webmagic

crawler

A framework for building scalable web crawlers in Java.

11k stars

767 watching

4k forks

Language: Java

Last commit: over 1 year ago

Linked from 4 awesome lists

Tags: crawler, framework, java, scraping

Screenshot of code4craft/webmagic website

Related projects:

Repository	Description	Stars
yasserg/crawler4j	A Java-based web crawler for extracting and processing web page content	4,563
unclecode/crawl4ai	A web crawling tool designed to extract structured data from the web for use in AI applications	18,541
apify/crawlee	A tool for building reliable web scraping and browser automation pipelines in Node.js	16,081
yujiosaka/headless-chrome-crawler	A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites	5,534
codesofun/web-bee	A Java framework for building web-based crawlers with features like distributed crawling and proxy support	189
spatie/crawler	A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently	2,552
xtuhcy/gecco	A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling	2,504
stewartmckee/cobweb	A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner	226
zhegexiaohuozi/seimicrawler	A distributed crawler framework that simplifies the process of building crawlers using Spring Boot and Redis	1,980
sjdirect/abot	A C# web crawler framework built for speed and flexibility, allowing developers to easily crawl websites with customizable logic	2,255
builderio/gpt-crawler	Automates the process of generating knowledge files to create custom AI models from website content	19,059
brendonboshell/supercrawler	A web crawler that obeys robots.txt rules, rate limits, and concurrency limits, with customizable content handlers for parsing and processing crawled pages	380
dyweb/scrala	A web crawling framework written in Scala that allows users to define the start URL and parse response from it	113
spine/spine	An MVC framework that provides structure and simplicity for building JavaScript web applications	3,665
hakluke/hakrawler	A tool for automatically discovering and crawling web application endpoints and assets	4,528