webmagic

crawler framework

A scalable framework for building web crawlers in Java.

GitHub

11k stars
767 watching
4k forks
Language: Java
Last commit: 27 days ago
Linked from 4 awesome lists

Tags: crawler, framework, java, scraping

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content | 4,555 |
| unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180 |
| apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,604 |
| yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites | 5,527 |
| codesofun/web-bee | A Java framework for building web-based crawlers with features like distributed crawling and proxy support. | 189 |
| spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,537 |
| xtuhcy/gecco | A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling. | 2,502 |
| stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226 |
| zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support | 1,980 |
| sjdirect/abot | A C# web crawler framework built for speed and flexibility, allowing developers to easily crawl websites with customizable logic. | 2,247 |
| builderio/gpt-crawler | Automates the process of generating knowledge files to create custom AI models from website content | 18,860 |
| brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
| dyweb/scrala | A web crawling framework written in Scala that allows users to define the start URL and parse response from it | 113 |
| spine/spine | An MVC framework that provides structure and simplicity for building JavaScript web applications | 3,662 |
| hakluke/hakrawler | A tool for automatically discovering and crawling web application endpoints and assets | 4,502 |