webmagic
crawler framework
A scalable framework for building web crawlers in Java.
11k stars
767 watching
4k forks
Language: Java
Last commit: 27 days ago
Linked from 4 awesome lists
Tags: crawler, framework, java, scraping
Related projects:
Repository | Description | Stars |
---|---|---|
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content | 4,555 |
unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180 |
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,604 |
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites | 5,527 |
codesofun/web-bee | A Java framework for building web-based crawlers with features like distributed crawling and proxy support. | 189 |
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,537 |
xtuhcy/gecco | A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling. | 2,502 |
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner | 226 |
zhegexiaohuozi/seimicrawler | An agile and distributed crawler framework designed to simplify and speed up web scraping with Spring Boot support | 1,980 |
sjdirect/abot | A C# web crawler framework built for speed and flexibility, allowing developers to easily crawl websites with customizable logic. | 2,247 |
builderio/gpt-crawler | Automates the process of generating knowledge files to create custom AI models from website content | 18,860 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
dyweb/scrala | A web crawling framework written in Scala that allows users to define the start URL and parse response from it | 113 |
spine/spine | An MVC framework that provides structure and simplicity for building JavaScript web applications | 3,662 |
hakluke/hakrawler | A tool for automatically discovering and crawling web application endpoints and assets | 4,502 |