crawler4j

Web crawler

A Java-based web crawler for extracting and processing web page content

Open Source Web Crawler for Java

5k stars

306 watching

2k forks

Language: Java

Last commit: over 4 years ago

Linked from 3 awesome lists

Related projects:

Repository	Description	Stars
code4craft/webmagic	A framework for building scalable web crawlers in Java.	11,456
unclecode/crawl4ai	A web crawling tool designed to extract structured data from the web for use in AI applications.	18,541
apify/crawlee	A tool for building reliable web scraping and browser automation pipelines in Node.js.	16,081
spatie/crawler	A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently.	2,552
yujiosaka/headless-chrome-crawler	A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites.	5,534
apache/incubator-stormcrawler	A scalable and versatile web crawling framework based on Apache Storm.	895
xtuhcy/gecco	A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling.	2,504
hakluke/hakrawler	A tool for automatically discovering and crawling web application endpoints and assets.	4,528
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.	380
iamstoxe/urlgrab	A tool to crawl websites by exploring links recursively with support for JavaScript rendering.	331
cocrawler/cocrawler	A versatile web crawler built with modern tools and concurrency to handle various crawl tasks.	188
stewartmckee/cobweb	A flexible web crawler for extracting data from websites scalably and efficiently.	226
codesofun/web-bee	A Java framework for building web-based crawlers with features like distributed crawling and proxy support.	189
builderio/gpt-crawler	Automates the process of generating knowledge files to create custom AI models from website content.	19,059
twitter4j/twitter4j	A Java library providing access to the Twitter API for sending and retrieving tweets.	2,782