crawler

An easy-to-use, powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently.

GitHub

3k stars
66 watching
359 forks
Language: PHP
Last commit: 4 months ago
Linked from 1 awesome list

concurrency, crawler, guzzle, php

Related projects:

Repository | Description | Stars
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,604
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites. | 5,527
jae-jae/querylist | A PHP framework for building web scrapers and crawlers with a focus on ease of use and extensibility. | 2,668
unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180
spatie/laravel-site-search | A package to create a private search index by crawling and indexing a website. | 274
code4craft/webmagic | A scalable framework for building web crawlers in Java. | 11,432
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner. | 226
ruipgil/scraperjs | A versatile web scraping module with two scrapers for static and dynamic content extraction. | 3,710
crawlzone/crawlzone | A PHP framework for asynchronous internet crawling and web scraping. | 77
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content. | 4,555
veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML. | 38
uscdatascience/sparkler | A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real-time. | 410
spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites. | 536
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378
hightman/pspider | A parallel web crawler framework built using PHP and MySQLi. | 266