Crawler

A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently.

An easy-to-use, powerful crawler implemented in PHP. It can execute JavaScript.

GitHub

3k stars
66 watching
360 forks
Language: PHP
Last commit: about 1 month ago
Linked from 1 awesome list

Tags: concurrency, crawler, guzzle, php

Related projects:

Repository | Description | Stars
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites. | 5,534
jae-jae/querylist | A PHP framework for building web scrapers and crawlers with a focus on ease of use and extensibility. | 2,671
unclecode/crawl4ai | A web crawling tool designed to extract structured data from the web for use in AI applications. | 18,541
spatie/laravel-site-search | A package to create a private search index by crawling and indexing a website. | 275
code4craft/webmagic | A framework for building scalable web crawlers in Java. | 11,456
stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner. | 226
ruipgil/scraperjs | A versatile web scraping module with two scrapers for static and dynamic content extraction. | 3,714
crawlzone/crawlzone | A PHP framework for asynchronous internet crawling and web scraping. | 78
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content. | 4,563
veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML. | 39
uscdatascience/sparkler | A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real-time. | 411
spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites. | 544
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380
hightman/pspider | A parallel web crawler framework built using PHP and MySQLi. | 266