Crawler

A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently.

An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.

GitHub

3k stars

66 watching

360 forks

Language: PHP

Last commit: 11 months ago

Linked from 1 awesome list

concurrency, crawler, guzzle, php

freek.dev/308-building-a-crawler-in-php

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
apify/crawlee	A tool for building reliable web scraping and browser automation pipelines in Node.js.	16,081
yujiosaka/headless-chrome-crawler	A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites	5,534
jae-jae/querylist	A PHP framework for building web scrapers and crawlers with a focus on ease of use and extensibility.	2,671
unclecode/crawl4ai	A web crawling tool designed to extract structured data from the web for use in AI applications	18,541
spatie/laravel-site-search	A package to create a private search index by crawling and indexing a website	275
code4craft/webmagic	A framework for building scalable web crawlers in Java.	11,456
stewartmckee/cobweb	A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner	226
ruipgil/scraperjs	A versatile web scraping module with two scrapers for static and dynamic content extraction.	3,714
crawlzone/crawlzone	A PHP framework for asynchronous internet crawling and web scraping	78
yasserg/crawler4j	A Java-based web crawler for extracting and processing web page content	4,563
veliovgroup/spiderable-middleware	Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML	39
uscdatascience/sparkler	A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real-time.	411
spekulatius/phpscraper	A web scraping utility for PHP that simplifies the process of extracting information from websites.	544
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.	380
hightman/pspider	A parallel web crawler framework built using PHP and MySQLi	266