Crawler
A powerful, easy-to-use web crawler implemented in PHP. It can execute JavaScript and crawl multiple URLs concurrently.
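The description above can be sketched in code. This is a minimal, hedged example that assumes the package is spatie/crawler (which matches the PHP, Guzzle, and concurrency tags on this page); the class names, namespaces, and method signatures shown exist in recent versions of that package but may differ in the version you install.

```php
<?php

// Sketch assuming the package is spatie/crawler, installed via
// "composer require spatie/crawler". Namespaces and signatures
// may vary between major versions.

use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlObservers\CrawlObserver;

class LoggingObserver extends CrawlObserver
{
    // Called for every page that was crawled successfully.
    public function crawled(
        UriInterface $url,
        ResponseInterface $response,
        ?UriInterface $foundOnUrl = null,
        ?string $linkText = null,
    ): void {
        echo "Crawled: {$url}\n";
    }

    // Called when a request could not be completed.
    public function crawlFailed(
        UriInterface $url,
        RequestException $requestException,
        ?UriInterface $foundOnUrl = null,
        ?string $linkText = null,
    ): void {
        echo "Failed: {$url}\n";
    }
}

Crawler::create()
    ->setCrawlObserver(new LoggingObserver())
    ->setConcurrency(10)      // crawl up to 10 URLs at the same time
    ->executeJavaScript()     // render each page in a headless browser first
    ->startCrawling('https://example.com');
```

The observer receives each response as it arrives, so parsing and storage logic lives entirely in your own class; concurrency is handled internally by Guzzle's async request pool.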
3k stars
66 watching
359 forks
Language: PHP
Last commit: 4 months ago
Linked from 1 awesome list
Tags: concurrency, crawler, guzzle, php
Related projects:
| Repository | Description | Stars |
|---|---|---|
| apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,604 |
| yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites. | 5,527 |
| jae-jae/querylist | A PHP framework for building web scrapers and crawlers with a focus on ease of use and extensibility. | 2,668 |
| unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180 |
| spatie/laravel-site-search | A package to create a private search index by crawling and indexing a website. | 274 |
| code4craft/webmagic | A scalable framework for building web crawlers in Java. | 11,432 |
| stewartmckee/cobweb | A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner. | 226 |
| ruipgil/scraperjs | A versatile web scraping module with two scrapers for static and dynamic content extraction. | 3,710 |
| crawlzone/crawlzone | A PHP framework for asynchronous internet crawling and web scraping. | 77 |
| yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content. | 4,555 |
| veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML. | 38 |
| uscdatascience/sparkler | A high-performance web crawler built on Apache Spark that fetches and analyzes web resources in real time. | 410 |
| spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites. | 536 |
| brendonboshell/supercrawler | A web crawler that obeys robots.txt rules, rate limits, and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
| hightman/pspider | A parallel web crawler framework built using PHP and MySQLi. | 266 |