headless-chrome-crawler
Crawler
A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites
Distributed crawler powered by Headless Chrome
6k stars
116 watching
407 forks
Language: JavaScript
last commit: over 1 year ago
Linked from 1 awesome list
chrome, chromium, crawler, crawling, headless-chrome, jquery, promise, puppeteer, scraper, scraping
Related projects:
Repository | Description | Stars |
---|---|---|
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081 |
adieuadieu/serverless-chrome | Provides a scaffold for running headless Chrome in AWS Lambda serverless functions. | 2,873 |
bda-research/node-crawler | A NodeJS-based web crawler and spider that extracts data from websites. | 6,718 |
puppeteer/puppeteer | An API to control Chrome and Firefox browsers programmatically | 89,083 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380 |
xtuhcy/gecco | A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling. | 2,504 |
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,552 |
code4craft/webmagic | A framework for building scalable web crawlers in Java. | 11,456 |
veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service that renders the HTML. | 39 |
gocolly/colly | A framework for extracting structured data from websites in a fast and elegant way | 23,444 |
hardkoded/puppeteer-sharp | A .NET API for controlling Headless Chrome instances programmatically | 3,450 |
fanyong920/jvppeteer | A Java library that provides a headless Chrome browser solution for automation and testing purposes. | 737 |
unclecode/crawl4ai | A web crawling tool designed to extract structured data from the web for use in AI applications | 18,541 |
chrome-php/chrome | A PHP library to control and automate Chrome in headless mode | 2,307 |
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content | 4,563 |