headless-chrome-crawler
Crawler
A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites
Distributed crawler powered by Headless Chrome
6k stars
115 watching
406 forks
Language: JavaScript
Last commit: over 1 year ago
Linked from 1 awesome list
Topics: chrome, chromium, crawler, crawling, headless-chrome, jquery, promise, puppeteer, scraper, scraping
Related projects:
Repository | Description | Stars |
---|---|---|
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,740 |
adieuadieu/serverless-chrome | Provides a scaffold for running headless Chrome in AWS Lambda serverless functions. | 2,868 |
bda-research/node-crawler | A NodeJS-based web crawler and spider that extracts data from websites. | 6,704 |
puppeteer/puppeteer | An API to control Chrome and Firefox browsers programmatically | 88,902 |
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378 |
xtuhcy/gecco | A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling. | 2,502 |
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,537 |
code4craft/webmagic | A scalable framework for building web crawlers in Java. | 11,437 |
veliovgroup/spiderable-middleware | Middleware that intercepts requests from web crawlers and proxies them to a prerendering service that renders the HTML. | 38 |
gocolly/colly | A framework for extracting structured data from websites in a fast and elegant way | 23,351 |
hardkoded/puppeteer-sharp | A .NET API for controlling Headless Chrome instances programmatically | 3,416 |
fanyong920/jvppeteer | A Java library for automating Chrome browser functionality using DevTools | 725 |
unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180 |
chrome-php/chrome | A PHP library for controlling headless Chrome instances | 2,283 |
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content | 4,557 |