headless-chrome-crawler

A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites

Distributed crawler powered by Headless Chrome

GitHub

6k stars
116 watching
407 forks
Language: JavaScript
Last commit: over 1 year ago
Linked from 1 awesome list

Topics: chrome, chromium, crawler, crawling, headless-chrome, jquery, promise, puppeteer, scraper, scraping

Related projects:

Repository | Description | Stars
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 16,081
adieuadieu/serverless-chrome | Provides a scaffold for running headless Chrome in AWS Lambda serverless functions. | 2,873
bda-research/node-crawler | A NodeJS-based web crawler and spider that extracts data from websites. | 6,718
puppeteer/puppeteer | An API to control Chrome and Firefox browsers programmatically. | 89,083
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 380
xtuhcy/gecco | A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling. | 2,504
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,552
code4craft/webmagic | A framework for building scalable web crawlers in Java. | 11,456
veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service for rendering HTML. | 39
gocolly/colly | A framework for extracting structured data from websites in a fast and elegant way. | 23,444
hardkoded/puppeteer-sharp | A .NET API for controlling Headless Chrome instances programmatically. | 3,450
fanyong920/jvppeteer | A Java library that provides a headless Chrome browser solution for automation and testing purposes. | 737
unclecode/crawl4ai | A web crawling tool designed to extract structured data from the web for use in AI applications. | 18,541
chrome-php/chrome | A PHP library to control and automate Chrome in headless mode. | 2,307
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content. | 4,563