headless-chrome-crawler

Crawler

A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites

Distributed crawler powered by Headless Chrome

GitHub

6k stars

116 watching

407 forks

Language: JavaScript

last commit: over 3 years ago

Linked from 1 awesome list

chrome, chromium, crawler, crawling, headless-chrome, jquery, promise, puppeteer, scraper, scraping

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
apify/crawlee	A tool for building reliable web scraping and browser automation pipelines in Node.js.	16,081
adieuadieu/serverless-chrome	Provides a scaffold for running headless Chrome in AWS Lambda serverless functions.	2,873
bda-research/node-crawler	A NodeJS-based web crawler and spider that extracts data from websites.	6,718
puppeteer/puppeteer	An API to control Chrome and Firefox browsers programmatically	89,083
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages.	380
xtuhcy/gecco	A lightweight web crawler framework that enables easy extraction of web page data using jQuery-like selectors and supports asynchronous requests and distributed crawling.	2,504
spatie/crawler	A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently.	2,552
code4craft/webmagic	A framework for building scalable web crawlers in Java.	11,456
veliovgroup/spiderable-middleware	Intercepts requests from web crawlers and proxies them to a prerendering service that renders the HTML.	39
gocolly/colly	A framework for extracting structured data from websites in a fast and elegant way	23,444
hardkoded/puppeteer-sharp	A .NET API for controlling Headless Chrome instances programmatically	3,450
fanyong920/jvppeteer	A Java library that provides a headless Chrome browser solution for automation and testing purposes.	737
unclecode/crawl4ai	A web crawling tool designed to extract structured data from the web for use in AI applications	18,541
chrome-php/chrome	A PHP library to control and automate Chrome in headless mode	2,307
yasserg/crawler4j	A Java-based web crawler for extracting and processing web page content	4,563