headless-chrome-crawler

Crawler

A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites

Distributed crawler powered by Headless Chrome
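The loop at the heart of a crawler like this can be sketched as follows. This is a generic breadth-first pattern under stated assumptions, not headless-chrome-crawler's own API. Every URL is visited once, and links are followed only up to a depth limit. `SAMPLE_SITE`, `getLinks`, `crawl`, and `maxDepth` are hypothetical names introduced for illustration. The in-memory link map stands in for the step where a headless browser would render each page and collect its links.

```javascript
// Hypothetical in-memory "site": each URL maps to the links found on that page.
// In a real crawler, a headless browser would render the page and extract these links.
const SAMPLE_SITE = {
  'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
  'https://example.com/a': ['https://example.com/', 'https://example.com/c'],
  'https://example.com/b': ['https://example.com/c'],
  'https://example.com/c': [],
};

// Breadth-first crawl that skips already-seen URLs and stops following links
// once maxDepth is reached. `getLinks` is a hypothetical stand-in for
// "render the page and collect its hrefs".
function crawl(startUrl, getLinks, maxDepth) {
  const visited = new Set([startUrl]);       // URLs already queued, to skip duplicates
  const queue = [{ url: startUrl, depth: 0 }];
  const order = [];                          // URLs in the order they were processed
  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    order.push(url);
    if (depth >= maxDepth) continue;         // do not follow links beyond maxDepth
    for (const link of getLinks(url)) {
      if (!visited.has(link)) {
        visited.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return order;
}

const getLinks = (url) => SAMPLE_SITE[url] || [];
console.log(crawl('https://example.com/', getLinks, 2));
```

A production crawler would swap the synchronous `getLinks` stub for asynchronous page rendering. It would also run several pages concurrently and, in a distributed setup, share the visited set and queue across workers instead of keeping them in local memory.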


6k stars
115 watching
406 forks
Language: JavaScript
Last commit: over 1 year ago
Linked from 1 awesome list

Topics: chrome, chromium, crawler, crawling, headless-chrome, jquery, promise, puppeteer, scraper, scraping


Related projects:

Repository | Description | Stars
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,740
adieuadieu/serverless-chrome | Provides a scaffold for running headless Chrome in AWS Lambda serverless functions. | 2,868
bda-research/node-crawler | A Node.js-based web crawler and spider that extracts data from websites. | 6,704
puppeteer/puppeteer | An API for controlling Chrome and Firefox programmatically. | 88,902
brendonboshell/supercrawler | A web crawler designed to crawl websites while obeying robots.txt rules, rate limits, and concurrency limits, with customizable content handlers for parsing and processing crawled pages. | 378
xtuhcy/gecco | A lightweight web crawler framework that extracts web page data with jQuery-like selectors and supports asynchronous requests and distributed crawling. | 2,502
spatie/crawler | A web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,537
code4craft/webmagic | A scalable framework for building web crawlers in Java. | 11,437
veliovgroup/spiderable-middleware | Intercepts requests from web crawlers and proxies them to a prerendering service that renders the HTML. | 38
gocolly/colly | A fast, elegant framework for extracting structured data from websites. | 23,351
hardkoded/puppeteer-sharp | A .NET API for controlling Headless Chrome instances programmatically. | 3,416
fanyong920/jvppeteer | A Java library for automating the Chrome browser using DevTools. | 725
unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180
chrome-php/chrome | A PHP library for controlling headless Chrome instances. | 2,283
yasserg/crawler4j | A Java-based web crawler for extracting and processing web page content. | 4,557