node-crawler
Web scraper
A Node.js-based web crawler and spider that extracts data from websites.
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
7k stars
255 watching
875 forks
Language: TypeScript
Last commit: 4 months ago
Linked from 2 awesome lists
cheerio, crawler, extract-data, javascript, jquery, nodejs, spider
Related projects:
Repository | Description | Stars |
---|---|---|
apify/crawlee | A tool for building reliable web scraping and browser automation pipelines in Node.js. | 15,604 |
yujiosaka/headless-chrome-crawler | A distributed crawling framework that leverages Headless Chrome to scrape dynamic websites. | 5,527 |
ruipgil/scraperjs | A versatile web scraping module with two scrapers for static and dynamic content extraction. | 3,710 |
rchipka/node-osmosis | A fast and flexible web scraping library using native libxml C bindings. | 4,116 |
npm/cli | A package manager for JavaScript that enables users to manage and install dependencies for web applications. | 8,493 |
sindresorhus/got | A powerful HTTP client library for Node.js that provides a human-friendly and flexible way to make requests. | 14,301 |
axios/axios | An HTTP client library for making requests to web servers using the Promise API. | 105,804 |
veliovgroup/spiderable-middleware | Middleware that intercepts requests from web crawlers and proxies them to a prerendering service that renders the HTML. | 38 |
unclecode/crawl4ai | A tool for web crawling and data extraction, designed to work with large language models. | 16,180 |
macbre/phantomas | A tool for collecting and monitoring web performance metrics in a headless Chromium browser environment. | 2,258 |
node-formidable/formidable | A module for parsing multipart form data, especially file uploads in Node.js applications. | 7,055 |
spatie/crawler | A powerful web crawler written in PHP that can execute JavaScript and crawl multiple URLs concurrently. | 2,537 |
code4craft/webmagic | A scalable framework for building web crawlers in Java. | 11,432 |
matthewmueller/x-ray | A flexible web scraping framework for extracting data from websites with customizable selectors and pagination support. | 5,878 |
sjdirect/abot | A C# web crawler framework built for speed and flexibility, allowing developers to easily crawl websites with customizable logic. | 2,247 |