js-crawler

Web crawler

A Node.js module for crawling websites and scraping their content

Web crawler for Node.js

254 stars

12 watching

55 forks

Language: TypeScript

Last commit: about 8 years ago

Linked from 1 awesome list

Backlinks from these awesome lists:

brucedone/awesome-crawler

Related projects:

Repository	Description	Stars
joseconstela/webparsy	A Node.js library and CLI for scraping websites using Puppeteer and YAML definitions	44
stewartmckee/cobweb	A flexible web crawler that can be used to extract data from websites in a scalable and efficient manner	226
vida-nyu/ache	A web crawler designed to efficiently collect and prioritize relevant content from the web	459
webrecorder/browsertrix-crawler	A containerized browser-based crawler system for capturing web content in a high-fidelity and customizable manner	677
apiel/test-crawler	A tool for end-to-end testing of web applications by crawling and comparing screenshots	33
brendonboshell/supercrawler	A web crawler designed to crawl websites while obeying robots.txt rules, rate limits and concurrency limits, with customizable content handlers for parsing and processing crawled pages	380
spider-rs/spider	A tool for web data extraction and processing using Rust	1,234
iamstoxe/urlgrab	A tool to crawl websites by exploring links recursively with support for JavaScript rendering	331
turnersoftware/infinitycrawler	A web crawling library for .NET that allows customizable crawling and throttling of websites	248
zhuyingda/webster	A framework for automating web scraping and crawling tasks using Node.js	518
archiveteam/grab-site	A web crawler designed to backup websites by recursively crawling and writing WARC files	1,406
tjatse/node-readability	Automates web page scraping and text extraction to make any webpage readable	343
archiveteam/wpull	Downloads and crawls web pages, allowing for the archiving of websites	556
mvdbos/php-spider	A flexible PHP web crawler with configurable traversal algorithms and filters	1,336
rivermont/spidy	A simple command-line web crawler that automatically extracts links from web pages and can be run in parallel for efficient crawling	340