webparsy

Website scraper

A Node.js library and CLI for scraping websites using YAML definitions, with or without Puppeteer

GitHub

44 stars
4 watching
7 forks
Language: JavaScript
Last commit: about 2 years ago
Linked from 1 awesome list

browser, chrome, headless, nodejs, puppeteer, yaml

Related projects:

Repository | Description | Stars
amoilanen/js-crawler | A Node.js module for crawling web sites and scraping their content | 254
zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js | 518
benibela/xidel | A tool to extract data from web pages using various query languages and selectors | 690
jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages | 36
tjatse/node-readability | Automates web page scraping and text extraction to make any webpage readable | 343
fanyong920/jvppeteer | A Java library that provides a headless Chrome browser solution for automation and testing purposes | 737
spider-rs/spider | A tool for web data extraction and processing using Rust | 1,234
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL | 1,315
spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites | 544
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface | 104
jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages | 1,038
davemolk/gogetjs | Tools for extracting and analyzing JavaScript files from web pages | 41
oscarotero/embed | A PHP library to retrieve metadata and embed code from any web page | 2,100
hlaueriksson/puppeteer-sharp-contrib | Extensions to the .NET API for automating Chrome browser tests | 82
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 809