webparsy

Website scraper

A Node.js library and CLI for scraping websites using YAML definitions, with or without Puppeteer

GitHub

44 stars
4 watching
7 forks
Language: JavaScript
Last commit: almost 2 years ago
Linked from 1 awesome list

Tags: browser, chrome, headless, nodejs, puppeteer, yaml

Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| amoilanen/js-crawler | A Node.js module for crawling web sites and scraping their content | 253 |
| zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js | 515 |
| benibela/xidel | A tool to extract data from web pages using various query languages and selectors. | 681 |
| jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages | 35 |
| tjatse/node-readability | Automates web page scraping and text extraction to make any webpage readable | 343 |
| fanyong920/jvppeteer | A Java library for automating Chrome browser functionality using DevTools | 725 |
| spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner. | 1,140 |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL. | 1,315 |
| spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites. | 536 |
| miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface. | 104 |
| jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages. | 1,036 |
| davemolk/gogetjs | Tools for extracting and analyzing JavaScript files from web pages | 40 |
| oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques. | 2,091 |
| hlaueriksson/puppeteer-sharp-contrib | Extensions to the .NET API for automating Chrome browser tests | 82 |
| postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 806 |