webparsy
Website scraper
A Node.js library and CLI for scraping websites using YAML definitions, with or without Puppeteer
44 stars
4 watching
7 forks
Language: JavaScript
Last commit: almost 2 years ago
Linked from 1 awesome list
Tags: browser, chrome, headless, nodejs, puppeteer, yaml
Related projects:
Repository | Description | Stars |
---|---|---|
amoilanen/js-crawler | A Node.js module for crawling web sites and scraping their content | 253 |
zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js | 515 |
benibela/xidel | A tool to extract data from web pages using various query languages and selectors | 681 |
jakopako/goskyr | A tool to simplify web scraping of list-like structured data from web pages | 35 |
tjatse/node-readability | Automates web page scraping and text extraction to make any webpage readable | 343 |
fanyong920/jvppeteer | A Java library for automating Chrome browser functionality using DevTools | 725 |
spider-rs/spider | A web crawler and scraper built on top of Rust, designed to extract data from the web in a flexible and configurable manner | 1,140 |
felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL | 1,315 |
spekulatius/phpscraper | A web scraping utility for PHP that simplifies the process of extracting information from websites | 536 |
miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface | 104 |
jaimeiniesta/metainspector | A Ruby gem for web scraping and extracting metadata from web pages | 1,036 |
davemolk/gogetjs | Tools for extracting and analyzing JavaScript files from web pages | 40 |
oscarotero/embed | A PHP library to extract metadata and embeddable code from any web page using various protocols and scraping techniques | 2,091 |
hlaueriksson/puppeteer-sharp-contrib | Extensions to the .NET API for automating Chrome browser tests | 82 |
postmodern/spidr | A Ruby web crawling library that provides flexible and customizable methods to crawl websites | 806 |