node-readability

Web scraper

Automates web page scraping and text extraction to make any webpage readable

Scrapes or crawls articles from any site automatically and makes any web page readable, whether the content is in Chinese or English.

GitHub

343 stars
11 watching
36 forks
Language: JavaScript
last commit: over 6 years ago
Linked from 1 awesome list


Related projects:

| Repository | Description | Stars |
| --- | --- | --- |
| zhuyingda/webster | A framework for automating web scraping and crawling tasks using Node.js | 515 |
| joseconstela/webparsy | A Node.js library and CLI for scraping websites using Puppeteer and YAML definitions | 44 |
| philipjkim/goreadability | Extracts readable content from web pages using Open Graph and traditional readability rules | 69 |
| retextjs/retext-readability | A plugin to assess text readability using various algorithms | 94 |
| plainas/tq | Tool that extracts content from HTML documents based on CSS selectors | 236 |
| chaijs/loupe | Utility function to represent objects as strings in a platform-independent way | 21 |
| jjelosua/doga_scraper | A tool that extracts and converts Galician Official journal documents to different formats based on input year | 0 |
| litt1e-p/weapp-girls | A Node.js-based web scraping project to extract photos from popular Chinese women's interest websites | 246 |
| felipecsl/wombat | A Ruby-based web crawler and data extraction tool with an elegant DSL | 1,315 |
| nodejs/readable-stream | Provides a Node.js implementation of the core streams classes for userland development | 1,032 |
| amoilanen/js-crawler | A Node.js module for crawling web sites and scraping their content | 253 |
| miyagawa/web-scraper | A Perl toolkit for extracting structured data from HTML documents using a DSL-like interface | 104 |
| gmarty/xgettext | Tools for extracting translatable strings from source code written in template languages | 77 |
| disjukr/just-news | A userscript project that parses Korean news sites and makes the content more readable | 191 |
| tj/reds | A lightweight search module for Node.js applications using Redis as the backing store | 890 |