goreadability

Web page summary extractor

Extracts readable content from web pages using Open Graph and traditional readability rules.

Webpage summary extractor using Facebook Open Graph and arc90's readability
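
To make the Open Graph half of that description concrete, here is a minimal sketch of pulling og:* properties out of a page's meta tags. It is not goreadability's own code or API: the function name extractOpenGraph and the regex-based approach are assumptions chosen so the example runs with the Go standard library alone. A real extractor would use a proper HTML parser, since this sketch only handles double-quoted attributes and would miss tags whose attribute values contain ">".

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// metaTag matches one complete <meta ...> tag (case-insensitive).
	metaTag = regexp.MustCompile(`(?is)<meta\b[^>]*>`)
	// attrPair matches name="value" attribute pairs inside a tag.
	attrPair = regexp.MustCompile(`(?is)([a-z\-:]+)\s*=\s*"([^"]*)"`)
)

// extractOpenGraph is a hypothetical helper (not part of goreadability).
// It returns Open Graph properties keyed by name without the "og:" prefix,
// keeping the first value seen for each property.
func extractOpenGraph(page string) map[string]string {
	props := make(map[string]string)
	for _, tag := range metaTag.FindAllString(page, -1) {
		// Collect the attributes of this tag, unescaping HTML entities.
		attrs := make(map[string]string)
		for _, m := range attrPair.FindAllStringSubmatch(tag, -1) {
			attrs[strings.ToLower(m[1])] = html.UnescapeString(m[2])
		}
		name := attrs["property"]
		if !strings.HasPrefix(name, "og:") {
			continue // not an Open Graph tag
		}
		key := strings.TrimPrefix(name, "og:")
		if _, seen := props[key]; !seen {
			props[key] = attrs["content"]
		}
	}
	return props
}

func main() {
	// Sample markup invented for this illustration.
	page := `<html><head>
<meta property="og:title" content="Example &amp; Title">
<meta property="og:description" content="A short summary.">
<meta name="viewport" content="width=device-width">
</head><body></body></html>`

	og := extractOpenGraph(page)
	fmt.Println(og["title"])
	fmt.Println(og["description"])
}
```

When a page carries no Open Graph tags, a tool of this kind would presumably fall back to readability-style heuristics on the body text; that fallback is not sketched here.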

GitHub

69 stars
7 watching
8 forks
Language: Go
Last commit: over 5 years ago
Linked from 2 awesome lists

Tags: opengraph, readability, scraper

Related projects:

Repository | Description | Stars
keepcosmos/readability | An Elixir library that extracts and curates primary readable content from web pages | 252
tjatse/node-readability | Automates web page scraping and text extraction to make any webpage readable | 343
philipperemy/stanford-openie-python | Provides a Python interface to extract structured relation triples from plain text using CoreNLP's open information extraction system | 636
jonmagic/grim | A tool for extracting pages from PDFs and converting them to images and text strings | 216
erikriver/opengraph | A Python module to extract and parse metadata from web pages using the Open Graph Protocol | 228
cantino/ruby-readability | A tool for extracting readable content from web pages written in Ruby | 925
foolin/pagser | A tool for automatically extracting structured data from HTML pages | 105
neon-jungle/wagtail-readability | Analyzes the readability of text content in Wagtail's RichTextField | 16
s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console | 263
peburrows/plot | A GraphQL parser and resolver for Elixir that aims to implement the full GraphQL spec | 32
vrothberg/vgrep | A user-friendly pager for text search and editing | 667
itteco/iframely | A service that extracts metadata and embeds from web pages | 1,528
steelthread/mimeograph | A CoffeeScript library for extracting text from PDFs and creating searchable files | 28
serpapi/nokolexbor | A high-performance HTML5 parser for Ruby based on Lexbor with support for CSS selectors and XPath | 244
plainas/tq | Tool that extracts content from HTML documents based on CSS selectors | 236