goreadability
Web page summary extractor
Extracts readable content from web pages using Open Graph and traditional readability rules.
Webpage summary extractor using Facebook Open Graph and arc90's readability
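The "Open Graph first, fall back to the page itself" idea in the description can be illustrated with a minimal sketch. This is not goreadability's API or its implementation: `Summary` and `summarize` are hypothetical names, and the sketch uses only the Go standard library with regular expressions, assuming double-quoted attributes with `property` written before `content`. A real extractor would more likely use a proper HTML parser plus readability-style scoring for the article body.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// Summary is a hypothetical container for the fields this sketch extracts.
type Summary struct {
	Title       string
	Description string
}

var (
	// Matches <meta property="og:NAME" ... content="VALUE"> (double quotes only,
	// property before content). This is a simplifying assumption of the sketch.
	ogMetaRe = regexp.MustCompile(`(?is)<meta\s+[^>]*?property="og:([a-z_:]+)"[^>]*?content="([^"]*)"`)
	// Matches the contents of the <title> element, used as a fallback.
	titleRe = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)
)

// summarize prefers Open Graph metadata and falls back to <title>
// when no og:title is present.
func summarize(page string) Summary {
	var s Summary
	for _, m := range ogMetaRe.FindAllStringSubmatch(page, -1) {
		value := html.UnescapeString(strings.TrimSpace(m[2]))
		switch strings.ToLower(m[1]) {
		case "title":
			s.Title = value
		case "description":
			s.Description = value
		}
	}
	if s.Title == "" {
		if m := titleRe.FindStringSubmatch(page); m != nil {
			s.Title = html.UnescapeString(strings.TrimSpace(m[1]))
		}
	}
	return s
}

func main() {
	page := `<html><head>
<title>Fallback title</title>
<meta property="og:title" content="Go &amp; Readability">
<meta property="og:description" content="A short summary.">
</head><body><p>Body text.</p></body></html>`

	s := summarize(page)
	fmt.Printf("title=%q description=%q\n", s.Title, s.Description)
}
```

Here the Open Graph title takes precedence over the `<title>` element; removing the `og:title` tag from the sample page would make `summarize` return the fallback title instead.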
69 stars
7 watching
8 forks
Language: Go
last commit: over 5 years ago
Linked from 2 awesome lists
Topics: opengraph, readability, scraper
Related projects:
| Repository | Description | Stars |
|---|---|---|
| keepcosmos/readability | An Elixir library that extracts and curates primary readable content from web pages. | 260 |
| tjatse/node-readability | Automates web page scraping and text extraction to make any webpage readable. | 343 |
| philipperemy/stanford-openie-python | Provides a Python interface to extract structured relation triples from plain text using CoreNLP's open information extraction system. | 639 |
| jonmagic/grim | A tool for extracting pages from PDFs and converting them to images and text strings. | 216 |
| erikriver/opengraph | A Python module to extract and parse metadata from web pages using the Open Graph Protocol. | 230 |
| cantino/ruby-readability | A Ruby port of a readability tool that extracts primary content from web pages. | 927 |
| foolin/pagser | A tool for automatically extracting structured data from HTML pages. | 105 |
| neon-jungle/wagtail-readability | Analyzes the readability of text content in Wagtail's RichTextField. | 16 |
| s0rg/crawley | A utility for systematically extracting URLs from web pages and printing them to the console. | 268 |
| peburrows/plot | A GraphQL parser and resolver for Elixir that aims to implement the full GraphQL spec. | 32 |
| vrothberg/vgrep | A user-friendly pager for text search and editing. | 669 |
| itteco/iframely | A service that extracts metadata and embeds from web pages. | 1,537 |
| steelthread/mimeograph | A CoffeeScript library for extracting text from PDF files and creating searchable documents with OCR capabilities. | 28 |
| serpapi/nokolexbor | A high-performance HTML5 parser for Ruby based on Lexbor with support for CSS selectors and XPath. | 327 |
| plainas/tq | A tool that extracts content from HTML documents based on CSS selectors. | 236 |